[ https://issues.apache.org/jira/browse/HIVE-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041510#comment-14041510 ]
Gilad Wolff commented on HIVE-6893: ----------------------------------- I encountered the same issue, we get a socket read timeout and then out-of-sequence error. In one case we got an OOM in our client and I suspect it's the same underlying issue. Here is the metastore sequence of events. Our client tried to drop a table starting at 14:02:25. Note that we use a 20 seconds timeout for our client: {code} 2014-06-23 14:02:25,181 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 11: source:/10.20.93.47 drop_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE 2014-06-23 14:02:25,181 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue ip=/10.20.93.47 cmd=source:/10.20.93.47 drop_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE 2014-06-23 14:02:25,182 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 11: source:/10.20.93.47 get_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE 2014-06-23 14:02:25,182 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue ip=/10.20.93.47 cmd=source:/10.20.93.47 get_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE 2014-06-23 14:02:46,596 INFO hive.metastore.hivemetastoressimpl: deleting hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table 2014-06-23 14:02:46,694 INFO hive.metastore.hivemetastoressimpl: Moved to trash: hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table {code} On our client we get a socket timeout for the drop table call at 14:02:45: {code} 2:02:45.209 PM WARN com.cloudera.cmon.firehose.polling.hive.HiveMetastoreCanary Metastore HIVE-1-HIVEMETASTORE-627a77825bb851bf2db30317a698dded: Failed to drop table com.cloudera.cmf.cdhclient.common.hive.MetaException: java.net.SocketTimeoutException: Read timed out {code} we then try to drop the database immediately afterwards and the next message in our logs is: {code} 2:02:46.697 PM WARN com.cloudera.cmf.cdh4client.hive.MetastoreClientImpl Could not drop hive database: cloudera_manager_metastore_canary_test_db com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TApplicationException: get_database failed: out of sequence response at com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:479) at com.cloudera.cmf.cdh4client.hive.MetastoreClientImpl.dropDatabase(MetastoreClientImpl.java:160) {code} Note that the moved-to-trash message in the hive metastore is from 14:02:46,694 and the out-of-order exception is from 2:02:46.697. I know that order-in-time does not imply causation but is it possible that we are getting the drop-table acknowledgment message instead of the get_database message? > out of sequence error in HiveMetastore server > --------------------------------------------- > > Key: HIVE-6893 > URL: https://issues.apache.org/jira/browse/HIVE-6893 > Project: Hive > Issue Type: Bug > Components: HiveServer2 > Affects Versions: 0.12.0 > Reporter: Romain Rigaux > Assignee: Naveen Gangam > Fix For: 0.14.0 > > Attachments: HIVE-6893.1.patch > > > Calls listing databases or tables fail. It seems to be a concurrency problem. > {code} > 014-03-06 05:34:00,785 ERROR hive.log: > org.apache.thrift.TApplicationException: get_databases failed: out of > sequence response > at > org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:472) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:459) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:648) > at > org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:66) > at > org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:278) > at sun.reflect.GeneratedMethodAccessor323.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:62) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:582) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:57) > at com.sun.proxy.$Proxy9.getSchemas(Unknown Source) > at > org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:192) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:263) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1433) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1418) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.cli.thrift.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:38) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:724) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)