[ 
https://issues.apache.org/jira/browse/HIVE-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041510#comment-14041510
 ] 

Gilad Wolff commented on HIVE-6893:
-----------------------------------

I encountered the same issue: we get a socket read timeout followed by an 
out-of-sequence error. In one case we got an OOM in our client, and I suspect 
it has the same underlying cause. Here is the metastore sequence of events. Our 
client tried to drop a table starting at 14:02:25. Note that we use a 
20-second timeout for our client:
{code}
2014-06-23 14:02:25,181 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 11: source:/10.20.93.47 drop_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:25,181 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue      ip=/10.20.93.47 cmd=source:/10.20.93.47 drop_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:25,182 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 11: source:/10.20.93.47 get_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:25,182 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue      ip=/10.20.93.47 cmd=source:/10.20.93.47 get_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:46,596 INFO hive.metastore.hivemetastoressimpl: deleting  hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table
2014-06-23 14:02:46,694 INFO hive.metastore.hivemetastoressimpl: Moved to trash: hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table
{code}

On our client we get a socket timeout for the drop table call at 14:02:45, 
roughly 20 seconds after the call started, which matches our client timeout:
{code}
2:02:45.209 PM  WARN    com.cloudera.cmon.firehose.polling.hive.HiveMetastoreCanary Metastore HIVE-1-HIVEMETASTORE-627a77825bb851bf2db30317a698dded: Failed to drop table com.cloudera.cmf.cdhclient.common.hive.MetaException: java.net.SocketTimeoutException: Read timed out
{code}
We then tried to drop the database immediately afterwards, and the next 
message in our logs is:
{code}
2:02:46.697 PM  WARN    com.cloudera.cmf.cdh4client.hive.MetastoreClientImpl    Could not drop hive database: cloudera_manager_metastore_canary_test_db
com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TApplicationException: get_database failed: out of sequence response
        at com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:479)
        at com.cloudera.cmf.cdh4client.hive.MetastoreClientImpl.dropDatabase(MetastoreClientImpl.java:160)
{code}

Note that the moved-to-trash message in the Hive metastore log is from 
14:02:46,694 and the out-of-sequence exception on the client is from 
2:02:46.697 PM, about 3 ms apart. I know that order in time does not imply 
causation, but is it possible that the client is reading the late drop_table 
response in place of the get_database response?
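
To make the suspected sequence concrete, here is a minimal, self-contained 
Java sketch. It is not Hive or Thrift code: OutOfSequenceDemo, Connection, 
Client and Reply are hypothetical names, and the queue merely stands in for 
the socket. The only behavior borrowed from Thrift is that the client compares 
the seqid of each reply with the seqid of its latest request, which is the 
check that fails in TServiceClient.receiveBase in the trace above. Whether 
this is what actually happens in our case is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class OutOfSequenceDemo {

    /** A reply frame; the server echoes the seqid of the request it answers. */
    static final class Reply {
        final int seqid;
        final String method;
        Reply(int seqid, String method) { this.seqid = seqid; this.method = method; }
    }

    /** Hypothetical stand-in for one socket: replies are read in arrival order. */
    static final class Connection {
        final Deque<Reply> inbound = new ArrayDeque<>();
    }

    /** Toy client that reuses one connection and one seqid counter for every call. */
    static final class Client {
        private final Connection conn;
        private int seqid = 0;
        Client(Connection conn) { this.conn = conn; }

        /** "Sends" a request; only the seqid matters for this sketch. */
        int send(String method) { return ++seqid; }

        /** Reads the next reply and rejects it if its seqid is not the latest one. */
        String receive(String method) {
            Reply reply = conn.inbound.poll();
            if (reply == null) {
                throw new IllegalStateException("Read timed out");
            }
            if (reply.seqid != seqid) {
                throw new IllegalStateException(method + " failed: out of sequence response");
            }
            return reply.method;
        }
    }

    /** Returns the reply's method name, or the error message if the read fails. */
    static String attempt(Client client, String method) {
        try {
            return client.receive(method);
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    static String simulate() {
        Connection conn = new Connection();
        Client client = new Client(conn);

        // 1. drop_table is sent, but the server is slow and nothing arrives in time.
        int dropSeq = client.send("drop_table");
        String first = attempt(client, "drop_table");

        // 2. The server finishes the drop and writes its reply to the same connection.
        conn.inbound.add(new Reply(dropSeq, "drop_table"));

        // 3. The client reuses the connection for get_database; the real reply to
        //    that call queues up behind the stale drop_table reply.
        int getSeq = client.send("get_database");
        conn.inbound.add(new Reply(getSeq, "get_database"));
        String second = attempt(client, "get_database");

        return first + " | " + second;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this toy model the first call times out and the second one fails with the 
same "out of sequence response" text we see in our logs, because the stale 
drop_table reply is the first thing read. If that is what is happening, 
discarding the connection after a read timeout instead of reusing it should 
avoid the second error.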

> out of sequence error in HiveMetastore server
> ---------------------------------------------
>
>                 Key: HIVE-6893
>                 URL: https://issues.apache.org/jira/browse/HIVE-6893
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 0.12.0
>            Reporter: Romain Rigaux
>            Assignee: Naveen Gangam
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6893.1.patch
>
>
> Calls listing databases or tables fail. It seems to be a concurrency problem.
> {code}
> 014-03-06 05:34:00,785 ERROR hive.log: org.apache.thrift.TApplicationException: get_databases failed: out of sequence response
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:472)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:459)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:648)
>         at org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:66)
>         at org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:278)
>         at sun.reflect.GeneratedMethodAccessor323.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:62)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:582)
>         at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:57)
>         at com.sun.proxy.$Proxy9.getSchemas(Unknown Source)
>         at org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:192)
>         at org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:263)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1433)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1418)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.hive.service.cli.thrift.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:38)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> {code}



