Sarath, I assume the failure you are seeing doesn't happen immediately? The current timeout on the client is set to 5 minutes. A socket timeout usually means that the client gave up waiting before it received any response from the server. So either the server is very busy (for example, if you are pulling in a lot of data), or you might be running into this bug [1], where HMS connections are leaked.
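
To make the "read timed out" part concrete, here is a minimal sketch of the same situation outside Hive. It is plain Python rather than the Java/Thrift client Hive actually uses, and the names read_with_timeout and slow_server, the loopback address, and the payloads are all made up for illustration. The client connects fine, sends a request, and then gives up because no bytes arrive before its read timeout expires, which is roughly what the SocketTimeoutException in your trace is reporting.

```python
import socket
import threading
import time


def slow_server(listener, delay_seconds):
    """Accept one connection, stay 'busy' for delay_seconds, then reply."""
    conn, _ = listener.accept()
    try:
        time.sleep(delay_seconds)      # stands in for a busy or stuck server
        conn.sendall(b"done")
    except OSError:
        pass                           # the client already gave up and closed
    finally:
        conn.close()


def read_with_timeout(timeout_seconds, server_delay_seconds):
    """Connect to a local slow server and wait for its reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(
        target=slow_server, args=(listener, server_delay_seconds), daemon=True
    )
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(timeout_seconds)  # read timeout, like the client-side setting
    try:
        client.sendall(b"request")
        client.recv(4)                  # blocks until data arrives or the timeout hits
        return "response received"
    except socket.timeout:
        return "read timed out"
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print(read_with_timeout(0.2, 1.0))  # timeout shorter than the server's delay
    print(read_with_timeout(2.0, 0.1))  # timeout longer than the server's delay
```

The connection itself succeeds in both calls; only the wait for the reply differs. That matches your log, where "Connected to metastore" is followed by the read timeout some time later.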
To start with, can you try bumping the timeout up to something like 20 minutes and see if your queries succeed? This can be done directly from the CLI via "set hive.metastore.client.socket.timeout=1200" (that is, 1200 seconds).

[1] https://issues.apache.org/jira/browse/HIVE-10956

On Fri, Aug 7, 2015 at 8:13 AM, Sarath Chandra <sarathchandra.jos...@algofusiontech.com> wrote:

> Thanks Eugene, Alan.
>
> @Alan,
> As suggested checked the logs, here is what I found -
>
> - On starting metastore server, I'm seeing following messages in the
>   log file -
>
> *2015-08-07 18:32:56,678 ERROR [Thread-7]: compactor.Initiator (Initiator.java:run(134)) - Caught an exception in the main loop of compactor initiator, exiting MetaException(message:Unable to get jdbc connection from pool, READ_COMMITTED and SERIALIZABLE are the only valid transaction levels)*
> * at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:811)*
> * at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.revokeFromLocalWorkers(CompactionTxnHandler.java:443)*
> * at org.apache.hadoop.hive.ql.txn.compactor.Initiator.recoverFailedCompactions(Initiator.java:147)*
> * at org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:64)*
>
> - On bringing up the hive shell, I get the following messages -
>
> tion - enable connectionWatch for additional debugging assistance or set disableConnectionTracking to true to disable this feature entirely.
> 2015-08-07 18:38:51,614 WARN [org.spark-project.guava.common.base.internal.Finalizer]: bonecp.ConnectionPartition (ConnectionPartition.java:finalizeReferent(162)) - BoneCP detected an unclosed connection and will now attempt to close it for you. You should be closing this connection in your application - enable connectionWatch for additional debugging assistance or set disableConnectionTracking to true to disable this feature entirely.
> 2015-08-07 18:38:51,768 DEBUG [pool-3-thread-1]: metastore.ObjectStore (ObjectStore.java:debugLog(6435)) - Commit transaction: count = 0, isactive true at:
>     org.apache.hadoop.hive.metastore.ObjectStore.getFunctions(ObjectStore.java:6657)
>
> - On firing "show tables" command, I get the following messages in the
>   log file -
>
> 2015-08-07 18:41:02,511 INFO [main]: hive.metastore (HiveMetaStoreClient.java:open(297)) - Trying to connect to metastore with URI thrift://sarath:9083
> 2015-08-07 18:41:02,511 INFO [main]: hive.metastore (HiveMetaStoreClient.java:open(385)) - Connected to metastore.
> 2015-08-07 18:41:22,549 ERROR [main]: ql.Driver (SessionState.java:printError(545)) - FAILED: Error in determing valid transactions: Error communicating with the metastore
> org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the metastore
>     at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.getValidTxns(DbTxnManager.java:281)
>     at org.apache.hadoop.hive.ql.Driver.recordValidTxns(Driver.java:842)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1036)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>     at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_open_txns(ThriftHiveMetastore.java:3367)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_open_txns(ThriftHiveMetastore.java:3355)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getValidTxns(HiveMetaStoreClient.java:1545)
>     at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.getValidTxns(DbTxnManager.java:279)
>     ... 15 more
> Caused by: java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.read(SocketInputStream.java:129)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>     ... 24 more
>
> Let me know if there is anything to be taken care in the configuration or
> setup.
>
> On Wed, Aug 5, 2015 at 4:40 AM, Alan Gates <alanfga...@gmail.com> wrote:
>
>> Ok, the next step is to look at the logs from your Hive metastore server
>> and see exactly what's happening. The error you're seeing is from the
>> client. On your metastore server there should be logs with the same
>> timestamp giving details on why the transaction operation failed.
>>
>> Alan.
>>
>> Sarath Chandra <sarathchandra.jos...@algofusiontech.com>
>> August 3, 2015 at 20:02
>> Thanks Alan.
>>
>> Yes I've run metastore scripts for oracle instance. Infact I've removed
>> my previous metastore and created a fresh one by running the schema
>> creation script for 1.2.1. I've looked into the new schema and able to see
>> the table TXNS. I've also removed the hdfs location "/user/hive/warehouse"
>> and created a fresh one.
>>
>> But still I'm facing this issue.
>>
>>
>>
>> Alan Gates <alanfga...@gmail.com>
>> August 3, 2015 at 8:29
>> Did you run the hive metastore upgrade scripts for your oracle instance?
>> This error message usually means the transaction related tables have not
>> been created in your database. Somewhere in your distribution there should
>> be a set of upgrade scripts. Look for scripts of the form:
>>
>> scripts/metastore/upgrade/oracle/upgrade-0.13.0-to-0.14.0.oracle.sql
>>
>> You'll want to run all of the ones from 0.13 to 1.2 (0.13->0.14,
>> 0.14->1.1, 1.1->1.2). The 0.13->0.14 scripts assume that you added the
>> transaction tables as part of upgrading to Hive 0.13. If you did not you
>> will need to first run hive-txn-schema-0.13.0.oracle.sql which will create
>> the initial transaction tables. You can determine whether this was done by
>> looking for a table named TXNS in the hive schema on your Oracle db.
>>
>> Alan.
>>
>> Sarath Chandra <sarathchandra.jos...@algofusiontech.com>
>> August 3, 2015 at 6:29
>> Hi All,
>>
>> Earlier I was using hive 0.13.0 and now trying to migrate to latest
>> version to utilize the transaction support introduced from hive 0.14.0.
>>
>> I downloaded hive 1.2.1, created a metastore in oracle database and
>> provided all the required configuration parameters in conf/hive-site.xml to
>> enable transactions. For the parameter "hive.txn.manager" given the value
>> "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager".
>>
>> From the hive prompt when I fire the command "show tables;" I'm getting
>> the below exception -
>> *FAILED: Error in determining valid transactions: Error communicating
>> with the metastore*
>>
>> But if disable the "hive.txn.manager" parameter in hive-site.xml then the
>> command works fine.
>>
>> Is there anything else to be configured which I'm missing?
>>
>> Thanks & Regards,
>> Sarath.
>>
>>
>

--
Swarnim