Also worth mentioning: I have tried running the 0.4.0-cdh4.3.0-SNAPSHOT jars (from https://repository.cloudera.com/content/groups/public/org/apache/hcatalog/hcatalog-core/) and hit exactly the same issue. That could indicate the problem is related to the hive-metastore component itself and the way it talks to the metastore database, rather than to HCatalog. Thoughts?
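In the meantime, one thing that at least stops Pig from dying mid-fetch with the read timeout below is raising the metastore client socket timeout. `hive.metastore.client.socket.timeout` is an existing Hive setting (value in seconds); 600 here is just an arbitrary value I picked, and of course it only hides the slowness rather than fixing it:

```xml
<!-- client-side hive-site.xml; 600s is an illustrative value, not a recommendation -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>600</value>
</property>
```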
On 27 August 2013 18:41, Michał Czerwiński <[email protected]> wrote:
> In Pig I am doing a query like this:
>
> sdp1 = load 'db1.table1' using org.apache.hcatalog.pig.HCatLoader();
> sdp = FILTER sdp1 BY key1 == 'value1' AND key2 == 'value2';
> ll = LIMIT sdp 100;
> dump ll;
>
> HCatalog then spends a few minutes talking to MySQL asking for metadata;
> meanwhile, after a few seconds, Pig fails with:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
>
> Number of partitions I have:
> hive -e 'use db1; show partitions table1' | wc -l
> Time taken: 1.467 seconds
> 37748
>
> When I run the same query on a different environment where I have only
> ~1000 partitions, all works fine.
>
> The problem also does not exist on cdh3 with hcatalog-0.4.0.
>
> In HCatalog's logs I can see (note the timestamps; I ran the query at
> 17:10:45,216):
>
> 2013-08-27 17:10:46,275 INFO DataNucleus.MetaData
> (Log4JLogger.java:info(77)) - Listener found initialisation for persistable
> class org.apache.hadoop.hive.metastore.model.MPartition
>
> 2013-08-27 17:14:23,661 DEBUG metastore.ObjectStore
> (ObjectStore.java:listMPartitionsByFilter(1832)) - Done retrieving all
> objects for listMPartitionsByFilter
>
> 2013-08-27 17:22:32,410 INFO metastore.ObjectStore
> (ObjectStore.java:getPartitionsByFilter(1699)) - # parts after pruning =
> 37748
>
> After that, HCatalog continues to:
>
> 2013-08-27 17:30:14,631 DEBUG DataNucleus.Transaction
> (Log4JLogger.java:debug(58)) - Transaction committed in 462221 ms
>
> Please note that I have DataNucleus logging set to DEBUG, which slows
> things down significantly; without it, HCatalog still takes around 7
> minutes to settle.
>
> Also, the DataNucleus settings from HCatalog's logs:
>
> datanucleus.autoStartMechanismMode = checked
> javax.jdo.option.Multithreaded = true
> datanucleus.identifierFactory = datanucleus
> datanucleus.transactionIsolation = read
> datanucleus.validateTables = false
> javax.jdo.option.ConnectionURL = jdbc:mysql://XXX
> javax.jdo.option.DetachAllOnCommit = true
> javax.jdo.option.NonTransactionalRead = true
> datanucleus.validateConstraints = false
> javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
> javax.jdo.option.ConnectionUserName = hive
> datanucleus.validateColumns = false
> datanucleus.cache.level2 = false
> datanucleus.plugin.pluginRegistryBundleCheck = LOG
> datanucleus.cache.level2.type = none
> javax.jdo.PersistenceManagerFactoryClass =
> org.datanucleus.jdo.JDOPersistenceManagerFactory
> datanucleus.autoCreateSchema = true
> datanucleus.storeManagerType = rdbms
> datanucleus.connectionPoolingType = DBCP
>
> This runs on CDH4 4.3.0.
> HCatalog version: 0.5.0+9-1.cdh4.3.0.p0.12~precise-cdh4.3.0
>
> Ideas?
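To illustrate the kind of behaviour I suspect behind the numbers above (the ObjectStore materialising one MPartition object per partition, i.e. many small round trips, instead of one bulk query), here is a rough, purely illustrative sqlite sketch. The schema, names and row count are all made up and have nothing to do with the real metastore schema:

```python
# Purely illustrative: compares a one-query-per-row pattern (roughly what an
# ORM materialising one object per partition does) with a single bulk query.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partitions (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO partitions (id, name) VALUES (?, ?)",
    [(i, f"key1=value{i}") for i in range(37748)],
)

# One query per partition: 37748 separate round trips.
t0 = time.time()
per_row = [
    conn.execute("SELECT name FROM partitions WHERE id = ?", (i,)).fetchone()[0]
    for i in range(37748)
]
per_row_secs = time.time() - t0

# One bulk query returning the same rows.
t0 = time.time()
bulk = [row[0] for row in conn.execute("SELECT name FROM partitions ORDER BY id")]
bulk_secs = time.time() - t0

assert per_row == bulk
print(f"per-row: {per_row_secs:.3f}s, bulk: {bulk_secs:.3f}s")
```

Even in-process the per-row variant is noticeably slower; against a real MySQL metastore each of those round trips also crosses the network, which is where I suspect the minutes go.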
