In which method does HBaseStorage connect to ZooKeeper? There are methods that are guaranteed to be called only on the front end or only on the back end, so you can avoid these kinds of traps. HCatalog in particular had to work around the same problem to avoid connecting to the metastore server in places it should not. Do you need to connect to ZooKeeper on the front end at all?

Alan

On Oct 28, 2011, at 9:19 AM, Vincent Barat wrote:

> :-(
>
> ok I understand. I don't see why the optimizer needs to instantiate the
> loaders, but I'm not a PIG specialist.
>
> Concerning HBaseStorage, this is a big issue, as the number of connections to
> ZooKeeper is fairly limited (30 in the default configuration).
> So a simple join of 2 tables would break this limit.
>
> The finalize() method is of course not a solution, as it is called only when
> the GC wants to call it.
>
> Anyway I'll try it just to get some insight.
>
> Cheers,
>
> On 26/10/11 21:12, Dmitriy Ryaboy wrote:
>> unfortunately Pig creates UDFs willy-nilly to do various checks during
>> script compilation.
>>
>> Maybe we should override the finalize() method and close the
>> connection there, that would help.
>>
>> D
>>
>> On Tue, Oct 25, 2011 at 2:49 AM, Vincent Barat<[email protected]>
>> wrote:
>>> Hi,
>>>
>>> I am trying to figure out why PIG is using so many ZooKeeper connections (from the
>>> frontend machine) when using HBaseStorage().
>>>
>>> I added a trace in the constructor of HBaseStorage()
>>>
>>> I wrote a simple script loading an HBase table:
>>>
>>> sessions = LOAD 'hbase://mytable' USING
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp')
>>> AS (sid:chararray, start:long);
>>> dump sessions;
>>>
>>> When I run the script:
>>>
>>> vbarat@lancelot:~$ pig -x local -f /Users/vbarat/ermin/pig/script/test.pig
>>> 2011-10-25 11:32:41,482 [main] INFO org.apache.pig.Main - Logging error
>>> messages to: /Users/vbarat/pig_1319535161481.log
>>> 2011-10-25 11:32:41,563 [main] INFO
>>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
>>> to hadoop file system at: file:///
>>> 2011-10-25 11:32:41,884 [main] INFO
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
>>> *********************
>>> 2011-10-25 11:32:41,970 [main] INFO
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
>>> *********************
>>> 2011-10-25 11:32:42,035 [main] INFO
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
>>> *********************
>>> 2011-10-25 11:32:42,073 [main] INFO
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
>>> *********************
>>> 2011-10-25 11:32:42,184 [main] INFO
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
>>> *********************
>>> 2011-10-25 11:32:42,207 [main] INFO
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
>>> *********************
>>> 2011-10-25 11:32:42,233 [main] INFO
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
>>> *********************
>>> 2011-10-25 11:32:42,256 [main] INFO
>>> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
>>> script: UNKNOWN
>>> 2011-10-25 11:32:42,317 [main] INFO
>>> org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
>>> processName=JobTracker, sessionId=
>>> 2011-10-25 11:32:42,374 [main] INFO
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
>>> *********************
>>> 2011-10-25 11:32:42,391 [main] INFO
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
>>> *********************
>>> 2011-10-25 11:32:42,425 [main] INFO
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
>>> File concatenation threshold: 100 optimistic? false
>>> 2011-10-25 11:32:42,449 [main] INFO
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>>> - MR plan size before optimization: 1
>>>
>>> So HBaseStorage is created 10 times (and so the table is opened 9 times).
>>>
>>> I'd like to know why there are so many creations.
>>>
>>> Also, when I change my script to load 2 tables and join them, the
>>> HBaseStorage object is created 40 times!
>>>
>>> Can someone give me some insight to help me investigate the issue?
>>>
>>> Thanks a lot
>>>
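
For readers of this thread, the idea raised in the top reply (keep the constructor cheap and connect only from a method that runs on the back end) can be sketched roughly as below. This is an illustration under stated assumptions, not the actual HBaseStorage code: `CountingConnection`, `LazyLoader`, `readNext` and `LazyLoaderDemo` are hypothetical names, and `CountingConnection` merely counts opens and closes in place of a real ZooKeeper-backed table handle. In a real Pig loader, the back-end-only hook would probably be something like `LoadFunc.prepareToRead`, but that mapping is an assumption to verify against the Pig version in use.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive client (e.g. a ZooKeeper-backed table handle). */
final class CountingConnection implements AutoCloseable {
    static final AtomicInteger OPENED = new AtomicInteger();
    static final AtomicInteger CLOSED = new AtomicInteger();

    CountingConnection() {
        OPENED.incrementAndGet();
    }

    @Override
    public void close() {
        CLOSED.incrementAndGet();
    }
}

/** Hypothetical loader: construction is cheap, the connection is opened on first use. */
final class LazyLoader implements AutoCloseable {
    private final String tableName;
    private CountingConnection connection; // stays null until readNext() needs it

    LazyLoader(String tableName) {
        // Deliberately no connection here: the planner may build this object many times.
        this.tableName = tableName;
    }

    /** Stand-in for a method that is only ever invoked on the back end. */
    String readNext() {
        if (connection == null) {
            connection = new CountingConnection();
        }
        return "row from " + tableName;
    }

    /** Explicit, idempotent cleanup instead of relying on finalize(). */
    @Override
    public void close() {
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }
}

final class LazyLoaderDemo {
    public static void main(String[] args) {
        // Front-end style: many instantiations, none of which opens a connection.
        for (int i = 0; i < 10; i++) {
            new LazyLoader("mytable");
        }
        // Back-end style: one instance that reads, then is closed deterministically.
        try (LazyLoader loader = new LazyLoader("mytable")) {
            loader.readNext();
            loader.readNext();
        }
        System.out.println("opened=" + CountingConnection.OPENED.get()
                + " closed=" + CountingConnection.CLOSED.get());
    }
}
```

With this shape, the number of instantiations during script compilation no longer matters for the connection limit, and the explicit `close()` avoids depending on when the GC decides to run a finalizer.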
