We need to connect to HBase to figure out input slices, etc. This involves connecting to ZK (via HBase innards -- we don't do this explicitly).
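One way to keep the connection count down, sketched here under assumptions: do nothing in the constructor and open the connection only when slices are actually computed, closing it right afterwards. `FakeConnection` and `LazyLoader` are hypothetical stand-ins written for this illustration; they are not HBase or Pig classes, and the real HBaseStorage code may need a different shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an HBase/ZooKeeper connection (not a real HBase class).
final class FakeConnection implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();   // connections currently open
    static final AtomicInteger OPENED = new AtomicInteger(); // connections ever opened

    FakeConnection() {
        OPEN.incrementAndGet();
        OPENED.incrementAndGet();
    }

    // Pretend every table has exactly two regions.
    List<String> regionsOf(String table) {
        List<String> regions = new ArrayList<>();
        regions.add(table + "-region-0");
        regions.add(table + "-region-1");
        return regions;
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

// Hypothetical loader: the constructor only records its arguments.
final class LazyLoader {
    private final String table;

    LazyLoader(String table) {
        this.table = table; // no connection here, so extra instances cost nothing
    }

    // Called only when input slices are really needed.
    List<String> computeSplits() {
        try (FakeConnection conn = new FakeConnection()) {
            return conn.regionsOf(table);
        } // the connection is closed here, even if an exception is thrown
    }
}

public class LazyLoaderDemo {
    public static void main(String[] args) {
        // Simulate the loader being instantiated many times during script compilation.
        List<LazyLoader> loaders = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            loaders.add(new LazyLoader("mytable"));
        }
        System.out.println("opened after construction: " + FakeConnection.OPENED.get());

        List<String> splits = loaders.get(0).computeSplits();
        System.out.println("splits: " + splits);
        System.out.println("opened: " + FakeConnection.OPENED.get()
                + ", still open: " + FakeConnection.OPEN.get());
    }
}
```

Closing explicitly with try-with-resources avoids relying on finalize(), which only runs when the GC decides to.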

On Fri, Oct 28, 2011 at 9:30 AM, Alan Gates <[email protected]> wrote:
> Which method is HBaseStorage connecting to Zookeeper in? There are methods
> that are guaranteed to only be called on the front or back end, so you can
> avoid these kinds of traps. HCatalog in particular had to work around the
> same thing to avoid connecting to the metastore server in places it should
> not. Do you need to connect to ZooKeeper on the front end at all?
>
> Alan.
>
> On Oct 28, 2011, at 9:19 AM, Vincent Barat wrote:
>
>> :-(
>>
>> ok I understand. I don't see why the optimizer needs to instantiate the
>> loaders, but I'm not a PIG specialist.
>>
>> Concerning HBaseStorage, this is a big issue, as the number of connections
>> to zookeeper is fairly limited (30 in the default configuration).
>> So a simple join of 2 tables would break this limit.
>>
>> The finalize() method is of course not a solution, as it is called only when
>> the GC wants to call it.
>>
>> Anyway I'll try it just to get some insight.
>>
>> Cheers,
>>
>> On 26/10/11 21:12, Dmitriy Ryaboy wrote:
>>> unfortunately Pig creates UDFs willy-nilly to do various checks during
>>> script compilation.
>>>
>>> Maybe we should override the finalize() method and close the
>>> connection there, that would help.
>>>
>>> D
>>>
>>> On Tue, Oct 25, 2011 at 2:49 AM, Vincent Barat<[email protected]>
>>> wrote:
>>>> Hi,
>>>>
>>>> I am trying to figure out why PIG is using so many zookeeper connections
>>>> (from the frontend machine) when using HBaseStorage().
>>>>
>>>> I added a trace in the constructor of HBaseStorage()
>>>>
>>>> I wrote a simple script loading an HBase table:
>>>>
>>>> sessions = LOAD 'hbase://mytable' USING
>>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp')
>>>> AS (sid:chararray, start:long);
>>>> dump sessions;
>>>>
>>>> When I run the script:
>>>>
>>>> vbarat@lancelot:~$ pig -x local -f /Users/vbarat/ermin/pig/script/test.pig
>>>> 2011-10-25 11:32:41,482 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/vbarat/pig_1319535161481.log
>>>> 2011-10-25 11:32:41,563 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
>>>> 2011-10-25 11:32:41,884 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:41,970 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,035 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,073 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,184 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,207 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,233 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,256 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
>>>> 2011-10-25 11:32:42,317 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
>>>> 2011-10-25 11:32:42,374 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,391 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,425 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
>>>> 2011-10-25 11:32:42,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>>
>>>> So HBaseStorage is created 10 times (and so the table is opened 9 times).
>>>>
>>>> I'd like to know why it is created so many times.
>>>>
>>>> Also, when I change my script to load 2 tables and join them, the
>>>> HBaseStorage object is created 40 times!
>>>>
>>>> Can someone give me some insight to help me investigate the issue?
>>>>
>>>> Thanks a lot
>>>>
>
>
