Unfortunately, Pig creates UDFs willy-nilly to do various checks during script compilation.
Maybe we should override the finalize() method and close the connection there; that would help.

D

On Tue, Oct 25, 2011 at 2:49 AM, Vincent Barat <[email protected]> wrote:
> Hi,
>
> I am trying to figure out why Pig uses so many ZooKeeper connections (from the
> frontend machine) when using HBaseStorage().
>
> I added a trace in the constructor of HBaseStorage().
>
> I wrote a simple script loading an HBase table:
>
> sessions = LOAD 'hbase://mytable' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp')
> AS (sid:chararray, start:long);
> dump sessions;
>
> When I run the script:
>
> vbarat@lancelot:~$ pig -x local -f /Users/vbarat/ermin/pig/script/test.pig
> 2011-10-25 11:32:41,482 [main] INFO org.apache.pig.Main - Logging error
> messages to: /Users/vbarat/pig_1319535161481.log
> 2011-10-25 11:32:41,563 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
> to hadoop file system at: file:///
> 2011-10-25 11:32:41,884 [main] INFO
> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:41,970 [main] INFO
> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,035 [main] INFO
> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,073 [main] INFO
> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,184 [main] INFO
> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,207 [main] INFO
> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,233 [main] INFO
> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,256 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
> script: UNKNOWN
> 2011-10-25 11:32:42,317 [main] INFO
> org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 2011-10-25 11:32:42,374 [main] INFO
> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,391 [main] INFO
> org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,425 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2011-10-25 11:32:42,449 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
>
> So HBaseStorage is created 10 times (and so the table is opened 9 times).
>
> I'd like to know why it is created so many times.
>
> Also, when I change my script to load 2 tables and join them, the
> HBaseStorage object is created 40 times!
>
> Can someone give me some insight to help me investigate the issue?
>
> Thanks a lot
>
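
For what it's worth, the finalize() idea could look roughly like the sketch below. It is only an illustration under assumptions: FakeStorage and the OPEN counter are made-up stand-ins, not Pig or HBase classes, and the constructor just bumps a counter where the real loader would open the table. Note also that finalize() is deprecated in recent Java versions and the garbage collector gives no guarantee about when (or whether) it runs, so the sketch keeps an explicit, idempotent close() and treats finalize() purely as a safety net.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionSketch {
    // Counts currently open "connections"; a stand-in for ZooKeeper sessions.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for a loader that opens a connection in its constructor.
    // This is NOT the real HBaseStorage class.
    static class FakeStorage implements AutoCloseable {
        private boolean open;

        FakeStorage() {
            // Mimics opening the table as soon as the object is created.
            open = true;
            OPEN.incrementAndGet();
        }

        @Override
        public synchronized void close() {
            // Idempotent: calling close() a second time does nothing.
            if (open) {
                open = false;
                OPEN.decrementAndGet();
            }
        }

        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() throws Throwable {
            // Safety net only: runs if and when the garbage collector decides to.
            try {
                close();
            } finally {
                super.finalize();
            }
        }
    }

    public static void main(String[] args) {
        // Mimic the frontend creating the loader many times during compilation.
        List<FakeStorage> instances = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            instances.add(new FakeStorage());
        }
        System.out.println("open after creation: " + OPEN.get());

        // Closing explicitly releases every connection deterministically.
        for (FakeStorage s : instances) {
            s.close();
        }
        System.out.println("open after close: " + OPEN.get());
    }
}
```

Because close() is idempotent, it is harmless if both the caller and the finalizer end up invoking it. Relying on finalize() alone would still leave connections open until a garbage collection happens, which may be too late for a short-lived frontend process.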
