Unfortunately, Pig instantiates UDFs willy-nilly to do various checks during
script compilation.

Maybe we should override the finalize() method and close the
connection there; that would help.
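A minimal sketch of the finalize() idea, assuming a loader that (like HBaseStorage in this thread) opens its connection in the constructor. FakeConnection and SketchStorage are hypothetical stand-ins for the real table/ZooKeeper handle and loader class, not Pig or HBase APIs. Caveat: the JVM does not guarantee that finalize() runs promptly or at all, and it has been deprecated since Java 9, so in this sketch it is only a safety net behind an explicit, idempotent close().

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an HBase table / ZooKeeper connection.
// It only tracks how many instances are currently open.
class FakeConnection implements Closeable {
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed = false;

    FakeConnection() {
        OPEN.incrementAndGet();
    }

    // Synchronized and idempotent: finalize() runs on the finalizer
    // thread and may race with an explicit close().
    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

// Hypothetical loader that, like HBaseStorage here, opens its
// connection as soon as it is constructed.
class SketchStorage implements Closeable {
    private final FakeConnection conn = new FakeConnection();

    @Override
    public void close() {
        conn.close();
    }

    // Safety net only: the garbage collector may call this late or never.
    @Override
    protected void finalize() {
        close();
    }
}
```

With this shape, the throwaway instances Pig creates during compilation would eventually release their connections when collected, but only on the GC's schedule.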

D

On Tue, Oct 25, 2011 at 2:49 AM, Vincent Barat <[email protected]> wrote:
> Hi,
>
> I'm trying to figure out why Pig is using so many ZooKeeper connections (from
> the frontend machine) when using HBaseStorage().
>
> I added a trace in the constructor of HBaseStorage().
>
> I wrote a simple script loading an HBase table:
>
> sessions = LOAD 'hbase://mytable' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp')
> AS (sid:chararray, start:long);
> dump sessions;
>
> When I run the script:
>
> vbarat@lancelot:~$ pig -x local -f /Users/vbarat/ermin/pig/script/test.pig
> 2011-10-25 11:32:41,482 [main] INFO  org.apache.pig.Main - Logging error
> messages to: /Users/vbarat/pig_1319535161481.log
> 2011-10-25 11:32:41,563 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
> to hadoop file system at: file:///
> 2011-10-25 11:32:41,884 [main] INFO
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:41,970 [main] INFO
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,035 [main] INFO
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,073 [main] INFO
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,184 [main] INFO
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,207 [main] INFO
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,233 [main] INFO
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,256 [main] INFO
>  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
> script: UNKNOWN
> 2011-10-25 11:32:42,317 [main] INFO
>  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 2011-10-25 11:32:42,374 [main] INFO
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,391 [main] INFO
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE
> *********************
> 2011-10-25 11:32:42,425 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2011-10-25 11:32:42,449 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
>
> So HBaseStorage is created 10 times (and so the table is opened 9 times).
>
> I'd like to know why it is created so many times.
>
> Also, when I change my script to load 2 tables and join them, the
> HBaseStorage object is created 40 times!
>
> Can someone give me some insight to help me investigate the issue?
>
> Thanks a lot
>
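One way to take the constructor trace a step further is to number each instance and log the stack that created it, so every instantiation can be attributed to the Pig compilation step that triggered it. This is a sketch only: TracedStorage is a hypothetical stand-in for HBaseStorage, and java.util.logging is used just to keep it dependency-free (the real class has its own logger).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for HBaseStorage, instrumented to trace creation.
class TracedStorage {
    private static final Logger LOG =
            Logger.getLogger(TracedStorage.class.getName());

    // Counts every instantiation in this JVM.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    private final String columnList;

    TracedStorage(String columnList) {
        this.columnList = columnList;
        int n = INSTANCES.incrementAndGet();
        // Passing a fresh Throwable makes the logger print the current
        // stack, showing which caller constructed this instance.
        LOG.log(Level.INFO,
                "TracedStorage instance #" + n + " for '" + columnList + "'",
                new Throwable("constructed here"));
    }

    String columnList() {
        return columnList;
    }
}
```

Grouping the logged stacks by their top frames would show which compile-time checks account for the repeated instances.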
