:-(

OK, I understand. I don't see why the optimizer needs to instantiate the loaders, but I'm not a Pig specialist.

Concerning HBaseStorage, this is a big issue, as the number of connections to ZooKeeper is fairly limited (30 in the default configuration).
So a simple join of 2 tables (40 HBaseStorage instances) would exceed this limit.
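Assuming each HBaseStorage instance holds its own ZooKeeper connection, the join case (40 instances) needs more connections than the 30 allowed. One general mitigation is to share a single reference-counted session per quorum inside the JVM, so the number of loader instances no longer drives the number of connections. The sketch below only illustrates that idea: FakeZkSession, SharedSessions and the quorum string are hypothetical names, not the HBase or ZooKeeper API, and this is not a description of how HBaseStorage actually manages its connections.

```java
import java.util.HashMap;
import java.util.Map;

class SharedSessionDemo {
    public static void main(String[] args) {
        String quorum = "zk-host:2181";  // hypothetical quorum address
        // 40 loader instances (the two-table join case) each acquire a session...
        for (int i = 0; i < 40; i++) SharedSessions.acquire(quorum);
        System.out.println("open sessions: " + FakeZkSession.openSessions);  // prints "open sessions: 1"
        // ...and each releases it when it is done.
        for (int i = 0; i < 40; i++) SharedSessions.release(quorum);
        System.out.println("open sessions: " + FakeZkSession.openSessions);  // prints "open sessions: 0"
    }
}

// Hypothetical stand-in for a ZooKeeper session; NOT a real HBase/ZooKeeper class.
class FakeZkSession {
    static int openSessions = 0;  // sessions currently open (only mutated under SharedSessions' lock)

    FakeZkSession() { openSessions++; }

    void close() { openSessions--; }
}

// Hypothetical registry: one reference-counted session per quorum per JVM.
class SharedSessions {
    private static final Map<String, FakeZkSession> sessions = new HashMap<>();
    private static final Map<String, Integer> refCounts = new HashMap<>();

    static synchronized FakeZkSession acquire(String quorum) {
        FakeZkSession s = sessions.get(quorum);
        if (s == null) {                 // first user opens the session
            s = new FakeZkSession();
            sessions.put(quorum, s);
        }
        refCounts.merge(quorum, 1, Integer::sum);
        return s;
    }

    static synchronized void release(String quorum) {
        Integer count = refCounts.get(quorum);
        if (count == null) return;       // unknown quorum: nothing to release
        if (count == 1) {                // last user closes the session
            refCounts.remove(quorum);
            sessions.remove(quorum).close();
        } else {
            refCounts.put(quorum, count - 1);
        }
    }
}
```

The trade-off is that every user must call release() exactly once; a missed release keeps the shared session open, but it leaks one connection rather than one per instance.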

The finalize() method is of course not a solution, as it is only called when the GC decides to call it (if at all).

Anyway I'll try it just to get some insight.
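Because finalize() depends on the garbage collector, a more deterministic pattern is to keep the constructor cheap and open the connection lazily, the first time data is actually read. Instances that Pig only builds for checks would then never connect. This is a minimal sketch under assumptions: LazyLoader is a hypothetical class, a String stands in for a real HBase table handle, and which LoadFunc method would trigger the lazy open in the real HBaseStorage is left open here.

```java
class LazyLoaderDemo {
    public static void main(String[] args) {
        // Many instances may be built while the script is compiled...
        LazyLoader[] loaders = new LazyLoader[10];
        for (int i = 0; i < loaders.length; i++) loaders[i] = new LazyLoader("mytable");
        System.out.println("after construction: " + LazyLoader.connectionsOpened);  // prints "after construction: 0"
        // ...but only the instance that actually reads opens a connection.
        System.out.println(loaders[0].readFirstRow());
        System.out.println("after one read: " + LazyLoader.connectionsOpened);      // prints "after one read: 1"
    }
}

// Hypothetical loader; a String stands in for a real HBase table handle.
class LazyLoader {
    static int connectionsOpened = 0;  // counts simulated connection opens (single-threaded demo)
    private final String table;
    private String connection;         // null until first use

    LazyLoader(String table) {
        this.table = table;            // cheap: no network I/O in the constructor
    }

    private String connection() {
        if (connection == null) {      // open on first use only
            connection = "connection to " + table;
            connectionsOpened++;
        }
        return connection;
    }

    String readFirstRow() {
        return "first row via " + connection();
    }
}
```

Closing can then happen at a well-defined end-of-read point instead of in finalize(); the sketch omits that step.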

Cheers,

On 26/10/11 21:12, Dmitriy Ryaboy wrote:
Unfortunately, Pig creates UDFs willy-nilly to do various checks during
script compilation.

Maybe we should override the finalize() method and close the
connection there; that would help.

D

On Tue, Oct 25, 2011 at 2:49 AM, Vincent Barat wrote:
Hi,

I'm trying to figure out why Pig opens so many ZooKeeper connections (from the
frontend machine) when using HBaseStorage().

I added a trace to the constructor of HBaseStorage().
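A constructor trace is more informative if it also numbers each instance and records who called the constructor. A minimal JDK-only sketch follows; TracedStorage is a hypothetical stand-in for HBaseStorage, and in the real class the message would go to its logger rather than System.err. It prints several caller frames because, if the loader is created reflectively, the immediate caller may just be a java.lang.reflect frame.

```java
import java.util.concurrent.atomic.AtomicInteger;

class TracedStorageDemo {
    public static void main(String[] args) {
        new TracedStorage();   // first call site
        new TracedStorage();   // second call site
        System.out.println("instances: " + TracedStorage.count());  // prints "instances: 2"
    }
}

// Hypothetical stand-in for HBaseStorage with an instrumented constructor.
class TracedStorage {
    private static final AtomicInteger INSTANCES = new AtomicInteger();
    final int id;

    TracedStorage() {
        id = INSTANCES.incrementAndGet();
        StackTraceElement[] stack = new Throwable().getStackTrace();
        StringBuilder msg = new StringBuilder("TracedStorage #" + id + " created by:");
        // Skip frame 0 (this constructor) and print up to six callers.
        for (int i = 1; i < Math.min(stack.length, 7); i++) {
            msg.append("\n    at ").append(stack[i]);
        }
        System.err.println(msg);
    }

    static int count() {
        return INSTANCES.get();
    }
}
```

Grouping the printed stacks by their distinct call sites shows which compilation phases create the extra instances.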

I wrote a simple script loading an HBase table:

sessions = LOAD 'hbase://mytable' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp')
AS (sid:chararray, start:long);
dump sessions;

When I run the script:

vbarat@lancelot:~$ pig -x local -f /Users/vbarat/ermin/pig/script/test.pig
2011-10-25 11:32:41,482 [main] INFO  org.apache.pig.Main - Logging error messages to: /Users/vbarat/pig_1319535161481.log
2011-10-25 11:32:41,563 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2011-10-25 11:32:41,884 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:41,970 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,035 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,073 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,184 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,207 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,233 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,256 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-10-25 11:32:42,317 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2011-10-25 11:32:42,374 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,391 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,425 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-10-25 11:32:42,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1

So HBaseStorage is created 10 times (and so the table is opened 9 times).

I'd like to know why so many instances are created.

Also, when I change my script to load 2 tables and join them, the
HBaseStorage object is created 40 times!

Can someone give me some insight to help me investigate the issue?

Thanks a lot
