We need to connect to HBase to figure out input slices, etc. This
involves connecting to ZK (via HBase innards -- we don't do this
explicitly).

On Fri, Oct 28, 2011 at 9:30 AM, Alan Gates <[email protected]> wrote:
> In which method is HBaseStorage connecting to ZooKeeper?  There are methods 
> that are guaranteed to only be called on the front or back end, so you can 
> avoid these kinds of traps.  HCatalog in particular had to work around the 
> same thing to avoid connecting to the metastore server in places it should 
> not.  Do you need to connect to ZooKeeper on the front end at all?
>
> Alan.
>
> On Oct 28, 2011, at 9:19 AM, Vincent Barat wrote:
>
>> :-(
>>
>> OK, I understand. I don't see why the optimizer needs to instantiate the 
>> loaders, but I'm not a Pig specialist.
>>
>> Concerning HBaseStorage, this is a big issue, as the number of connections
>> to ZooKeeper is fairly limited (30 in the default configuration).
>> So a simple join of 2 tables would exceed this limit.
>>
>> The finalize() method is of course not a solution, as it is called only when 
>> the GC decides to call it.
>>
>> Anyway I'll try it just to get some insight.
>>
>> Cheers,
>>
>> On 26/10/11 21:12, Dmitriy Ryaboy wrote:
>>> Unfortunately, Pig creates UDFs willy-nilly to do various checks during
>>> script compilation.
>>>
>>> Maybe we should override the finalize() method and close the
>>> connection there, that would help.
>>>
>>> D
>>>
>>> On Tue, Oct 25, 2011 at 2:49 AM, Vincent Barat <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I am trying to figure out why Pig is using so many ZooKeeper connections
>>>> (from the frontend machine) when using HBaseStorage().
>>>>
>>>> I added a trace in the constructor of HBaseStorage().
>>>>
>>>> I wrote a simple script loading an HBase table:
>>>>
>>>> sessions = LOAD 'hbase://mytable' USING
>>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp')
>>>> AS (sid:chararray, start:long);
>>>> dump sessions;
>>>>
>>>> When I run the script:
>>>>
>>>> vbarat@lancelot:~$ pig -x local -f /Users/vbarat/ermin/pig/script/test.pig
>>>> 2011-10-25 11:32:41,482 [main] INFO  org.apache.pig.Main - Logging error messages to: /Users/vbarat/pig_1319535161481.log
>>>> 2011-10-25 11:32:41,563 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
>>>> 2011-10-25 11:32:41,884 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:41,970 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,035 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,073 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,184 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,207 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,233 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,256 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
>>>> 2011-10-25 11:32:42,317 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
>>>> 2011-10-25 11:32:42,374 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,391 [main] INFO  org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
>>>> 2011-10-25 11:32:42,425 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
>>>> 2011-10-25 11:32:42,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>>
>>>> So HBaseStorage is created 10 times (and so the table is opened 9 times).
>>>>
>>>> I'd like to know why it is created so many times.
>>>>
>>>> Also, when I change my script to load 2 tables and join them, the
>>>> HBaseStorage object is created 40 times!
>>>>
>>>> Can someone give me some insight to help me investigate the issue?
>>>>
>>>> Thanks a lot
>>>>
>
>
