Re: Pig HBase integration

2014-09-29 Thread Krishna Kalyan
Thank you so much Serega. Regards, Krishna On Sun, Sep 28, 2014 at 11:01 PM, Serega Sheypak serega.shey...@gmail.com wrote: https://pig.apache.org/docs/r0.11.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html I'm not sure how Pig HBaseStorage works. I suppose it would read all

Re: Pig HBase integration

2014-09-28 Thread Krishna Kalyan
Thanks Serega. Our use case details: We have a location table which will be stored in HBase with locationID as the row key / join key. We intend to join this table with a transactional weblog file in HDFS (expected size around 2 TB). The join query will be run from Pig. Can we expect a

Re: Pig HBase integration

2014-09-28 Thread Serega Sheypak
Store location to HDFS, store weblog to HDFS, join them, then use the HBase bulk load tool to load the join result into HBase. What's the reason to keep the location dataset in HBase and the weblogs in HDFS? You can expect a data load performance improvement. For me it takes a few minutes to bulk load 500,000,000 records to

Re: Pig HBase integration

2014-09-28 Thread Krishna Kalyan
We actually have 2 data sets in HDFS, location (3-5 GB, approx 10 columns in each record) and weblog (2-3 TB, approx 50 columns in each record). We need to join the data sets using the locationId, which is in both the data-sets. We have 2 options: 1. Have both the data-sets in HDFS only and JOIN

Re: Pig HBase integration

2014-09-28 Thread Serega Sheypak
https://pig.apache.org/docs/r0.11.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html I'm not sure how Pig HBaseStorage works. I suppose it would read all the data and then join it as a usual dataset. So you should get serious HBase performance degradation during the read, you would get

Re: Pig HBase integration

2014-09-27 Thread Serega Sheypak
It depends on the dataset sizes and the HBase workload. The best way is to do the join in Pig, store it, and then use the HBase bulk load tool. It's a general recommendation. I have no idea about your task details. 2014-09-27 7:32 GMT+04:00 Krishna Kalyan krishnakaly...@gmail.com: Hi, We have a use case that

Pig HBase integration

2014-09-26 Thread Krishna Kalyan
Hi, We have a use case that involves ETL on data coming from several different sources using Pig. We plan to store the final output table in HBase. What will be the performance impact if we do a join with an external CSV table using Pig? Regards, Krishna

Re: Pig + Hbase integration

2012-10-29 Thread Jean-Daniel Cryans
On Thu, Oct 25, 2012 at 7:44 AM, Manu S manupk...@gmail.com wrote: Hi, I am using Pig-0.10.0 hbase-0.94.2. I am trying to store the processed output to Hbase cluster using pig script. I registered the required .jar and set the mapreduce and zookeeper parameters within the script itself.

Re: LeaseException while extracting data via pig/hbase integration

2012-02-16 Thread Andrew Purtell
) - Original Message - From: Mikael Sitruk mikael.sit...@gmail.com To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org Cc: Sent: Wednesday, February 15, 2012 11:32 PM Subject: Re: LeaseException while extracting data via pig/hbase integration Andy hi Not sure what you mean by Does

Re: LeaseException while extracting data via pig/hbase integration

2012-02-16 Thread Mikael Sitruk
Sitruk mikael.sit...@gmail.com To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org Cc: Sent: Wednesday, February 15, 2012 11:32 PM Subject: Re: LeaseException while extracting data via pig/hbase integration Andy hi Not sure what you mean by Does something like the below help

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Jean-Daniel Cryans
You would have to grep the lease's id, in your first email it was -7220618182832784549. About the time it takes to process each row, I meant client (pig) side not in the RS. J-D On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Please see answer inline Thanks

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Mikael Sitruk
Ok, I don't have this log anymore, but since the problem was reproduced in another log (which I kept), here is the grep: 2012-02-08 14:13:02,970 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-6992210222685255354' does not exist

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Andrew Purtell
) - Original Message - From: Jean-Daniel Cryans jdcry...@apache.org To: user@hbase.apache.org Cc: Sent: Wednesday, February 15, 2012 10:17 AM Subject: Re: LeaseException while extracting data via pig/hbase integration You would have to grep the lease's id, in your first email

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Mikael Sitruk
@hbase.apache.org Cc: Sent: Wednesday, February 15, 2012 10:17 AM Subject: Re: LeaseException while extracting data via pig/hbase integration You would have to grep the lease's id, in your first email it was -7220618182832784549. About the time it takes to process each row, I meant client

Re: LeaseException while extracting data via pig/hbase integration

2012-02-14 Thread Mikael Sitruk
Hi, Well no, I can't figure out what the problem is, but I saw that someone else had the same problem (see email: LeaseException despite high hbase.regionserver.lease.period). What I can tell is the following: Last week the problem was consistent 1. I updated hbase.regionserver.lease.period=30

Re: LeaseException while extracting data via pig/hbase integration

2012-02-14 Thread Jean-Daniel Cryans
On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk mikael.sit...@gmail.com wrote: Hi, Well no, I can't figure out what the problem is, but I saw that someone else had the same problem (see email: LeaseException despite high hbase.regionserver.lease.period). What I can tell is the following: Last

Re: LeaseException while extracting data via pig/hbase integration

2012-02-14 Thread Mikael Sitruk
Please see answer inline. Thanks, Mikael.S On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk mikael.sit...@gmail.com wrote: Hi, Well no, I can't figure out what the problem is, but I saw that someone else had the

Re: LeaseException while extracting data via pig/hbase integration

2012-02-13 Thread Jean-Daniel Cryans
Late answer, did you figure it out? This exception happens when you don't use your scanner lease for more than the lease time (default one minute). AFAIK that didn't change, so maybe something else got slow? Or maybe some special configurations you had didn't make it during the upgrade? J-D On
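
The lease behaviour J-D describes can be illustrated with a toy model. The sketch below is not HBase code: `ToyScanner`, `FakeClock`, and `scan_all` are hypothetical names, and the 60-second period is only the "default one minute" mentioned in the message. It assumes the lease is renewed on every `next()` call and is lost when the client stays idle longer than the lease period between two calls.

```python
class LeaseException(Exception):
    """Raised when a scanner is used after its lease has expired (toy model)."""


class FakeClock:
    """Hypothetical manual clock so the example runs without real waiting."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class ToyScanner:
    """Toy stand-in for a server-side scanner guarded by a lease."""

    def __init__(self, rows, clock, lease_period=60.0):
        self._rows = iter(rows)
        self._clock = clock
        self._lease_period = lease_period  # assumed default: one minute
        self._last_used = clock.now

    def next(self):
        # The lease is checked against the time since the previous call.
        idle = self._clock.now - self._last_used
        if idle > self._lease_period:
            raise LeaseException(f"lease expired after {idle:.0f}s idle")
        self._last_used = self._clock.now  # each call renews the lease
        return next(self._rows, None)


def scan_all(scanner, clock, seconds_per_row):
    """Read every row, spending `seconds_per_row` of client-side time on each."""
    processed = []
    while True:
        row = scanner.next()
        if row is None:
            return processed
        processed.append(row)
        clock.advance(seconds_per_row)  # simulated client (Pig) processing time
```

In this model a client that spends 5 seconds per row finishes the scan, while one that spends 90 seconds per row hits `LeaseException` on its second `next()` call, which matches the point that slow client-side processing, not the region server itself, can make the lease expire.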

LeaseException while extracting data via pig/hbase integration

2012-02-06 Thread Mikael Sitruk
Hi all, Recently I upgraded my cluster from HBase 0.90.1 to 0.90.4 (using Cloudera, from cdh3u0 to cdh3u2). Everything was OK until I ran a Pig extract on the new cluster; on the old cluster everything worked well. Now each time I run the extract in conjunction with other work performed on the