Re: Determining regions with low HDFS locality index

2014-12-27 Thread Rahul Ravindran
need to investigate. -- Lars [quoting Rahul Ravindran's original question of December 25, 2014:] Hi, When an HBase RS goes down (possibly because …

Determining regions with low HDFS locality index

2014-12-25 Thread Rahul Ravindran
Hi, When an HBase RS goes down (possibly because of hardware issues etc.), the regions get moved off that machine to other region servers. However, since the new region servers do not have the backing HFiles, data locality for the newly transitioned regions is not great, and hence some of our …
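The locality index the subject refers to can be modeled simply: it is the fraction of a region's HFile blocks that have a replica on the datanode co-located with the hosting region server. The sketch below is illustrative (the function and host names are hypothetical, not an HBase API); it shows why locality drops to zero right after a region fails over to a server that holds none of its blocks.

```python
def locality_index(block_hosts, region_server):
    """Fraction of the region's HFile blocks with a replica on the
    region server's local datanode (1.0 = fully local)."""
    if not block_hosts:
        return 1.0  # an empty region is trivially local
    local = sum(1 for hosts in block_hosts if region_server in hosts)
    return local / len(block_hosts)

# Hypothetical block->replica-host map for one region's HFiles:
blocks = [{"rs1", "dn3"}, {"rs1", "dn4"}, {"rs1", "dn5"}]
print(locality_index(blocks, "rs1"))  # 1.0 before the move
print(locality_index(blocks, "rs2"))  # 0.0 after failover to rs2
```

A major compaction rewrites the region's data on the new server, which is the usual way locality recovers after such a move.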

Experience with HBASE-8283 and lots of small hfile

2014-04-03 Thread Rahul Ravindran
Hi, We are currently on 0.94.2 (CDH 4.2.1) and would likely upgrade to 0.94.15 (CDH 4.6), primarily to use the above fix. We have turned off automatic major compactions. We load data into an HBase table every 2 minutes. Currently, we are not using bulk load since it created compaction issues. …
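Turning off automatic major compactions, as the poster describes, is typically done in hbase-site.xml; a minimal sketch, assuming the 0.94-era property name:

```xml
<property>
  <!-- 0 disables time-triggered major compactions; minor compactions
       and manually issued major_compact commands still run -->
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```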

Downside of too many HFiles

2013-06-12 Thread Rahul Ravindran
Hello, I am trying to understand the downsides of having a large number of HFiles by setting a large hbase.hstore.compactionThreshold. This delays major compaction. However, the amount of data that needs to be read and re-written as a single HFile during major compaction will remain the same …
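The trade-off the question hints at can be made concrete with a toy cost model (illustrative only, not HBase internals): major-compaction I/O depends on total store size, not file count, while a read may have to consult every HFile in the store, so read cost grows with the file count that a high compaction threshold allows to accumulate.

```python
def read_cost(n_hfiles):
    """Worst case, a Get/Scan seek consults every HFile in the store,
    so per-read seek cost grows linearly with file count (bloom
    filters prune many misses but not matching files)."""
    return n_hfiles

def major_compaction_io(total_store_bytes):
    """Major compaction reads the whole store once and writes it back
    once, regardless of how many files it was split across."""
    return 2 * total_store_bytes

# Raising the threshold from 3 to 10 leaves compaction I/O unchanged
# but lets reads touch up to ~3x as many files:
print(read_cost(3), read_cost(10))  # 3 10
print(major_compaction_io(10**9) == major_compaction_io(10**9))  # True
```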

Re: Scan + Gets are disk bound

2013-06-05 Thread Rahul Ravindran
hook for an earlier version of the row? Thanks, ~Rahul. [In reply to Asaf Mesika, June 4, 2013]

Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Hi, We are relatively new to HBase, and we are hitting a roadblock on our scan performance. I searched through the email archives and applied a bunch of the recommendations there, but they did not improve things much. So, I am hoping I am missing something which you could guide me towards. Thanks in …

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
hotspotting. ~Rahul. [In reply to anil gupta, June 4, 2013]

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
[Quoting Anoop John, June 4, 2013:] When you set time range on Scan, some files can get skipped …
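Anoop's point rests on the fact that each HFile records the min/max timestamps of its cells in its metadata, so a time-ranged Scan can skip files whose range cannot overlap the query's. A small model of that pruning (hypothetical names; the real check lives inside the store scanner):

```python
def files_to_read(hfiles, scan_min_ts, scan_max_ts):
    """hfiles: list of (min_ts, max_ts) tuples, as recorded in each
    HFile's metadata. A file is skipped when its timestamp range
    cannot overlap the scan's time range."""
    return [f for f in hfiles
            if f[0] <= scan_max_ts and f[1] >= scan_min_ts]

files = [(0, 100), (101, 200), (201, 300)]
# A scan over ts 150..250 never needs to open the oldest file:
print(files_to_read(files, 150, 250))  # [(101, 200), (201, 300)]
```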

Re: Using HBase for Deduping

2013-02-19 Thread Rahul Ravindran
[Quoting Michael Segel, February 15, 2013:] Interesting. Surround with a try/catch? But it sounds like you're on the right path. Happy …

Re: Using HBase for Deduping

2013-02-15 Thread Rahul Ravindran
[Quoting Asaf Mesika, February 15, 2013:] Michael, this means a read for every write? Yes …

Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
Hi, We have events which are delivered into our HDFS cluster which may be duplicated. Each event has a UUID, and we were hoping to leverage HBase to dedupe them. We run a MapReduce job which would perform a lookup for each UUID on HBase and then emit the event only if the UUID was absent and …
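The dedupe scheme described above can be sketched as follows. This is a minimal in-memory model: in the thread's design the "seen" set would be an HBase table probed per UUID (e.g. with a get or checkAndPut), not a Python set, and the function name is invented for illustration.

```python
def dedupe(events, seen=None):
    """Emit each (uuid, payload) event only the first time its UUID
    appears; 'seen' stands in for the HBase lookup table."""
    if seen is None:
        seen = set()
    for uuid, payload in events:
        if uuid not in seen:
            seen.add(uuid)   # models writing the UUID back to HBase
            yield (uuid, payload)

events = [("a", 1), ("b", 2), ("a", 1)]
print(list(dedupe(events)))  # -> [('a', 1), ('b', 2)]
```

Passing a shared `seen` store across batches is what makes the scheme catch duplicates that arrive far apart in time, which is the concern raised later in the thread.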

Re: Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
[Forwarded by Rahul Ravindran, 2/14/2013 11:41 AM:] Hi, We have events which are delivered into our HDFS cluster which may be duplicated. Each event has a UUID and we were hoping to leverage HBase to dedupe them. We run a MapReduce job …

Using Hbase for Dedupping

2013-02-14 Thread Rahul Ravindran
Hi, We have events which are delivered into our HDFS cluster which may be duplicated. Each event has a UUID, and we were hoping to leverage HBase to dedupe them. We run a MapReduce job which would perform a lookup for each UUID on HBase and then emit the event only if the UUID was absent and …

Re: Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
We can't rely on the assumption that event dupes will not dupe outside an hour boundary. So, your take is that doing a lookup per event within the MR job is going to be bad? [In reply to Viral Bajaria]

Re: Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
and forth, we can take it off list too and summarize the conversation for the list. [Quoting Rahul Ravindran, Feb 14, 2013:] We can't rely on the assumption that event dupes will not dupe outside an hour boundary. So, your take is that doing a lookup per …