Narendra,

Since I didn't see the client logs, frequent full GC is one probable cause I suspect, whether it happens on the client side or the server side. So I suggest checking the GC logs (enable GC logging on both the client and the server) to see whether full GCs happen at a high frequency, and check the stop-the-world pause times.
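One standard way to enable the GC logging suggested above, using stock HotSpot flags; for the region server this typically goes in conf/hbase-env.sh via HBASE_OPTS (the log path here is illustrative, not a required location):

```shell
# In conf/hbase-env.sh (server side); analogous flags work for the client JVM.
# -XX:+PrintGCApplicationStoppedTime surfaces the stop-the-world pause times.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/log/hbase/gc-regionserver.log"
```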
I don't think parallel scanners are the problem.

Jieshan

-----Original Message-----
From: Narendra yadala [mailto:narendra.yad...@gmail.com]
Sent: Thursday, April 19, 2012 11:24 PM
To: user@hbase.apache.org
Subject: Re: HBase parallel scanner performance

Hi Jieshan,

HBase version: 0.90.4-cdh3u3. The size of a KeyValue pair should not be
more than 2 KB.

I changed the GC parameters on the server side. I have not looked into the
GC logs yet, but I have noticed that GC pauses the batch process every now
and then. How do I look at the server GC logs?

Thanks
Narendra

On Thu, Apr 19, 2012 at 7:46 PM, Bijieshan <bijies...@huawei.com> wrote:
> Hi Narendra,
>
> I have a few doubts:
>
> 1. Which version are you using?
> 2. What's the size of each KeyValue?
> 3. Did you change the GC parameters on the client side or the server
> side? After changing the GC parameters, did you keep an eye on the GC
> logs?
>
> Thank you.
>
> Regards,
> Jieshan
>
> -----Original Message-----
> From: Narendra yadala [mailto:narendra.yad...@gmail.com]
> Sent: Thursday, April 19, 2012 8:04 PM
> To: user@hbase.apache.org
> Subject: Re: HBase parallel scanner performance
>
> Hi Michel,
>
> Yes, that is exactly what I do in step 2. I am aware of the reason for the
> scanner timeout exceptions: it is the time between two consecutive
> invocations of next() on a specific scanner object. I increased the
> scanner timeout to 10 min on the region server and I still keep seeing
> the timeouts, so I reduced my scanner caching to 128.
>
> A full table scan takes 130 seconds, and there are 2.2 million rows in the
> table as of now. Each row is around 2 KB in size. I measured the time for
> the full table scan by issuing the `count` command from the HBase shell.
>
> I roughly understood the fix that you are suggesting, but do I need to
> change the table structure to fix this problem? All I do is an n^2
> operation, and even that fails with 10 different types of exceptions.
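For reference, the region-server-side scanner lease behind the timeouts discussed above is, in 0.90-era HBase, believed to be governed by this hbase-site.xml property (value in milliseconds; 600000 matches the 10-minute setting mentioned):

```xml
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>600000</value>
</property>
```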
> It is mildly annoying that I need to know all the low-level storage
> details of HBase to do such a simple operation. And this is happening
> with just 14 parallel scanners; I am wondering what would happen when
> there are thousands of parallel scanners.
>
> Please let me know if there is any configuration parameter change that
> would fix this issue.
>
> Thanks a lot
> Narendra
>
> On Thu, Apr 19, 2012 at 4:40 PM, Michel Segel <michael_se...@hotmail.com>
> wrote:
>
> > So in your step 2 you have the following:
> > FOREACH row IN TABLE alpha:
> >   SELECT something
> >   FROM TABLE alpha
> >   WHERE alpha.url = row.url
> >
> > Right?
> > And you are wondering why you are getting timeouts?
> > ...
> > ...
> > And how long does it take to do a full table scan? ;-)
> > (there's more, but that's the first thing you should see...)
> >
> > Try creating a second table where you invert the URL and key pair, such
> > that for each URL you have a set of your alpha table's keys.
> >
> > Then you have the following...
> > FOREACH row IN TABLE alpha:
> >   FETCH key-set FROM beta
> >   WHERE beta.rowkey = alpha.url
> >
> > Note I use FETCH to signify that you should get a single row in
> > response.
> >
> > Does this make sense?
> > (Your second table is actually an index of the URL column in your first
> > table.)
> >
> > HTH
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> > On Apr 19, 2012, at 5:43 AM, Narendra yadala <narendra.yad...@gmail.com>
> > wrote:
> >
> > > I have an issue with my HBase cluster. We have a 4-node HBase/Hadoop
> > > cluster (4*32 GB RAM and 4*6 TB disk space). We are using the Cloudera
> > > distribution for maintaining our cluster. I have a single tweets
> > > table in which we store the tweets, one tweet per row (it has
> > > millions of rows currently).
> > >
> > > Now I try to run a Java batch (not a MapReduce job) which does the
> > > following:
> > >
> > > 1. Open a scanner over the tweets table and read the tweets one after
> > > another. I set scanner caching to 128 rows, as higher scanner caching
> > > was leading to ScannerTimeoutExceptions. I scan over only the first
> > > 10k rows.
> > > 2. For each tweet, extract the URLs (linkcolfamily:urlvalue) in that
> > > tweet and open another scanner over the tweets table to see who else
> > > shared that link. This involves getting the rows having that URL from
> > > the entire table (not just the first 10k rows).
> > > 3. Do the same as in step 2 for hashtags
> > > (hashtagcolfamily:hashtagvalue).
> > > 4. Do steps 1-3 in parallel in approximately 7-8 threads. This number
> > > can be much higher (thousands) later.
> > >
> > > When I ran this batch I hit the GC issue described here:
> > > http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
> > > I then tried turning on the MSLAB feature and changed the GC settings
> > > by specifying the -XX:+UseParNewGC and -XX:+UseConcMarkSweepGC JVM
> > > flags. Even after doing this, I am running into all kinds of
> > > IOExceptions and SocketTimeoutExceptions.
> > >
> > > This Java batch keeps approximately 7*2 = 14 scanners open at a point
> > > in time, and still I am running into all kinds of trouble. I am
> > > wondering whether I can have thousands of parallel scanners with
> > > HBase when I need to scale.
> > >
> > > It would be great to know whether I can open thousands/millions of
> > > scanners in parallel with HBase efficiently.
> > >
> > > Thanks
> > > Narendra
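Michel's inverted-index suggestion in the thread above can be sketched with plain in-memory maps standing in for the HBase `alpha` (tweets) and `beta` (index) tables; row keys and URLs here are made up for illustration. The point is that building the index once turns the per-row inner scan into a single keyed fetch:

```java
import java.util.LinkedHashMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class UrlIndexSketch {
    public static void main(String[] args) {
        // "alpha" table: tweet row key -> URL contained in that tweet.
        Map<String, String> alpha = new LinkedHashMap<String, String>();
        alpha.put("tweet1", "http://a.example");
        alpha.put("tweet2", "http://b.example");
        alpha.put("tweet3", "http://a.example");

        // "beta" table (the index): URL -> set of alpha row keys.
        // Built once up front instead of re-scanning alpha for every row.
        Map<String, Set<String>> beta = new HashMap<String, Set<String>>();
        for (Map.Entry<String, String> e : alpha.entrySet()) {
            Set<String> keys = beta.get(e.getValue());
            if (keys == null) {
                keys = new TreeSet<String>();
                beta.put(e.getValue(), keys);
            }
            keys.add(e.getKey());
        }

        // FOREACH row IN alpha: FETCH key-set FROM beta -- one keyed
        // lookup per row rather than one full table scan per row.
        for (Map.Entry<String, String> e : alpha.entrySet()) {
            Set<String> sharers = beta.get(e.getValue());
            System.out.println(e.getKey() + " url shared by " + sharers);
        }
    }
}
```

In HBase terms, the URL would become the row key of the second table and the tweet keys its column qualifiers, so each inner lookup is a single Get instead of a scan.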