Hi,

In one go I do about 5M reads. In this case, running the task for longer didn't affect performance.

Regarding DISTINCT: I'm doing this through Pig. I've written a UDF for getting rows, and I'm now starting to wonder whether it might affect performance.

I have run more tests, and at no point is CPU usage greater than 50%, and the disks are barely used. From the logs I see that the cache hit ratio reaches about 97%.

I have tested latency from the hbase shell: the first get takes 0.7120 seconds, but subsequent ones take 0.04 seconds, which is as expected, so I think that Pig might be slowing everything down.
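For a rough sense of scale, the per-get latencies mentioned in this thread can be turned into a total-time estimate. This is only a back-of-the-envelope sketch: it assumes gets are issued strictly one after another, which may not match how the Pig job actually runs, and the `parallelism` parameter and function name are hypothetical, not anything taken from HBase or Pig.

```python
def total_seconds(num_gets, latency_s, parallelism=1):
    """Estimate wall-clock time for num_gets gets at a fixed per-get latency.

    Assumption: every get costs the same latency and the work splits
    evenly across `parallelism` concurrent clients.
    """
    return num_gets * latency_s / parallelism


READS = 5_000_000  # "about 5M reads" per run

# 0.04 s per get, as measured from the hbase shell after the first get
shell_estimate = total_seconds(READS, 0.04)

# 5 ms per get, the rough upper bound suggested for reads served from cache
cache_estimate = total_seconds(READS, 0.005)

print(f"at 40 ms/get: {shell_estimate / 3600:.1f} hours")
print(f"at  5 ms/get: {cache_estimate / 3600:.1f} hours")
```

If the job finishes much faster than the serial estimate, the gets are being issued in parallel; if it is much slower, the overhead is likely outside HBase itself.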
2010/11/5 Stack <[email protected]>

> On Thu, Nov 4, 2010 at 4:06 AM, Wojciech Langiewicz
> <[email protected]> wrote:
> > I didn't notice any improvement after changing option
> > hfile.block.cache.size, I don't know if this i relevant, but in my testing
> > job I do at most only one Get per row (before querying HBase I do DISTINCT).
> >
> > Stats from cache reads are here: http://pastebin.com/BmmL09dK
> > This is after restarting servers, and during running first job.
> >
>
> How many reads did you do? I see the cache hit ratio climbing as your
> test progresses. Run it for longer? What kinda latency are you
> seeing?
>
> Coming out of cache you should be seeing < 5ms or so?
>
> How are you accessing HBase (The DISTINCT above makes me wonder).
>
> Thanks,
> St.Ack
>

--
Wojciech Langiewicz
