> An average row size is ~200 Bytes.

How many columns do you have?
I assume that each time you run the test you fetch data that is not already cached in the RegionServers' block cache (i.e. you are making a "true" test), right?

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

On Mon, Sep 17, 2012 at 12:36 AM, Anoop Sam John <[email protected]> wrote:

> >The reason the scan with setBatch(1) is much
> much faster is because it returns only the value for the first column?
>
> When you set batching=1, it returns all the column values of the rows, but
> one column value at a time.... FYI
>
> -Anoop-
> ________________________________________
> From: Amit Sela [[email protected]]
> Sent: Saturday, September 15, 2012 2:41 PM
> To: [email protected]
> Subject: Re: Optimizing table scans
>
> So just to get it straight. The reason the scan with setBatch(1) is much
> much faster is because it returns only the value for the first column?
>
> On Wed, Sep 12, 2012 at 5:37 PM, Doug Meil <[email protected]> wrote:
>
> > Hi there,
> >
> > See this for info on the block cache in the RegionServer..
> >
> > http://hbase.apache.org/book.html
> > 9.6.4. Block Cache
> >
> > ... and see this for "batching" on the scan parameter...
> >
> > http://hbase.apache.org/book.html#perf.reading
> > 11.8.1. Scan Caching
> >
> > On 9/12/12 9:55 AM, "Amit Sela" <[email protected]> wrote:
> >
> > >I allocate 10GB per RegionServer.
> > >An average row size is ~200 Bytes.
> > >The network is 1GB.
> > >
> > >It would be great if anyone could elaborate on the difference between
> > >the Cache and Batch parameters.
> > >
> > >Thanks.
> > >
> > >On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel
> > ><[email protected]> wrote:
> > >
> > >> How much memory do you have?
> > >> What's the size of the underlying row?
> > >> What does your network look like? 1GBe or 10GBe?
> > >>
> > >> There's more to it, and I think that you'll find that YMMV on what is
> > >> an optimum scan size...
> > >>
> > >> HTH
> > >>
> > >> -Mike
> > >>
> > >> On Sep 12, 2012, at 7:57 AM, Amit Sela <[email protected]> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > I'm trying to find the sweet spot for the cache size and batch size
> > >> > Scan() parameters.
> > >> >
> > >> > I'm scanning one table using HTable.getScanner() and iterating over
> > >> > the ResultScanner retrieved.
> > >> >
> > >> > I did some testing and got the following results:
> > >> >
> > >> > For scanning *1000000* rows.
> > >> >
> > >> > Cache    Batch           Total execution time (sec)
> > >> > 10000    -1 (default)    112
> > >> > 10000    5000            110
> > >> > 10000    10000           110
> > >> > 10000    20000           110
> > >> >
> > >> > Cache    Batch           Total execution time (sec)
> > >> > 1000     -1 (default)    116
> > >> > 10000    -1 (default)    110
> > >> > 20000    -1 (default)    115
> > >> >
> > >> > Cache    Batch           Total execution time (sec)
> > >> > 5000     10              26
> > >> > 20000    10              25
> > >> > 50000    10              26
> > >> > 5000     5               15
> > >> > 20000    5               14
> > >> > 50000    5               14
> > >> > 1000     1               6
> > >> > 5000     1               5
> > >> > 10000    1               4
> > >> > 20000    1               4
> > >> > 50000    1               4
> > >> >
> > >> > *I don't understand why a lower batch size gives such an improvement?*
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Amit.
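
To make Anoop's point about batching concrete, here is a rough sketch of how the two Scan settings (Scan.setCaching and Scan.setBatch) interact. It is a simplified model and does not call HBase: the class and method names are hypothetical, the RPC count is only approximate, and the figure of 20 columns per row is an assumption, since the thread never states the real column count. The idea is that batching splits one row into several Result objects, while caching counts Results per round trip rather than rows.

```java
// Simplified model of how Scan caching and batch settings interact.
// It does not call HBase; class and method names are hypothetical.
final class ScanModel {

    // Number of Result objects one row is split into.
    // batch <= 0 models "no batching": one Result per row.
    static long resultsPerRow(int columnsPerRow, int batch) {
        if (batch <= 0 || batch >= columnsPerRow) {
            return 1;
        }
        return (columnsPerRow + batch - 1) / batch; // ceiling division
    }

    // Total Results the client iterates over when every row is fully read.
    static long totalResults(long rows, int columnsPerRow, int batch) {
        return rows * resultsPerRow(columnsPerRow, batch);
    }

    // Approximate number of next() round trips: each carries up to `caching` Results.
    static long rpcCalls(long rows, int columnsPerRow, int batch, int caching) {
        long results = totalResults(rows, columnsPerRow, batch);
        return (results + caching - 1) / caching;
    }

    // Whole rows covered if the client stops after reading `resultsRead` Results.
    static long rowsCovered(long resultsRead, int columnsPerRow, int batch) {
        return resultsRead / resultsPerRow(columnsPerRow, batch);
    }

    public static void main(String[] args) {
        int columns = 20;       // assumed; not stated in the thread
        long reads = 1_000_000; // Results the test loop iterates over
        for (int batch : new int[] {-1, 10, 5, 1}) {
            System.out.println("batch=" + batch
                    + " resultsPerRow=" + resultsPerRow(columns, batch)
                    + " rowsCovered=" + rowsCovered(reads, columns, batch));
        }
    }
}
```

If the test loop counts 1000000 Results rather than 1000000 rows, then under this model a smaller batch means fewer whole rows are actually read, which would be one possible explanation for the lower times in the table. Whether that is what happened depends on how the loop was written.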
