> An average row size is ~200 Bytes.

How many columns do you have?

I assume each run fetches data that is not already in the RegionServers'
block cache (i.e. it is a "true" test), right?

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

On Mon, Sep 17, 2012 at 12:36 AM, Anoop Sam John <[email protected]> wrote:

> >The reason the scan with setBatch(1) is much
> much faster is because it returns only the value for the first column?
>
> When you set batching=1, it still returns all the column values of each row,
> but one column value at a time (one per Result). FYI
>
> -Anoop-
> ________________________________________
> From: Amit Sela [[email protected]]
> Sent: Saturday, September 15, 2012 2:41 PM
> To: [email protected]
> Subject: Re: Optimizing table scans
>
> So just to get it straight. The reason the scan with setBatch(1) is much
> much faster is because it returns only the value for the first column?
>
> On Wed, Sep 12, 2012 at 5:37 PM, Doug Meil <[email protected]> wrote:
>
> >
> > Hi there,
> >
> > See this for info on the block cache in the RegionServer..
> >
> > http://hbase.apache.org/book.html
> > 9.6.4. Block Cache
> >
> > ... and see this for "batching" on the scan parameter...
> >
> > http://hbase.apache.org/book.html#perf.reading
> > 11.8.1. Scan Caching
> >
> > On 9/12/12 9:55 AM, "Amit Sela" <[email protected]> wrote:
> >
> > >I allocate 10GB per RegionServer.
> > >An average row size is ~200 Bytes.
> > >The network is 1GB.
> > >
> > >It would be great if anyone could elaborate on the difference between
> > >the Cache and Batch parameters.
> > >
> > >Thanks.
> > >
> > >On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel
> > ><[email protected]> wrote:
> > >
> > >> How much memory do you have?
> > >> What's the size of the underlying row?
> > >> What does your network look like? 1GBe or 10GBe?
> > >>
> > >> There's more to it, and I think that you'll find that YMMV on what is
> > >> an optimum scan size...
> > >>
> > >> HTH
> > >>
> > >> -Mike
> > >>
> > >> On Sep 12, 2012, at 7:57 AM, Amit Sela <[email protected]> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > I'm trying to find the sweet spot for the cache size and batch size
> > >> > Scan() parameters.
> > >> >
> > >> > I'm scanning one table using HTable.getScanner() and iterating over
> > >> > the ResultScanner retrieved.
> > >> >
> > >> > I did some testing and got the following results:
> > >> >
> > >> > For scanning *1000000* rows:
> > >> >
> > >> > Cache   Batch          Total execution time (sec)
> > >> > 10000   -1 (default)   112
> > >> > 10000   5000           110
> > >> > 10000   10000          110
> > >> > 10000   20000          110
> > >> >
> > >> > Cache   Batch          Total execution time (sec)
> > >> > 1000    -1 (default)   116
> > >> > 10000   -1 (default)   110
> > >> > 20000   -1 (default)   115
> > >> >
> > >> > Cache   Batch          Total execution time (sec)
> > >> > 5000    10             26
> > >> > 20000   10             25
> > >> > 50000   10             26
> > >> > 5000    5              15
> > >> > 20000   5              14
> > >> > 50000   5              14
> > >> > 1000    1              6
> > >> > 5000    1              5
> > >> > 10000   1              4
> > >> > 20000   1              4
> > >> > 50000   1              4
> > >> >
> > >> > *I don't understand why a lower batch size gives such an improvement?*
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Amit.
> > >>
> > >>
> >
> >
> >
>
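
One possible reading of the timings quoted above, offered as a sketch rather
than a confirmed diagnosis: Scan.setBatch(n) caps the number of column values
per Result, so a row with more than n columns comes back as several Results,
and Scan.setCaching(n) counts Results per round trip, not whole rows. If the
test loop stopped after 1000000 Results, a run with setBatch(1) would have
covered far fewer rows than a run with the default batch. The plain-Java model
below has no HBase dependency; the class name, the helper methods and the
figure of 20 columns per row are all hypothetical (the thread never states the
column count).

```java
public class ScanBatchSketch {

    // Results produced for one row. batch <= 0 models the default (-1),
    // where the whole row is returned in a single Result.
    public static long resultsPerRow(int columnsPerRow, int batch) {
        if (batch <= 0) {
            return 1;
        }
        long parts = (columnsPerRow + batch - 1) / batch; // ceiling division
        return Math.max(1, parts);
    }

    // Whole rows covered when the client stops after a fixed number of Results.
    public static long rowsCovered(long resultsRead, int columnsPerRow, int batch) {
        return resultsRead / resultsPerRow(columnsPerRow, batch);
    }

    // Approximate number of round trips, assuming caching (> 0) counts Results.
    public static long rpcCalls(long results, int caching) {
        return (results + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        int columnsPerRow = 20;        // hypothetical, not taken from the thread
        long resultsRead = 1_000_000L; // the loop limit used in the quoted test
        int caching = 10_000;

        for (int batch : new int[] {-1, 10, 5, 1}) {
            System.out.printf("batch=%d resultsPerRow=%d rowsCovered=%d rpcCalls=%d%n",
                    batch,
                    resultsPerRow(columnsPerRow, batch),
                    rowsCovered(resultsRead, columnsPerRow, batch),
                    rpcCalls(resultsRead, caching));
        }
    }
}
```

Under these assumptions the number of rows actually read shrinks roughly in
proportion to the batch size, which would be consistent with the question
about how many columns each row has. Counting distinct row keys in the test
loop, rather than Results, would confirm or rule this out.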
