Hi,

> With a record size of 1k, I'd guesstimate that going with more scans
> is going to be better than one big scan. This is because a scan that
> filters out data still has to read that data from disk, and 1k rows
> are pretty big.

Would your answer be different if Alex/you knew if that data was actually read
from either the OS cache or MemStore? One can tell if disk is doing IO (or not)
by using iostat/vmstat, but what about MemStore?

Another thought. When you have 1 scan you have one monolithic operation, so to
speak. But if you have N scans, you could parallelize them.... somehow. Is this
correct? I found https://issues.apache.org/jira/browse/HBASE-1935 which sounds
like it was reviewed, got positive feedback, went through 3 patch revisions by
stack, but didn't get committed yet.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - HBase
Hadoop ecosystem search :: http://search-hadoop.com/

> But nothing will beat hard numbers. Build a test setup and let us know
> which approach works!
>
> On Wed, Feb 23, 2011 at 2:40 PM, Alex Baranau <[email protected]> wrote:
> > Hello,
> >
> > Would be great if somebody can share thoughts/ideas/some numbers on the
> > following problem.
> >
> > We have a reporting app. To fetch data for some chart/report we currently
> > use multiple scans, usually 10-50. We fetch about 100 records with each
> > scan which we use to construct a report.
> >
> > I've revised data we store and code logic and see that we could really
> > fetch same data with single scan by specifying filters to filter out data
> > which doesn't fit the report params. In this case the scan range will be
> > about 100-200K records from which after filtering we'd get the same
> > records as we do currently fetch with multiple scans.
> >
> > So the question is: given these numbers (10-50 scans fetching 100 records
> > each VS 1 scan + filters on range of 100-200K records) will the
> > optimization I have in mind really improve performance? Unfortunately we
> > don't have good volume of data currently to perform tests on. May be
> > someone can share thoughts based solely on these numbers? Record size is
> > about 1Kb.
> >
> > Thank you!
> >
> > Alex Baranau
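
To make the "N scans could be parallelized" idea from the reply concrete, here is a minimal client-side sketch in Python. It is not HBase code and not what HBASE-1935 implements: `ROWS`, `scan_range` and `parallel_scans` are hypothetical names, and the "table" is just an in-memory dict of sorted row keys standing in for a real range scan. It only shows the shape of the approach: one small scan per key range, submitted to a thread pool, with results returned in the order the ranges were given.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a table: zero-padded row keys so that
# lexicographic order matches numeric order, as with sorted row keys.
ROWS = {f"row{i:06d}": f"value{i}" for i in range(1000)}
SORTED_KEYS = sorted(ROWS)


def scan_range(start_row, stop_row):
    """Stand-in for one range scan: rows with start_row <= key < stop_row."""
    lo = bisect_left(SORTED_KEYS, start_row)
    hi = bisect_left(SORTED_KEYS, stop_row)
    return [(key, ROWS[key]) for key in SORTED_KEYS[lo:hi]]


def parallel_scans(ranges, max_workers=8):
    """Run one scan per (start_row, stop_row) pair concurrently.

    The returned list has one entry per range, in the same order as `ranges`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: scan_range(*r), ranges))


# Three small ranges instead of one large filtered scan over everything.
ranges = [
    ("row000100", "row000110"),
    ("row000500", "row000505"),
    ("row000900", "row000903"),
]
results = parallel_scans(ranges)
```

Whether this beats a single filtered scan on a real cluster would depend on things the sketch ignores, such as which region servers the ranges land on and whether the data comes from disk or from memory, so it is only a starting point for the test setup suggested above.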
