https://github.com/medined/D4M_Schema/blob/master/schema/src/main/java/com/codebits/d4m/ingest/MutationFactory.java shows how to use a HyperLogLog object to track cardinality during ingest.
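
For anyone who wants to see the idea without opening the repository, here is a minimal from-scratch sketch of a HyperLogLog counter that could be fed row IDs during ingest. It is not the code in the linked MutationFactory and not any library's API: the class name CardinalitySketch, its methods (offer, merge, cardinality), and the hash (FNV-1a followed by a MurmurHash3-style finalizer) are all assumptions made to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;

/** Minimal HyperLogLog sketch. Hypothetical class, not taken from the linked repository. */
class CardinalitySketch {
    private final int p;            // number of index bits
    private final byte[] registers; // 2^p registers, each holding the max rank seen

    CardinalitySketch(int p) {
        if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** Record one key, e.g. a row ID seen while building a mutation. */
    void offer(String key) {
        long h = hash64(key.getBytes(StandardCharsets.UTF_8));
        int idx = (int) (h >>> (64 - p));   // top p bits select a register
        long rest = h << p;                 // remaining 64 - p bits
        // Rank = position of the first 1 bit in the remaining bits, capped when they are all 0.
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    /** Fold another sketch into this one (register-wise max), e.g. to combine per-range sketches. */
    void merge(CardinalitySketch other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        for (int i = 0; i < registers.length; i++) {
            if (other.registers[i] > registers[i]) registers[i] = other.registers[i];
        }
    }

    /** Estimated number of distinct keys offered so far. */
    long cardinality() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);     // 2^-r
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);  // bias constant, valid for m >= 128
        double estimate = alpha * m * (double) m / sum;
        if (estimate <= 2.5 * m && zeros > 0) {
            // Small-range correction: fall back to linear counting.
            estimate = m * Math.log((double) m / zeros);
        }
        return Math.round(estimate);
    }

    /** Assumed hash: FNV-1a (64-bit) plus a MurmurHash3-style finalizer for better bit mixing. */
    private static long hash64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    public static void main(String[] args) {
        CardinalitySketch sketch = new CardinalitySketch(12);
        for (int i = 0; i < 50_000; i++) sketch.offer("row_" + i);
        System.out.println("estimated distinct keys: " + sketch.cardinality());
    }
}
```

With p = 12 the sketch uses 4096 one-byte registers, and the usual HyperLogLog standard error of about 1.04/sqrt(m) works out to roughly 1.6%. Because merge is just a register-wise max, one sketch per key range could be kept in a separate statistics table and combined at query-planning time.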

On Fri, Jun 27, 2014 at 11:05 AM, Jamie Stephens <[email protected]> wrote:
> Eric,
>
> Thanks. Yeah, it's pretty easy to sample during ingest. That's probably
> what I'll do. In the past, I've also done the traditional batch statistics
> generation. Would be easy here with MapReduce+combiner.
>
> --Jamie
>
>
> On Fri, Jun 27, 2014 at 9:40 AM, Eric Newton <[email protected]> wrote:
>>
>> Short answer: no.
>>
>> Long answer:
>>
>> You can scan the metadata table for the count/size of the files.
>>
>> You can query tablet servers for the basic stats of every tablet for a
>> given table. This is used for balancing.
>>
>> But really you should collect the statistics you want during ingest and
>> insert them in another table.
>>
>> -Eric
>>
>>
>> On Fri, Jun 27, 2014 at 9:42 AM, Jamie Stephens <[email protected]> wrote:
>>>
>>> Is there a way to get a quick estimate of the number of keys in a given
>>> range?
>>>
>>> Perhaps more generally, getting an estimate of the amount of work (and
>>> even some sort of confidence based on, say, the age of something) to iterate
>>> over a range.
>>>
>>> I'd like to do some query planning, so statistics like these sure would
>>> be nice.
>>>
>>> --Jamie
