On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote:
> Are you doing RF=1?
That is correct. So are your calculations then :-)

> > very small, <1k. Data from this cf is only read via hadoop jobs in batch
> > reads of 16k rows at a time.
> [snip]
> > It's my understanding then for this use case that bloom filters are of
> > little importance and that i can
>
> Depends. I'm not familiar enough with how the hadoop integration works
> so someone else will have to comment, but if your hadoop jobs are just
> performing normal reads of keys via thrift and the keys they are
> grabbing are not in token order, those reads would be effectively
> random and bloom filters should still be highly relevant to the amount
> of I/O operations you need to perform.

They are thrift get_range_slice reads of 16k rows per request.
Hadoop reads are based on tokens, but in my use case the keys are also
ordered and this cluster is using BOP.

~mck

-- 
"Living on Earth is expensive, but it does include a free trip around
the sun every year." Unknown

| http://github.com/finn-no | http://tech.finn.no |