Assuming you have 10M possible partitions and you're using a relatively uniform hash to generate your IDs, your ~20M records will average about 2 per partition. Do you have an index from term/value to partition? That would let you narrow your search space to a subset of your partitions.
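The sketch below illustrates the arithmetic and the term-to-partition index idea. It is a hypothetical in-memory model, not Accumulo API. `PartitionIndexSketch`, `record`, `partitionsFor` and `averagePerPartition` are made-up names. The 7-digit zero padding mirrors the 0000000 to 9999999 row keys described in the quoted message.

```java
import java.util.*;

/**
 * Hypothetical sketch: a term -> partition index that narrows a lookup to the
 * partitions that can contain a term, instead of touching all 10M partitions.
 * Names are illustrative only; this is not an Accumulo class.
 */
public class PartitionIndexSketch {
    // term -> sorted set of zero-padded partition row IDs that contain it
    private final Map<String, Set<String>> termToPartitions = new HashMap<>();

    // Format a partition number like the table's row keys (7 digits, zero padded).
    public static String rowFor(int partition) {
        return String.format("%07d", partition);
    }

    // Remember that a term occurs somewhere in the given partition.
    public void record(String term, int partition) {
        termToPartitions.computeIfAbsent(term, t -> new TreeSet<>()).add(rowFor(partition));
    }

    // Rows worth scanning for a term; an empty set means no partition needs scanning.
    public Set<String> partitionsFor(String term) {
        return termToPartitions.getOrDefault(term, Collections.emptySet());
    }

    // Expected records per partition under a uniform hash: records / partitions.
    public static double averagePerPartition(long records, long partitions) {
        return (double) records / partitions;
    }

    public static void main(String[] args) {
        System.out.println(averagePerPartition(20_000_000L, 10_000_000L)); // 2.0

        PartitionIndexSketch idx = new PartitionIndexSketch();
        idx.record("fox", 42);
        idx.record("fox", 9_999_999);
        idx.record("bloom", 7);
        System.out.println(idx.partitionsFor("fox"));     // [0000042, 9999999]
        System.out.println(idx.partitionsFor("missing")); // []
    }
}
```

With such an index, a query for a term turns into a handful of single-row ranges rather than a scan over every partition. How the index itself is stored (a separate table, for example) is left open here.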
On Fri, Nov 9, 2012 at 11:39 AM, William Slacum <[email protected]> wrote:

> That shouldn't be a huge issue. How many rows/partitions do you have? How
> many do you have to scan to find the specific column family/doc id you want?
>
>
> On Fri, Nov 9, 2012 at 11:26 AM, Anthony Fox <[email protected]> wrote:
>
>> I have a table set up to use the intersecting iterator pattern. The
>> table has about 20M records which leads to 20M column families for the
>> data section - 1 unique column family per record. The index section of
>> the table is not quite as large as the data section. The rowkey is a
>> random padded integer partition between 0000000 and 9999999. I turned
>> bloom filters on and used the ColumnFamilyFunctor to get performant
>> column family scans without specifying a range like in the bloom filter
>> examples in the README. However, my column family scans (without any
>> custom iterator) are still fairly slow - ~30 seconds for a column family
>> batch scan of one record. I've also tried RowFunctor but I see similar
>> performance. Can anyone shed any light on the performance metrics I'm
>> seeing?
>>
>> Thanks,
>> Anthony
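The following is a simplified, hypothetical model of the layout Anthony describes. It shows why knowing the partition matters. A `TreeMap` stands in for the table, and keys are `row\0cf\0cq` strings, so they sort by row first, as Accumulo keys do. Index entries use the term as column family and the doc id as qualifier, which is the layout IntersectingIterator expects. Each document's data sits under its own column family. Bloom filters and locality groups are deliberately ignored, and all class and method names are made up for illustration.

```java
import java.util.*;

/**
 * Hypothetical model of a document-partitioned table. Not Accumulo API:
 * a TreeMap with "row\0cf\0cq" keys imitates row-first key ordering.
 * Bloom filters and locality groups are ignored on purpose.
 */
public class PartitionedLayoutSketch {
    private static final char SEP = '\0';
    private final TreeMap<String, String> table = new TreeMap<>();
    private final Random random = new Random(1); // seeded only so runs repeat

    private static String key(String row, String cf, String cq) {
        return row + SEP + cf + SEP + cq;
    }

    /** Writes one document into a random zero-padded partition and returns that row. */
    public String addDocument(String docId, List<String> terms, String body) {
        String row = String.format("%07d", random.nextInt(10_000_000));
        for (String term : terms) {
            table.put(key(row, term, docId), ""); // index section: cf = term, cq = doc id
        }
        table.put(key(row, docId, "body"), body); // data section: cf = doc id
        return row;
    }

    /** Partition known: only the contiguous slice for (row, docId) is read. */
    public SortedMap<String, String> lookupWithRow(String row, String docId) {
        return table.subMap(row + SEP + docId + SEP, row + SEP + docId + (char) 1);
    }

    /** Partition unknown: every key is visited to find entries in the given column family. */
    public List<String> scanAllRowsForFamily(String cf) {
        List<String> matches = new ArrayList<>();
        for (String k : table.keySet()) {
            int first = k.indexOf(SEP);
            int second = k.indexOf(SEP, first + 1);
            if (k.substring(first + 1, second).equals(cf)) {
                matches.add(k);
            }
        }
        return matches;
    }

    public int size() {
        return table.size();
    }

    public static void main(String[] args) {
        PartitionedLayoutSketch t = new PartitionedLayoutSketch();
        String row = t.addDocument("doc-1", List.of("fox", "bloom"), "hello");
        for (int i = 2; i <= 1000; i++) {
            t.addDocument("doc-" + i, List.of("term" + i), "body " + i);
        }
        // Same single match either way, but the second path visits every key.
        System.out.println(t.lookupWithRow(row, "doc-1").size());
        System.out.println(t.scanAllRowsForFamily("doc-1").size() + " match after visiting " + t.size() + " keys");
    }
}
```

In this model the cost of a column-family-only lookup grows with the whole table, while a lookup with a known row stays within one partition. That is the motivation behind asking how many partitions have to be scanned.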
