Re: High BloomFilterFalseRation

2010-11-02 Thread Daniel Doubleday
Hi all, I had some time yesterday to dig a little deeper. Maybe this saves some time for someone who made the same mistake, so ... After trying to reproduce the problem in unit tests with the same data, which led nowhere because every single result was almost exactly what the math promised and

Re: High BloomFilterFalseRation

2010-11-02 Thread Ryan King
On Tue, Nov 2, 2010 at 1:28 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Hi all, I had some time yesterday to dig a little deeper. Maybe this saves some time for someone who made the same mistake, so ... After trying to reproduce the problem in unit tests with the same data, which led

High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Hi people, we are currently moving our second use case from MySQL to Cassandra. While importing the data (ongoing), I noticed that the BloomFilterFalseRatio seems to be pretty high compared to another CF which is in use in production right now. It's a hierarchical data model and I cannot avoid

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Hm - not sure if I understand the random question. We are using RP (RandomPartitioner). But I wouldn't know why that should matter. I thought that the bloom filter hash function should distribute evenly no matter what keys come in. Keys are '/'-separated strings (aka paths :-)). I do bulk inserts like: (1000

Re: High BloomFilterFalseRation

2010-10-27 Thread Jonathan Ellis
Do you have a key a/b then? What columns does it have? On Wed, Oct 27, 2010 at 9:14 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Hm - not sure if I understand the random question. We are using RP. But I wouldn't know why that should matter. I thought that the bloom filter hash

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Ah, of course - the question makes total sense. But no, this is not the case: I am not constantly asking the same question, since the tree is deep enough. Most data nodes are at level 5 from the root, so the parents getting queried will be different most of the time. Since the parent nodes are