Hmmmm, I thought bloomfilters only help on missing rows.  Any time we look up a 
row, we know it is there in our case as it would not be in the other table.  I 
would say statistically 99.9% of the time the row is there, and we are okay with 
wasting a disk hit the other 0.1% of the time.

Do I have this correct though?  Bloomfilters really only help me if the data is 
not there so I don't have to go to the disk and find that out.

Thanks,
Dean

From: aaron morton <aa...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, February 24, 2013 7:09 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: disabling bloomfilter not working? or did I do this wrong?

Yeah, disabling completely is probably not great.
There is some wriggle room between disabled and "less memory".

Did I link to this bloom filter calculator? http://hur.st/bloomfilter also 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/BloomCalculations.java
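
For a rough feel of the numbers a calculator like that produces, here is a 
minimal sketch of the textbook bloom filter sizing formula, 
bits per key = -ln(p) / (ln 2)^2. It is not Cassandra's own calculation 
(BloomCalculations.java may round differently), and the function names are 
made up for illustration. The key count is the cfstats estimate quoted in this 
thread.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook optimal bits per key for a target false-positive chance."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_bytes(num_keys: int, fp_chance: float) -> float:
    """Approximate filter size in bytes for num_keys keys."""
    return num_keys * bloom_bits_per_key(fp_chance) / 8


# "Number of Keys (estimate)" from the cfstats output in this thread.
NUM_KEYS = 1_249_133_696

for fp in (0.001, 0.01, 0.1):
    size_gb = bloom_size_bytes(NUM_KEYS, fp) / 1e9
    print(f"fp_chance={fp}: about {size_gb:.2f} GB")
```

With those inputs the textbook formula gives roughly 2.2 GB at 0.001, 1.5 GB 
at 0.01 and 0.75 GB at 0.1, so a looser fp chance trades heap for some extra 
disk reads rather than being all or nothing.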

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 12:10 PM, Bryan Talbot <btal...@aeriagames.com> wrote:

I see from your read and write counts that your nreldata CF has a nearly equal 
number of reads and writes.  I would expect that disabling your bloom filter is 
going to hurt your read performance quite a bit.
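
One way to see why: each SSTable carries its own bloom filter, so a row that 
exists in one SSTable is still "missing" from the others, and the filter is 
what lets a read skip them. A toy sketch, assuming 10 SSTables with disjoint 
keys; TinyBloom and count_disk_reads are hypothetical names for illustration, 
not Cassandra code:

```python
import hashlib


class TinyBloom:
    """Toy bloom filter (illustration only, not Cassandra's implementation)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def count_disk_reads(key, sstables, use_filters):
    """Return (sstables read from 'disk', whether the key was found)."""
    reads, found = 0, False
    for keys_on_disk, bloom in sstables:
        if use_filters and not bloom.might_contain(key):
            continue  # filter says "definitely not here": skip the disk read
        reads += 1
        found = found or key in keys_on_disk
    return reads, found


# Build 10 hypothetical sstables, each holding 1000 distinct row keys.
sstables = []
for t in range(10):
    keys = {f"row-{t}-{i}" for i in range(1000)}
    bloom = TinyBloom()
    for k in keys:
        bloom.add(k)
    sstables.append((keys, bloom))

key = "row-3-42"  # exists in exactly one of the ten sstables
with_filters, found_a = count_disk_reads(key, sstables, use_filters=True)
without_filters, found_b = count_disk_reads(key, sstables, use_filters=False)
print(f"reads with filters: {with_filters}, without: {without_filters}")
```

Without filters every SSTable is read for every lookup. With them, the read 
normally touches only the SSTable that holds the row, plus the occasional 
false positive.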

Also, beware that disabling your bloom filter may also cause tombstoned rows to 
never be deleted, so if you delete all columns explicitly or use TTL, your data 
may grow more than you expect.
https://issues.apache.org/jira/browse/CASSANDRA-5182

-Bryan




On Fri, Feb 22, 2013 at 11:59 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
Thanks, but I found out it is still running.  It looks like I have about a 5 
hour wait left for my upgradesstables (waited 4 hours already).  I will check 
the bloomfilter after that.

Out of curiosity, if I had much wider rows (i.e. < 900k per row), would 
compaction (errrr…upgradesstables) run faster at all, or would it basically run 
at the same speed?

I guess what I am wondering is: is 9 hours a normal compaction time for 130 GB of 
data?
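
Back-of-the-envelope, assuming the full run takes 9 hours (4 waited plus about 
5 left) and 1 GB = 1024 MB, the implied rate can be worked out like this (the 
helper name is just for illustration):

```python
def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s, taking 1 GB = 1024 MB."""
    return size_gb * 1024 / (hours * 3600)


rate = throughput_mb_per_s(130, 9)
print(f"130 GB in 9 h is about {rate:.1f} MB/s")  # about 4.1 MB/s
```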

Thanks,
Dean

From: aaron morton <aa...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, February 22, 2013 10:29 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: disabling bloomfilter not working? or did I do this wrong?

Bloom Filter Space Used: 2318392048
Just as a sanity check, take a quick look at the -Filter.db files on disk for this CF.
If they are very small, try a restart on the node.
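
A quick way to do that check is to total the sizes of the *-Filter.db files in 
the column family's data directory. A minimal sketch; the directory path below 
is only an assumed layout and the helper names are hypothetical, so adjust for 
your install:

```python
from pathlib import Path


def filter_db_sizes(cf_dir: str) -> dict:
    """Map each *-Filter.db file name in cf_dir to its size in bytes."""
    return {p.name: p.stat().st_size
            for p in sorted(Path(cf_dir).glob("*-Filter.db"))}


def total_filter_bytes(cf_dir: str) -> int:
    """Total on-disk size of all bloom filter files in cf_dir."""
    return sum(filter_db_sizes(cf_dir).values())


# Assumed data directory for keyspace databus5, CF nreldata (hypothetical path).
CF_DIR = "/var/lib/cassandra/data/databus5/nreldata"

if Path(CF_DIR).is_dir():
    for name, size in filter_db_sizes(CF_DIR).items():
        print(f"{size:>14,d}  {name}")
    print(f"{total_filter_bytes(CF_DIR):>14,d}  total")
```

If the total on disk is tiny while cfstats still reports about 2 GB, the 
node is most likely still holding the old filters in memory.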

Number of Keys (estimate): 1249133696
Hey a billion rows on a node, what an age we live in :)

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 4:35 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

So in the cli, I ran

update column family nreldata with bloom_filter_fp_chance=1.0;

Then I ran

nodetool upgradesstables databus5 nreldata;

But my bloom filter size is still around 2 GB (and I want to free up this 
heap)!!!! According to the nodetool cfstats command…

Column Family: nreldata
SSTable count: 10
Space used (live): 96841497731
Space used (total): 96841497731
Number of Keys (estimate): 1249133696
Memtable Columns Count: 7066
Memtable Data Size: 4286174
Memtable Switch Count: 924
Read Count: 19087150
Read Latency: 0.595 ms.
Write Count: 21281994
Write Latency: 0.013 ms.
Pending Tasks: 0
Bloom Filter False Postives: 974393
Bloom Filter False Ratio: 0.99998
Bloom Filter Space Used: 2318392048
Compacted row minimum size: 73
Compacted row maximum size: 446
Compacted row mean size: 143
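
To put those figures in perspective, here is a small sketch that parses a few 
of the cfstats lines above and derives the bloom filter bits per key and the 
read/write ratio. The parser is a hypothetical helper that only handles simple 
"name: value" lines; it is not part of nodetool.

```python
# A few lines copied from the cfstats output in this thread.
CFSTATS = """\
Column Family: nreldata
SSTable count: 10
Number of Keys (estimate): 1249133696
Read Count: 19087150
Write Count: 21281994
Bloom Filter Space Used: 2318392048
"""


def parse_cfstats(text: str) -> dict:
    """Split each 'name: value' line into a dict entry (values kept as strings)."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            stats[name.strip()] = value.strip()
    return stats


stats = parse_cfstats(CFSTATS)
num_keys = int(stats["Number of Keys (estimate)"])
bloom_bytes = int(stats["Bloom Filter Space Used"])
reads = int(stats["Read Count"])
writes = int(stats["Write Count"])

bits_per_key = bloom_bytes * 8 / num_keys
reads_per_write = reads / writes
print(f"bloom filter bits per key: {bits_per_key:.2f}")
print(f"reads per write: {reads_per_write:.2f}")
```

That comes to a little under 15 bits of heap per key and about 0.9 reads for 
every write.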





