Re: High Bloom filter false ratio

Jaydeep Chovatia Fri, 19 Feb 2016 11:22:01 -0800

To me following three looks on higher side:
SSTable count: 1289

In order to reduce SSTable count see if you are compacting of not (If using
STCS). Is it possible to change this to LCS?



Number of keys (estimate): 345137664 (345M partition keys)

I don't have any suggestion about reducing this unless you partition your
data.


Bloom filter space used, bytes: 493777336 (400MB is huge)

If number of keys are reduced then this will automatically reduce bloom
filter size I believe.



Jaydeep

On Thu, Feb 18, 2016 at 7:52 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Hey all,
>
> @Jaydeep here is the cfstats output from one node.
>
> Read Count: 1721134722
>
> Read Latency: 0.04268825050756254 ms.
>
> Write Count: 56743880
>
> Write Latency: 0.014650376727851532 ms.
>
> Pending Tasks: 0
>
> Table: user_stay_points
>
> SSTable count: 1289
>
> Space used (live), bytes: 122141272262
>
> Space used (total), bytes: 224227850870
>
> Off heap memory used (total), bytes: 653827528
>
> SSTable Compression Ratio: 0.4959736121441446
>
> Number of keys (estimate): 345137664
>
> Memtable cell count: 339034
>
> Memtable data size, bytes: 106558314
>
> Memtable switch count: 3266
>
> Local read count: 1721134803
>
> Local read latency: 0.048 ms
>
> Local write count: 56743898
>
> Local write latency: 0.018 ms
>
> Pending tasks: 0
>
> Bloom filter false positives: 40664437
>
> Bloom filter false ratio: 0.69058
>
> Bloom filter space used, bytes: 493777336
>
> Bloom filter off heap memory used, bytes: 493767024
>
> Index summary off heap memory used, bytes: 91677192
>
> Compression metadata off heap memory used, bytes: 68383312
>
> Compacted partition minimum bytes: 104
>
> Compacted partition maximum bytes: 1629722
>
> Compacted partition mean bytes: 1773
>
> Average live cells per slice (last five minutes): 0.0
>
> Average tombstones per slice (last five minutes): 0.0
>
>
> @Tyler Hobbs
>
> we are using cassandra 2.0.15 so
> https://issues.apache.org/jira/browse/CASSANDRA-8525  shouldnt occur.
> Other problems looks like will be fixed in 3.0 .. we will mostly try and
> slot in an upgrade to 3.x version towards second quarter of this year.
>
>
> @Daemon
>
> Latencies seem to have higher ratios, attached is the graph.
>
>
> I am mostly trying to look at Bloom filters, because the way we do reads,
> we read data with non existent partition keys and it seems to be taking
> long to respond, like for 720 queries it takes 2 seconds, with all 721
> queries not returning anything. the 720 queries are done in sequence of
> 180 queries each with 180 of them running in parallel.
>
>
> thanks
>
> anishek
>
>
>
> On Fri, Feb 19, 2016 at 3:09 AM, Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> How many partition keys exists for the table which shows this problem (or
>> provide nodetool cfstats for that table)?
>>
>> On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> The bloom filter buckets the values in a small number of buckets. I have
>>> been surprised by how many cases I see with large cardinality where a few
>>> values populate a given bloom leaf, resulting in high false positives, and
>>> a surprising impact on latencies!
>>>
>>> Are you seeing 2:1 ranges between mean and worse case latencies
>>> (allowing for gc times)?
>>>
>>> Daemeon Reiydelle
>>> On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <ty...@datastax.com> wrote:
>>>
>>>> You can try slightly lowering the bloom_filter_fp_chance on your table.
>>>>
>>>> Otherwise, it's possible that you're repeatedly querying one or two
>>>> partitions that always trigger a bloom filter false positive.  You could
>>>> try manually tracing a few queries on this table (for non-existent
>>>> partitions) to see if the bloom filter rejects them.
>>>>
>>>> Depending on your Cassandra version, your false positive ratio could be
>>>> inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>>>>
>>>> There are also a couple of recent improvements to bloom filters:
>>>> * https://issues.apache.org/jira/browse/CASSANDRA-8413
>>>> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>>>>
>>>>
>>>> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anis...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We have a table with composite partition key with humungous
>>>>> cardinality, its a combination of (long,long). On the table we have
>>>>> bloom_filter_fp_chance=0.010000.
>>>>>
>>>>> On doing "nodetool cfstats" on the 5 nodes we have in the cluster we
>>>>> are seeing  "Bloom filter false ratio:" in the range of 0.7 -0.9.
>>>>>
>>>>> I thought over time the bloom filter would adjust to the key space
>>>>> cardinality, we have been running the cluster for a long time now but have
>>>>> added significant traffic from Jan this year, which would not lead to
>>>>> writes in the db but would lead to high reads to see if are any values.
>>>>>
>>>>> Are there any settings that can be changed to allow better ratio.
>>>>>
>>>>> Thanks
>>>>> Anishek
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Tyler Hobbs
>>>> DataStax <http://datastax.com/>
>>>>
>>>
>>
>

Re: High Bloom filter false ratio

Reply via email to