Hey guys,

Just did some more digging ... it looks like DTCS is not removing old data completely. I used sstable2json on one such table and saw old data there. We have a value of 30 for max_sstable_age_days on the table.
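As an aside for anyone reading along: in sstable2json output a cell is serialized as [name, value, timestamp] plus an optional flag, and a trailing "d" marks a deleted cell (a tombstone), i.e. the data is logically deleted but not yet purged from the SSTable. A minimal sketch of checking a dump for such cells and their age (the `find_tombstones` helper and the hard-coded example row are illustrative, not part of any tool; timestamps are the usual microseconds-since-epoch):

```python
from datetime import datetime, timezone

def find_tombstones(sstable_json, now_micros):
    """Scan parsed sstable2json output and report cells whose fourth
    field is "d" (deleted-cell / tombstone marker), with their age in
    days computed from the write timestamp (microseconds)."""
    report = []
    for row in sstable_json:
        for col in row.get("columns", []):
            # Live cells have 3 fields; a 4th field of "d" marks a tombstone.
            if len(col) >= 4 and col[3] == "d":
                name, _value, ts_micros = col[0], col[1], col[2]
                age_days = (now_micros - ts_micros) / 86_400_000_000
                report.append((row.get("key"), name, round(age_days, 1)))
    return report

# The cell from this thread: timestamp 1449725602552000 is ~10 Dec 2015.
rows = [{"key": "56690ea2", "columns": [
    ["2015-12-10 11:03+0530:", "56690ea2", 1449725602552000, "d"]]}]
now = int(datetime(2016, 2, 22, tzinfo=timezone.utc).timestamp() * 1_000_000)
print(find_tombstones(rows, now))  # each entry: (row key, cell name, age in days)
```

Seeing a ~74-day-old tombstone here is consistent with the complaint above: the cell is marked deleted but still physically present in the SSTable.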
One of the columns showed data as ["2015-12-10 11\\:03+0530:", "56690ea2", 1449725602552000, "d"] — what is the meaning of the "d" in the last (is-marked-for-delete) position? I see data from 10 Dec 2015 still there, so it looks like there are a few issues with DTCS. Operationally, what choices do I have to rectify this? We are on version 2.0.15.

thanks
anishek

On Mon, Feb 22, 2016 at 10:23 AM, Anishek Agarwal <anis...@gmail.com> wrote:

> We are using DTCS with a 30 day window before data is cleaned up. I don't
> think with DTCS we can do anything about table sizing. Please do let me
> know if there are other ideas.
>
> On Sat, Feb 20, 2016 at 12:51 AM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>
>> To me the following three look on the higher side:
>>
>> SSTable count: 1289
>>
>> In order to reduce the SSTable count, check whether you are compacting
>> or not (if using STCS). Is it possible to change this to LCS?
>>
>> Number of keys (estimate): 345137664 (345M partition keys)
>>
>> I don't have any suggestion about reducing this unless you partition
>> your data.
>>
>> Bloom filter space used, bytes: 493777336 (400MB is huge)
>>
>> If the number of keys is reduced then this will automatically reduce the
>> bloom filter size, I believe.
>>
>> Jaydeep
>>
>> On Thu, Feb 18, 2016 at 7:52 PM, Anishek Agarwal <anis...@gmail.com> wrote:
>>
>>> Hey all,
>>>
>>> @Jaydeep here is the cfstats output from one node.
>>>
>>> Read Count: 1721134722
>>> Read Latency: 0.04268825050756254 ms.
>>> Write Count: 56743880
>>> Write Latency: 0.014650376727851532 ms.
>>> Pending Tasks: 0
>>>
>>> Table: user_stay_points
>>> SSTable count: 1289
>>> Space used (live), bytes: 122141272262
>>> Space used (total), bytes: 224227850870
>>> Off heap memory used (total), bytes: 653827528
>>> SSTable Compression Ratio: 0.4959736121441446
>>> Number of keys (estimate): 345137664
>>> Memtable cell count: 339034
>>> Memtable data size, bytes: 106558314
>>> Memtable switch count: 3266
>>> Local read count: 1721134803
>>> Local read latency: 0.048 ms
>>> Local write count: 56743898
>>> Local write latency: 0.018 ms
>>> Pending tasks: 0
>>> Bloom filter false positives: 40664437
>>> Bloom filter false ratio: 0.69058
>>> Bloom filter space used, bytes: 493777336
>>> Bloom filter off heap memory used, bytes: 493767024
>>> Index summary off heap memory used, bytes: 91677192
>>> Compression metadata off heap memory used, bytes: 68383312
>>> Compacted partition minimum bytes: 104
>>> Compacted partition maximum bytes: 1629722
>>> Compacted partition mean bytes: 1773
>>> Average live cells per slice (last five minutes): 0.0
>>> Average tombstones per slice (last five minutes): 0.0
>>>
>>> @Tyler Hobbs
>>>
>>> We are using cassandra 2.0.15, so
>>> https://issues.apache.org/jira/browse/CASSANDRA-8525 shouldn't occur.
>>> The other problems look like they will be fixed in 3.0 .. we will mostly
>>> try to slot in an upgrade to a 3.x version towards the second quarter of
>>> this year.
>>>
>>> @Daemon
>>>
>>> Latencies seem to have higher ratios; attached is the graph.
>>>
>>> I am mostly trying to look at bloom filters because of the way we do
>>> reads: we read data with non-existent partition keys and it seems to be
>>> taking long to respond, e.g. 720 queries take 2 seconds, with all
>>> 720 queries returning nothing. The 720 queries are done as a sequence
>>> of batches of 180 queries each, with the 180 in a batch running in
>>> parallel.
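A quick sanity check on the cfstats numbers above: the ~470 MB of bloom filter space is roughly what the textbook sizing formula m = -n·ln(p)/(ln 2)² bits predicts for 345M keys at fp_chance 0.01, so the filter size itself is by design rather than a symptom. A minimal sketch (the formula is the standard bloom filter one; the inputs are the numbers quoted above):

```python
import math

def bloom_filter_bytes(num_keys, fp_chance):
    """Textbook optimal bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -num_keys * math.log(fp_chance) / (math.log(2) ** 2)
    return bits / 8

# Numbers from the cfstats output above.
n = 345_137_664            # Number of keys (estimate)
p = 0.01                   # bloom_filter_fp_chance on the table
predicted = bloom_filter_bytes(n, p)
print(f"predicted: {predicted / 1e6:.0f} MB")   # ~414 MB
print("observed:  ~494 MB (493777336 bytes)")
# Shrinking the filter means raising fp_chance or reducing the key count;
# the puzzle is rather the observed false ratio (0.69) being far above the
# configured 0.01 — which, per CASSANDRA-8525 mentioned in this thread, may
# itself be mis-reported on 2.0.x.
```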
>>> thanks
>>> anishek
>>>
>>> On Fri, Feb 19, 2016 at 3:09 AM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>
>>>> How many partition keys exist for the table which shows this problem
>>>> (or provide nodetool cfstats for that table)?
>>>>
>>>> On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle <daeme...@gmail.com> wrote:
>>>>
>>>>> The bloom filter buckets the values into a small number of buckets. I
>>>>> have been surprised by how many cases I see with large cardinality
>>>>> where a few values populate a given bloom leaf, resulting in high
>>>>> false positives and a surprising impact on latencies!
>>>>>
>>>>> Are you seeing 2:1 ranges between mean and worst-case latencies
>>>>> (allowing for gc times)?
>>>>>
>>>>> Daemeon Reiydelle
>>>>>
>>>>> On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <ty...@datastax.com> wrote:
>>>>>
>>>>>> You can try slightly lowering the bloom_filter_fp_chance on your
>>>>>> table.
>>>>>>
>>>>>> Otherwise, it's possible that you're repeatedly querying one or two
>>>>>> partitions that always trigger a bloom filter false positive. You
>>>>>> could try manually tracing a few queries on this table (for
>>>>>> non-existent partitions) to see if the bloom filter rejects them.
>>>>>>
>>>>>> Depending on your Cassandra version, your false positive ratio could
>>>>>> be inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>>>>>>
>>>>>> There are also a couple of recent improvements to bloom filters:
>>>>>> * https://issues.apache.org/jira/browse/CASSANDRA-8413
>>>>>> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>>>>>>
>>>>>> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anis...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> We have a table with a composite partition key of humongous
>>>>>>> cardinality; it's a combination of (long,long). On the table we have
>>>>>>> bloom_filter_fp_chance=0.010000.
>>>>>>> On doing "nodetool cfstats" on the 5 nodes we have in the cluster,
>>>>>>> we are seeing "Bloom filter false ratio:" in the range of 0.7-0.9.
>>>>>>>
>>>>>>> I thought over time the bloom filter would adjust to the key space
>>>>>>> cardinality. We have been running the cluster for a long time now,
>>>>>>> but have added significant traffic from Jan this year, which would
>>>>>>> not lead to writes in the db but would lead to high reads to see if
>>>>>>> there are any values.
>>>>>>>
>>>>>>> Are there any settings that can be changed to allow a better ratio?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Anishek
>>>>>>
>>>>>> --
>>>>>> Tyler Hobbs
>>>>>> DataStax <http://datastax.com/>
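For reference, Tyler's suggestion of lowering bloom_filter_fp_chance is a per-table setting changeable via CQL. A hedged sketch (the table name is the one from the cfstats output above; the 0.001 value is illustrative, not a recommendation, and trades more filter memory for fewer false positives):

```sql
-- Illustrative: lower the target false-positive chance from 0.01 to 0.001.
ALTER TABLE user_stay_points WITH bloom_filter_fp_chance = 0.001;
```

Note that existing SSTables keep their old filters until they are rewritten, e.g. by normal compaction or by running nodetool upgradesstables, so the change takes effect gradually.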