Looks like sstablemetadata is only available in 2.2; we are on 2.0.x. Do you know of anything that will work on 2.0.x?
On Tue, Feb 23, 2016 at 1:48 PM, Anishek Agarwal <anis...@gmail.com> wrote:

> Thanks Jeff, awesome, will look at the tools and the JMX endpoint.
>
> Our settings are below; they originated from the jira you posted above as
> the base. We are running on 48-core machines with 2 SSD disks of 800 GB each.
>
> MAX_HEAP_SIZE="6G"
> HEAP_NEWSIZE="4G"
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=6"
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
> JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
> JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
> JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
> JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
> # earlier value 131072
> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32678"
> JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
> JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32678"
> JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32678"
>
> On Tue, Feb 23, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>
>> There exists a JMX endpoint called forceUserDefinedCompaction that takes
>> a comma-separated list of sstables to compact together.
>>
>> There is also a tool called sstablemetadata (it may be in a
>> 'cassandra-tools' package separate from whatever package you used to
>> install cassandra, or in the tools/ directory of your binary package).
>> Using sstablemetadata, you can look at the maxTimestamp for each table and
>> the 'Estimated droppable tombstones'. Using those two fields, you could
>> very easily write a script that gives you a list of sstables to feed to
>> forceUserDefinedCompaction to join together and eliminate leftover waste.
>>
>> Your long ParNew times may be fixable by increasing the new gen size of
>> your heap – the general guidance in cassandra-env.sh is out of date; you
>> may want to reference CASSANDRA-8150 for "newer" advice
>> ( http://issues.apache.org/jira/browse/CASSANDRA-8150 )
>>
>> - Jeff
>>
>> From: Anishek Agarwal
>> Reply-To: "user@cassandra.apache.org"
>> Date: Monday, February 22, 2016 at 8:33 PM
>> To: "user@cassandra.apache.org"
>> Subject: Re: High Bloom filter false ratio
>>
>> Hey Jeff,
>>
>> Thanks for the clarification. I did not explain myself clearly:
>> max_sstable_age_days is set to 30 days, and the TTL on every insert is
>> also set to 30 days by default. gc_grace_seconds is 0, so I would think
>> the sstable as a whole would be deleted.
>>
>> Because of the problems mentioned in 1) above, it looks like there might
>> be cases where the sstable just lies around since no compaction is
>> happening on it, and even though everything is expired it would still
>> not be deleted?
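Jeff's suggested script could look something like this minimal sketch. The 'Estimated droppable tombstones' label parsed here and the 0.8 threshold are assumptions; check the actual plain-text output of your sstablemetadata version before relying on it:

```python
import re

def droppable_ratio(metadata_text):
    # The label below is assumed to match sstablemetadata's plain-text
    # output on 2.x; adjust the regex if your version prints it differently.
    m = re.search(r"Estimated droppable tombstones:\s*([0-9.]+)", metadata_text)
    return float(m.group(1)) if m else 0.0

def compaction_candidates(metadata_by_sstable, threshold=0.8):
    """Build a comma-separated sstable list, in the shape the
    forceUserDefinedCompaction JMX operation expects."""
    picked = [path for path, text in metadata_by_sstable.items()
              if droppable_ratio(text) >= threshold]
    return ",".join(sorted(picked))

# Hypothetical sample output, one metadata dump per sstable:
sample = {
    "ks-cf-jb-1-Data.db": "Estimated droppable tombstones: 0.95",
    "ks-cf-jb-2-Data.db": "Estimated droppable tombstones: 0.10",
}
print(compaction_candidates(sample))  # -> ks-cf-jb-1-Data.db
```

You would still combine this with maxTimestamp, as Jeff notes, so you only group sstables from adjacent time windows.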
>>
>> For 3), the average read is pretty good, though the throughput doesn't
>> seem to be that great. When no repair is running we get GCInspector
>> pauses > 200ms once every couple of hours; otherwise it's every 10-20
>> minutes:
>>
>> INFO [ScheduledTasks:1] 2016-02-23 05:15:03,070 GCInspector.java (line 116) GC for ParNew: 205 ms for 1 collections, 1712439128 used; max is 7784628224
>> INFO [ScheduledTasks:1] 2016-02-23 08:30:47,709 GCInspector.java (line 116) GC for ParNew: 242 ms for 1 collections, 1819126928 used; max is 7784628224
>> INFO [ScheduledTasks:1] 2016-02-23 09:09:55,085 GCInspector.java (line 116) GC for ParNew: 374 ms for 1 collections, 1829660304 used; max is 7784628224
>> INFO [ScheduledTasks:1] 2016-02-23 09:11:21,245 GCInspector.java (line 116) GC for ParNew: 419 ms for 1 collections, 2309875224 used; max is 7784628224
>> INFO [ScheduledTasks:1] 2016-02-23 09:35:50,717 GCInspector.java (line 116) GC for ParNew: 231 ms for 1 collections, 2515325328 used; max is 7784628224
>> INFO [ScheduledTasks:1] 2016-02-23 09:38:47,194 GCInspector.java (line 116) GC for ParNew: 252 ms for 1 collections, 1724241952 used; max is 7784628224
>>
>> Our read patterns depend on the bloom filter working efficiently, as we
>> do a lot of reads for keys that may not exist, because it's time series
>> data and we segregate it on hourly boundaries from the epoch.
>>
>> Hey Christopher,
>>
>> Yes, every row in the sstable that should have been deleted has "d" in
>> that column. Also, the key for one of the rows is:
>>
>> "key": "0008000000000cdd5edd000008000000000006251000"
>>
>> How do I get it back to a normal readable format to recover the
>> (long,long) composite partition key?
>>
>> It looks like I have to force a major compaction to delete a lot of
>> data? Are there any other solutions?
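On decoding that key: Cassandra's CompositeType encoding is, per component, a 2-byte big-endian length, the component bytes, then a one-byte end-of-component marker. A small sketch that unpacks the hex key quoted above into its two longs:

```python
import struct

def decode_composite_key(hex_key):
    """Split a CompositeType-encoded key into its raw component byte strings.

    Layout per component: 2-byte big-endian length, the value bytes,
    then one end-of-component byte (skipped here).
    """
    raw = bytes.fromhex(hex_key)
    parts = []
    i = 0
    while i < len(raw):
        (length,) = struct.unpack_from(">H", raw, i)
        i += 2
        parts.append(raw[i:i + length])
        i += length + 1  # skip the end-of-component byte
    return parts

key = "0008000000000cdd5edd000008000000000006251000"
longs = [struct.unpack(">q", p)[0] for p in decode_composite_key(key)]
print(longs)  # -> [215834333, 402704]
```

So the quoted key decodes to the (long,long) pair (215834333, 402704).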
>>
>> thanks
>> anishek
>>
>> On Mon, Feb 22, 2016 at 11:21 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>
>>> 1) getFullyExpiredSSTables in 2.0 isn't as thorough as many expect, so
>>> it's very likely that some sstables stick around longer than you expect.
>>>
>>> 2) max_sstable_age_days tells cassandra when to stop compacting that
>>> file, not when to delete it.
>>>
>>> 3) You can change the window size using both the base_time_seconds
>>> parameter and the max_sstable_age_days parameter (use the former to set
>>> the size of the first window, and the latter to determine how long
>>> before you stop compacting that window). It's somewhat non-intuitive.
>>>
>>> Your read latencies actually look pretty reasonable. Are you sure you're
>>> not simply hitting GC pauses that cause your queries to run longer than
>>> you expect? Do you have graphs of GC time (first derivative of total GC
>>> time is common for tools like graphite), or do you see 'GCInspector' in
>>> your logs indicating pauses > 200ms?
>>>
>>> From: Anishek Agarwal
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Sunday, February 21, 2016 at 11:13 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: Re: High Bloom filter false ratio
>>>
>>> Hey guys,
>>>
>>> Just did some more digging... it looks like DTCS is not removing old
>>> data completely. I used sstable2json for one such table and saw old data
>>> there. We have a value of 30 for max_sstable_age_days on the table.
>>>
>>> One of the columns showed data as: ["2015-12-10 11\\:03+0530:",
>>> "56690ea2", 1449725602552000, "d"]. What is the meaning of "d" in the
>>> last IS_MARKED_FOR_DELETE column?
>>>
>>> I see data from 10 Dec 2015 still there. It looks like there are a few
>>> issues with DTCS. Operationally, what choices do I have to rectify this?
>>> We are on version 2.0.15.
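For a quick check of Jeff's "pauses > 200ms" question without a graphing setup, the GCInspector lines quoted earlier in the thread can be scraped straight from system.log. A rough sketch, matching the log format shown in this thread:

```python
import re

# Matches GCInspector lines like:
#   ... GC for ParNew: 419 ms for 1 collections, ...
PAUSE_RE = re.compile(r"GC for (\w+): (\d+) ms")

def gc_pauses(log_lines, threshold_ms=200):
    """Yield (collector, pause_ms) for every pause at or above threshold_ms."""
    for line in log_lines:
        m = PAUSE_RE.search(line)
        if m and int(m.group(2)) >= threshold_ms:
            yield m.group(1), int(m.group(2))

sample = [
    "INFO [ScheduledTasks:1] 2016-02-23 09:11:21,245 GCInspector.java "
    "(line 116) GC for ParNew: 419 ms for 1 collections, 2309875224 used; "
    "max is 7784628224",
]
print(list(gc_pauses(sample)))  # -> [('ParNew', 419)]
```

Feeding the whole system.log through this and bucketing by hour gives a cheap approximation of the GC-time graph Jeff describes.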
>>>
>>> thanks
>>> anishek
>>>
>>> On Mon, Feb 22, 2016 at 10:23 AM, Anishek Agarwal <anis...@gmail.com> wrote:
>>>
>>>> We are using DTCS and have a 30-day window before sstables are cleaned
>>>> up. I don't think with DTCS we can do anything about sstable sizing.
>>>> Please do let me know if there are other ideas.
>>>>
>>>> On Sat, Feb 20, 2016 at 12:51 AM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>>
>>>>> To me the following three look on the higher side:
>>>>>
>>>>> SSTable count: 1289
>>>>>
>>>>> In order to reduce the SSTable count, see if you are compacting or not
>>>>> (if using STCS). Is it possible to change this to LCS?
>>>>>
>>>>> Number of keys (estimate): 345137664 (345M partition keys)
>>>>>
>>>>> I don't have any suggestion about reducing this unless you partition
>>>>> your data.
>>>>>
>>>>> Bloom filter space used, bytes: 493777336 (400MB is huge)
>>>>>
>>>>> If the number of keys is reduced then this will automatically reduce
>>>>> the bloom filter size, I believe.
>>>>>
>>>>> Jaydeep
>>>>>
>>>>> On Thu, Feb 18, 2016 at 7:52 PM, Anishek Agarwal <anis...@gmail.com> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> @Jaydeep here is the cfstats output from one node.
>>>>>>
>>>>>> Read Count: 1721134722
>>>>>> Read Latency: 0.04268825050756254 ms.
>>>>>> Write Count: 56743880
>>>>>> Write Latency: 0.014650376727851532 ms.
>>>>>>
>>>>>> Pending Tasks: 0
>>>>>> Table: user_stay_points
>>>>>> SSTable count: 1289
>>>>>> Space used (live), bytes: 122141272262
>>>>>> Space used (total), bytes: 224227850870
>>>>>> Off heap memory used (total), bytes: 653827528
>>>>>> SSTable Compression Ratio: 0.4959736121441446
>>>>>> Number of keys (estimate): 345137664
>>>>>> Memtable cell count: 339034
>>>>>> Memtable data size, bytes: 106558314
>>>>>> Memtable switch count: 3266
>>>>>> Local read count: 1721134803
>>>>>> Local read latency: 0.048 ms
>>>>>> Local write count: 56743898
>>>>>> Local write latency: 0.018 ms
>>>>>> Pending tasks: 0
>>>>>> Bloom filter false positives: 40664437
>>>>>> Bloom filter false ratio: 0.69058
>>>>>> Bloom filter space used, bytes: 493777336
>>>>>> Bloom filter off heap memory used, bytes: 493767024
>>>>>> Index summary off heap memory used, bytes: 91677192
>>>>>> Compression metadata off heap memory used, bytes: 68383312
>>>>>> Compacted partition minimum bytes: 104
>>>>>> Compacted partition maximum bytes: 1629722
>>>>>> Compacted partition mean bytes: 1773
>>>>>> Average live cells per slice (last five minutes): 0.0
>>>>>> Average tombstones per slice (last five minutes): 0.0
>>>>>>
>>>>>> @Tyler Hobbs
>>>>>>
>>>>>> We are using cassandra 2.0.15, so
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-8525 shouldn't occur.
>>>>>> The other problems look like they will be fixed in 3.0; we will mostly
>>>>>> try to slot in an upgrade to a 3.x version towards the second quarter
>>>>>> of this year.
>>>>>>
>>>>>> @Daemon
>>>>>>
>>>>>> Latencies seem to have higher ratios; attached is the graph.
>>>>>>
>>>>>> I am mostly looking at the bloom filters because of the way we do
>>>>>> reads: we read data with non-existent partition keys, and it seems to
>>>>>> take long to respond. For 720 queries it takes 2 seconds, with all of
>>>>>> the queries returning nothing. The 720 queries are issued in
>>>>>> sequential batches of 180, with the 180 queries in each batch running
>>>>>> in parallel.
>>>>>>
>>>>>> thanks
>>>>>> anishek
>>>>>>
>>>>>> On Fri, Feb 19, 2016 at 3:09 AM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>>>>
>>>>>>> How many partition keys exist for the table which shows this problem
>>>>>>> (or provide nodetool cfstats for that table)?
>>>>>>>
>>>>>>> On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle <daeme...@gmail.com> wrote:
>>>>>>>
>>>>>>>> The bloom filter buckets the values in a small number of buckets. I
>>>>>>>> have been surprised by how many cases I see with large cardinality
>>>>>>>> where a few values populate a given bloom leaf, resulting in high
>>>>>>>> false positives and a surprising impact on latencies!
>>>>>>>>
>>>>>>>> Are you seeing 2:1 ranges between mean and worst-case latencies
>>>>>>>> (allowing for GC times)?
>>>>>>>>
>>>>>>>> Daemeon Reiydelle
>>>>>>>>
>>>>>>>> On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <ty...@datastax.com> wrote:
>>>>>>>>
>>>>>>>>> You can try slightly lowering the bloom_filter_fp_chance on your
>>>>>>>>> table.
>>>>>>>>>
>>>>>>>>> Otherwise, it's possible that you're repeatedly querying one or two
>>>>>>>>> partitions that always trigger a bloom filter false positive. You
>>>>>>>>> could try manually tracing a few queries on this table (for
>>>>>>>>> non-existent partitions) to see if the bloom filter rejects them.
>>>>>>>>>
>>>>>>>>> Depending on your Cassandra version, your false positive ratio
>>>>>>>>> could be inaccurate:
>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-8525
>>>>>>>>>
>>>>>>>>> There are also a couple of recent improvements to bloom filters:
>>>>>>>>> * https://issues.apache.org/jira/browse/CASSANDRA-8413
>>>>>>>>> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>>>>>>>>>
>>>>>>>>> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anis...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> We have a table with a composite partition key of humongous
>>>>>>>>>> cardinality; it's a combination of (long,long). On the table we
>>>>>>>>>> have bloom_filter_fp_chance=0.010000.
>>>>>>>>>>
>>>>>>>>>> On doing "nodetool cfstats" on the 5 nodes we have in the
>>>>>>>>>> cluster, we are seeing "Bloom filter false ratio:" in the range
>>>>>>>>>> of 0.7-0.9.
>>>>>>>>>>
>>>>>>>>>> I thought that over time the bloom filter would adjust to the key
>>>>>>>>>> space cardinality. We have been running the cluster for a long
>>>>>>>>>> time now, but have added significant traffic from Jan this year,
>>>>>>>>>> which would not lead to writes in the db but would lead to high
>>>>>>>>>> reads to see if there are any values.
>>>>>>>>>>
>>>>>>>>>> Are there any settings that can be changed to allow a better
>>>>>>>>>> ratio?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Anishek
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Tyler Hobbs
>>>>>>>>> DataStax <http://datastax.com/>
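As background on Tyler's suggestion to lower bloom_filter_fp_chance: standard Bloom filter sizing gives m/n = -ln(p)/(ln 2)^2 bits per key, so the memory cost of a lower false-positive chance can be estimated for this table's ~345M keys. This is a back-of-envelope sketch, not Cassandra's exact allocation (which rounds its internal bucket sizes):

```python
import math

def bloom_bits_per_key(fp_chance):
    # Standard Bloom filter sizing: m/n = -ln(p) / (ln 2)^2
    return -math.log(fp_chance) / (math.log(2) ** 2)

keys = 345_137_664  # 'Number of keys (estimate)' from the cfstats above
for p in (0.1, 0.01, 0.001):
    mb = keys * bloom_bits_per_key(p) / 8 / 1024 ** 2
    print(f"fp_chance={p}: ~{bloom_bits_per_key(p):.1f} bits/key, ~{mb:.0f} MB")
```

At the table's configured p=0.01 this estimates roughly 9.6 bits per key, in the same ballpark as the ~470 MB "Bloom filter space used" reported above; each 10x reduction in fp_chance adds about 4.8 bits per key.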