Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions
Looks a lot like read repair but impossible to tell for sure

--
Jeff Jirsa

> On Aug 9, 2017, at 4:34 PM, Sumanth Pasupuleti wrote:
>
> My final try on pushing the attachment over.
>
>> On Wed, Aug 9, 2017 at 4:01 PM, Sumanth Pasupuleti wrote:
>> Thanks for the insights Jeff! I did go through the tickets around dropping
>> expired sstables that have overlaps - based on what I understand, the only
>> undesirable impact of that would be possible data resurrection.
>>
>> I have now attached the output of sstableslicer with the mail. Will submit a
>> patch for review.
>>
>> Thanks,
>> Sumanth
>>
>>> On Tue, Aug 8, 2017 at 9:49 PM, Jeff Jirsa wrote:
>>> The most likely cause is read repairs due to consistency level repairs
>>> (digest mismatch). The only way to actually eliminate read repair is to
>>> read with CL:ONE, which almost nobody does (at least in time series use
>>> cases, because it implies you probably write with ALL, or run repair which
>>> - as you've noted - often isn't necessary in ttl-only use cases).
>>>
>>> I can't see the image, but more tools for understanding sstable state are
>>> never a bad thing (as long as they're generally useful and maintainable).
>>>
>>> For what it's worth, there are tickets in flight for being more aggressive
>>> at dropping overlaps, but there are companies that use tools that stop the
>>> cluster, use sstablemetadata to identify sstables we knew should be fully
>>> expired, and manually remove them (/bin/rm) before starting cassandra
>>> again. It works reasonably well IF (and only if) you write all data with
>>> TTLs, and you can identify fully expired sstables based on maximum
>>> timestamps.
>>>
>>> On Tue, Aug 8, 2017 at 8:51 PM, Sumanth Pasupuleti <
>>> sumanth.pasupuleti...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We use TWCS in a few of the column families that have TTL based
>>>> time-series data, and no explicit deletes are issued. Over the time, we
>>>> observed the disk usage has been increasing beyond the expected levels.
>>>>
>>>> Data directory in a particular node shows SSTables that are more than
>>>> 16days old, while the bucket size is configured at 12hours, TTL is at
>>>> 15days and GC grace at 1hour.
>>>> Upon using sstableexpiredblockers, we got quite a few sets of blocking
>>>> and blocked SSTables. SSTableMetadata that is shown in the output
>>>> indicates there is an overlap in the MinTS-MaxTS period among the
>>>> blocking SSTable and the blocked SSTables, which is preventing the older
>>>> SSTables from getting dropped/deleted.
>>>>
>>>> Following are the possible root causes we considered
>>>>
>>>>    1. Hints - old data hints getting replayed from the coordinator node.
>>>>       We ruled this out since hints live for no more than 1 day based on
>>>>       our configuration.
>>>>    2. External compactions - no external compactions were run, that
>>>>       could cause compaction of SSTables across the TWCS buckets.
>>>>    3. Read repairs - this is ruled out as well, since we never ran
>>>>       external repairs, and read repair chance on the TWCS column families
>>>>       has been set to 0.
>>>>    4. Application team writing data with older timestamp (in newer
>>>>       SSTables).
>>>>
>>>>       1. We wanted to identify the specific row keys with older timestamps
>>>>          in the blocking SSTable, that could be causing this issue to
>>>>          occur. We considered using SSTable2Keys/json, however, since both
>>>>          the tools involve outputting the entire content/keys of the
>>>>          SSTable in the order of the keys, they were not helpful in this
>>>>          case.
>>>>       2. Since we wanted to get data on a few oldest cells with
>>>>          timestamps, we created a tool mostly based off of sstable2json,
>>>>          called sstableslicer, to output 'n' top/bottom cells in an
>>>>          SSTable, ordered either on writetime/localDeletionTime. This
>>>>          helped us identify the specific cells in new SSTables with older
>>>>          timestamps, which further helped in debugging on the application
>>>>          end. From application team perspective, however, writing data
>>>>          with old timestamp is not a possible scenario.
>>>>       3. Below is a sample output of sstableslicer
>>>>
>>>> [image: Inline image 2]
>>>>
>>>> Looking for suggestions, especially around following two things:
>>>>
>>>>    1. Did we miss any other case in TWCS that could be causing such
>>>>       overlap?
>>>>    2. Does sstableslicer seem valuable, to be included in Apache C*? If
>>>>       yes, I shall create a JIRA and submit a PR/patch for review.
>>>>
>>>> C* version we use is 2.1.17.
>>>>
>>>> Thanks,
>>>> Sumanth
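As a rough sanity check on the numbers in the original post (12-hour buckets, 15-day TTL, 1-hour GC grace), the oldest SSTable one would expect to find on disk can be estimated as below. This is a back-of-the-envelope sketch, not Cassandra code: the function name is made up for illustration, and it assumes every cell carries the same TTL and ignores any delay before compaction actually drops the file.

```python
from datetime import timedelta


def max_expected_sstable_age(ttl: timedelta, bucket: timedelta,
                             gc_grace: timedelta) -> timedelta:
    """Rough upper bound on SSTable age under TWCS with TTL-only writes.

    Hypothetical helper: the first cell in a bucket may be written a full
    bucket width before the last one, the last cell expires one TTL after it
    is written, and the SSTable only becomes droppable gc_grace after that.
    """
    return bucket + ttl + gc_grace


# Values taken from the original post in this thread.
bound = max_expected_sstable_age(
    ttl=timedelta(days=15),
    bucket=timedelta(hours=12),
    gc_grace=timedelta(hours=1),
)
observed = timedelta(days=16)  # "more than 16 days old"
print(bound)             # 15 days, 13:00:00
print(observed > bound)  # True
```

Under these assumptions, SSTables older than roughly 15.5 days are already past the point where they should have been dropped, which is consistent with something blocking them.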
Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions
My final try on pushing the attachment over.
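The idea behind sstableslicer as described in the original post (output the 'n' top or bottom cells ordered by writetime or localDeletionTime, rather than dumping everything in key order) can be sketched as follows. This is not the tool's actual code, which was not posted to the list. The Cell type and slice_cells function are hypothetical, and the cells are assumed to have already been decoded from an SSTable by some other means.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Cell(NamedTuple):
    """Hypothetical flattened view of one cell (not the sstable2json schema)."""
    row_key: str
    column: str
    write_time: int           # microseconds since epoch
    local_deletion_time: int  # seconds since epoch


def slice_cells(cells: Iterable[Cell], n: int,
                order_by: str = "write_time",
                oldest: bool = True) -> List[Cell]:
    """Return the n oldest (or newest) cells by the chosen field.

    heapq keeps only n candidates in memory, so the input can be a
    generator streaming over a large SSTable.
    """
    if order_by not in ("write_time", "local_deletion_time"):
        raise ValueError(f"unsupported order_by: {order_by}")
    pick = heapq.nsmallest if oldest else heapq.nlargest
    return pick(n, cells, key=lambda c: getattr(c, order_by))


# Made-up sample data: one cell carries a much older write time.
sample = [
    Cell("k1", "v", 1_502_000_000_000_000, 1_503_296_000),
    Cell("k2", "v", 1_500_000_000_000_000, 1_501_296_000),
    Cell("k3", "v", 1_502_100_000_000_000, 1_503_396_000),
]
for cell in slice_cells(sample, n=2):
    print(cell.row_key, cell.write_time)
```

Ordering by write time surfaces the row keys whose cells are out of place for the bucket, which is the information needed to take back to the application team.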
Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions
Thanks for the insights Jeff! I did go through the tickets around dropping expired sstables that have overlaps - based on what I understand, the only undesirable impact of that would be possible data resurrection.

I have now attached the output of sstableslicer with the mail. Will submit a patch for review.

Thanks,
Sumanth
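The blocking relationship discussed in this thread can be illustrated with a simplified model. The sketch below is an approximation under stated assumptions, not what sstableexpiredblockers or Cassandra's compaction code actually does: SSTableInfo and find_blockers are hypothetical names, token-range overlap and memtables are ignored, and an SSTable is treated as fully expired when its latest cell expiry is older than now minus GC grace.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SSTableInfo:
    """Hypothetical summary of one SSTable's metadata (not a Cassandra API)."""
    name: str
    min_timestamp: int            # oldest cell write time
    max_timestamp: int            # newest cell write time
    max_local_deletion_time: int  # latest expiry among its cells


def find_blockers(sstables: List[SSTableInfo],
                  gc_before: int) -> Dict[str, List[str]]:
    """Map each blocking SSTable to the expired SSTables it holds back.

    gc_before stands for "now minus gc_grace_seconds". A live SSTable is
    counted as a blocker when it holds a cell written no later than the
    newest cell of an expired SSTable, since that cell could be shadowed
    by data in the expired file.
    """
    expired = [s for s in sstables if s.max_local_deletion_time < gc_before]
    live = [s for s in sstables if s.max_local_deletion_time >= gc_before]
    blockers: Dict[str, List[str]] = {}
    for candidate in live:
        blocked = [e.name for e in expired
                   if candidate.min_timestamp <= e.max_timestamp]
        if blocked:
            blockers[candidate.name] = blocked
    return blockers


# Made-up values: one new SSTable contains a single old-timestamp cell.
tables = [
    SSTableInfo("old", 100, 200, 1_000),
    SSTableInfo("new-clean", 300, 400, 5_000),
    SSTableInfo("new-with-old-cell", 150, 400, 5_000),
]
print(find_blockers(tables, gc_before=2_000))
# {'new-with-old-cell': ['old']}
```

In this model a single cell with an old write timestamp in a new SSTable is enough to keep an otherwise fully expired SSTable on disk, whether that cell came from read repair or from the application.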