Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-09 Thread Jeff Jirsa
Looks a lot like read repair but impossible to tell for sure


-- 
Jeff Jirsa


> On Aug 9, 2017, at 4:34 PM, Sumanth Pasupuleti wrote:
> 
> My final try at pushing the attachment over.
> 
> 
> 
>> On Wed, Aug 9, 2017 at 4:01 PM, Sumanth Pasupuleti wrote:
>> Thanks for the insights, Jeff! I did go through the tickets around dropping 
>> expired sstables that have overlaps - based on what I understand, the only 
>> undesirable impact of that would be possible data resurrection.
>> 
>> I have now attached the output of sstableslicer to this mail. I will submit a 
>> patch for review.
>> 
>> Thanks,
>> Sumanth
>> 
>>> On Tue, Aug 8, 2017 at 9:49 PM, Jeff Jirsa  wrote:
>>> The most likely cause is read repair triggered by the consistency level of
>>> your reads (a digest mismatch), rather than by the table's read_repair_chance.
>>> The only way to actually eliminate read repair is to read with CL:ONE, which
>>> almost nobody does (at least in time-series use cases, because it implies you
>>> probably write with ALL, or run repair, which - as you've noted - often isn't
>>> necessary in TTL-only use cases).
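
To make the distinction concrete, a minimal sketch with the Python driver
(keyspace, table, and column names are hypothetical): even with the table's
read_repair_chance at 0, a QUORUM read that sees a digest mismatch performs a
blocking read repair, while a CL:ONE read contacts a single replica and cannot
trigger that path.

    # Illustration only; 'metrics_ks' and 'events' are hypothetical names.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['127.0.0.1']).connect('metrics_ks')

    # On a digest mismatch, this read blocks on a read repair that can copy
    # old cells (with their original write timestamps) into the current
    # TWCS window.
    quorum_read = SimpleStatement("SELECT value FROM events WHERE id = %s",
                                  consistency_level=ConsistencyLevel.QUORUM)

    # Contacts a single replica, so consistency-level read repair never fires.
    one_read = SimpleStatement("SELECT value FROM events WHERE id = %s",
                               consistency_level=ConsistencyLevel.ONE)

    rows = session.execute(one_read, ('sensor-1',))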
>>> 
>>> I can't see the image, but more tools for understanding sstable state are
>>> never a bad thing (as long as they're generally useful and maintainable).
>>> 
>>> For what it's worth, there are tickets in flight for being more aggressive
>>> at dropping overlaps, but there are companies that use tools that stop the
>>> cluster, use sstablemetadata to identify sstables that should already be
>>> fully expired, and manually remove them (/bin/rm) before starting Cassandra
>>> again. It works reasonably well IF (and only if) you write all data with
>>> TTLs, and you can identify fully expired sstables based on maximum
>>> timestamps.
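
A rough sketch of the audit half of that offline workflow (not the tooling
described above): it assumes the node is stopped, that every write carried the
table's 15-day TTL, that sstablemetadata is on the PATH, and that its 2.1
output includes a "Maximum timestamp:" line in microseconds. It only prints
candidates and deletes nothing; paths and keyspace/table names are placeholders.

    #!/usr/bin/env python
    # List sstables whose newest cell is older than TTL + gc_grace, i.e.
    # files that should be fully expired IF all data was written with the
    # table-level TTL. Run only while the node is stopped.
    import glob, re, subprocess, time

    TTL_SECONDS = 15 * 24 * 3600       # table TTL (15 days)
    GC_GRACE_SECONDS = 3600            # gc_grace_seconds (1 hour)
    DATA_GLOB = '/var/lib/cassandra/data/metrics_ks/events-*/*-Data.db'

    cutoff_micros = (time.time() - TTL_SECONDS - GC_GRACE_SECONDS) * 1e6

    for path in sorted(glob.glob(DATA_GLOB)):
        out = subprocess.check_output(['sstablemetadata', path])
        match = re.search(r'Maximum timestamp:\s*(\d+)', out.decode())
        if not match:
            continue
        max_ts = int(match.group(1))   # microseconds since epoch in 2.1
        if max_ts < cutoff_micros:
            print('fully-expired candidate: %s' % path)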
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Aug 8, 2017 at 8:51 PM, Sumanth Pasupuleti <
>>> sumanth.pasupuleti...@gmail.com> wrote:
>>> 
>>> > Hi,
>>> >>
>>> >> We use TWCS in a few of the column families that have TTL-based
>>> >> time-series data, and no explicit deletes are issued. Over time, we have
>>> >> observed disk usage increasing beyond the expected levels.
>>> >>
>>> >> The data directory on a particular node shows SSTables that are more than
>>> >> 16 days old, while the bucket size is configured at 12 hours, TTL is at
>>> >> 15 days, and GC grace at 1 hour.
>>> >> Upon using sstableexpiredblockers, we got quite a few sets of blocking
>>> >> and blocked SSTables. The SSTable metadata shown in the output indicates
>>> >> there is an overlap in the MinTS-MaxTS period between the blocking SSTable
>>> >> and the blocked SSTables, which is preventing the older SSTables from
>>> >> getting dropped/deleted.
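
For reference, a hypothetical ALTER TABLE applying the settings described above
(12-hour windows, 15-day TTL, 1-hour gc_grace, read repair chance 0) to an
example table. The short compaction class name shown is the one bundled in
later releases; on 2.1.17 it would be the fully qualified class name of the
separately built TWCS jar.

    # Illustrative settings only; keyspace and table names are examples.
    # On 2.1.17 the 'class' value would be the full class name of the
    # external TWCS build rather than the short in-tree name used here.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('metrics_ks')
    session.execute("""
        ALTER TABLE events
        WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                           'compaction_window_unit': 'HOURS',
                           'compaction_window_size': 12}
         AND default_time_to_live = 1296000   -- 15 days
         AND gc_grace_seconds = 3600          -- 1 hour
         AND read_repair_chance = 0
         AND dclocal_read_repair_chance = 0
    """)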
>>> >>
>>> >> The following are the possible root causes we considered:
>>> >>
>>> >>    1. Hints - old data hints getting replayed from the coordinator node.
>>> >>    We ruled this out since hints live for no more than 1 day based on our
>>> >>    configuration.
>>> >>    2. External compactions - no external compactions that could compact
>>> >>    SSTables across the TWCS buckets were run.
>>> >>    3. Read repairs - this is ruled out as well, since we never ran
>>> >>    external repairs, and read repair chance on the TWCS column families
>>> >>    has been set to 0.
>>> >>    4. Application team writing data with older timestamps (in newer
>>> >>    SSTables) - see the sketch below.
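
To illustrate what the fourth scenario would look like from the client side (a
minimal sketch; table and column names are hypothetical): write timestamps were
typically server-assigned in this era, so old timestamps inside new SSTables
generally mean a client supplied them explicitly, for example via USING
TIMESTAMP, or used client-side timestamps with a badly skewed clock.

    # Illustration only; schema names are hypothetical. The cell carries the
    # client-supplied write timestamp, so an explicit old USING TIMESTAMP
    # lands "old" data in a brand-new TWCS window.
    from datetime import datetime
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('metrics_ks')
    session.execute(
        "INSERT INTO events (id, ts, value) VALUES (%s, %s, %s) "
        "USING TIMESTAMP 1500000000000000",   # microseconds; ~2017-07-14
        ('sensor-1', datetime(2017, 7, 14), 42.0))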
>>> >>
>>> >>
>>> >>    1. We wanted to identify the specific row keys with older timestamps
>>> >>    in the blocking SSTable that could be causing this issue. We considered
>>> >>    using sstablekeys/sstable2json; however, since both tools output the
>>> >>    entire contents/keys of the SSTable in key order, they were not helpful
>>> >>    in this case.
>>> >>    2. Since we wanted data on the few oldest cells by timestamp, we
>>> >>    created a tool, mostly based off of sstable2json, called sstableslicer,
>>> >>    to output the 'n' top/bottom cells in an SSTable, ordered either on
>>> >>    writetime or localDeletionTime. This helped us identify the specific
>>> >>    cells in new SSTables with older timestamps, which further helped with
>>> >>    debugging on the application end. From the application team's
>>> >>    perspective, however, writing data with old timestamps is not a
>>> >>    possible scenario.
>>> >>    3. Below is a sample output of sstableslicer:
>>> > [image: Inline image 2]
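
The sstableslicer patch itself isn't shown here, but the idea can be roughly
approximated on stock 2.1 tooling: sstable2json encodes each cell roughly as
[name, value, timestamp, ...] with write timestamps in microseconds, so a small
script can pull out the n oldest cells by write time (ordering by
localDeletionTime, which sstableslicer also supports, is left out of this
sketch).

    # Rough approximation of the sstableslicer idea on top of sstable2json
    # (2.1), which emits [{"key": ..., "cells": [[name, value, ts, ...]]}].
    # Not the actual patch; output-format assumptions are noted above.
    import heapq, json, subprocess, sys

    def oldest_cells(sstable_path, n=20):
        """Return the n cells with the smallest write timestamps."""
        raw = subprocess.check_output(['sstable2json', sstable_path])
        rows = json.loads(raw.decode('utf-8'))
        cells = ((cell[2], row['key'], cell[0])   # (timestamp, row key, name)
                 for row in rows
                 for cell in row.get('cells', []))
        return heapq.nsmallest(n, cells)

    if __name__ == '__main__':
        for ts, key, name in oldest_cells(sys.argv[1]):
            # Write timestamps are microseconds since the epoch in 2.1.
            print('%d  key=%s  cell=%s' % (ts, key, name))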
>>> >
>>> >
>>> >> Looking for suggestions, especially around the following two things:
>>> >>
>>> >>1. Did we miss any other case in TWCS that could be causing such
>>> >>overlap?
>>> >>2. Does sstableslicer seem valuable, to be included in Apache C*? If
>>> >>yes, I shall create a JIRA and submit a PR/patch for review.
>>> >>
>>> >> The C* version we use is 2.1.17.
>>> >
>>> > Thanks,
>>> >> Sumanth
>>> >>
>>> >
>> 
> 


Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-09 Thread Sumanth Pasupuleti
My final try at pushing the attachment over.





Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-09 Thread Sumanth Pasupuleti
Thanks for the insights, Jeff! I did go through the tickets around dropping
expired sstables that have overlaps - based on what I understand, the only
undesirable impact of that would be possible data resurrection.

I have now attached the output of sstableslicer to this mail. I will submit
a patch for review.

Thanks,
Sumanth

