Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-09 Thread Jeff Jirsa
Looks a lot like read repair but impossible to tell for sure


-- 
Jeff Jirsa



Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-09 Thread Sumanth Pasupuleti
My final try on pushing the attachment over.





Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-09 Thread Sumanth Pasupuleti
Thanks for the insights, Jeff! I did go through the tickets around dropping
expired SSTables that have overlaps; based on what I understand, the only
undesirable impact of that would be possible data resurrection.

I have now attached the output of sstableslicer to this mail, and will submit
a patch for review.

Thanks,
Sumanth



Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-08 Thread Jeff Jirsa
The most likely cause is read repairs due to consistency level repairs
(digest mismatch). The only way to actually eliminate read repair is to
read with CL:ONE, which almost nobody does (at least in time series use
cases, because it implies you probably write with ALL, or run repair which
- as you've noted - often isn't necessary in ttl-only use cases).
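For illustration, a minimal sketch of such a CL:ONE read using the DataStax
Python driver (keyspace, table, and key below are placeholders, not from this
thread; with only one replica consulted, no digest mismatch can trigger a
blocking read repair):

    # Sketch: a read at ConsistencyLevel.ONE touches a single replica, so no
    # digest comparison (and hence no blocking read repair) happens on the read.
    # Assumes the DataStax python driver (pip install cassandra-driver);
    # keyspace/table/key names are placeholders.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')

    stmt = SimpleStatement(
        "SELECT value, writetime(value) FROM timeseries WHERE key = %s",
        consistency_level=ConsistencyLevel.ONE)

    for row in session.execute(stmt, ['some-partition-key']):
        print(row)

    cluster.shutdown()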

I can't see the image, but more tools for understanding sstable state are
never a bad thing (as long as they're generally useful and maintainable).

For what it's worth, there are tickets in flight for being more aggressive
at dropping overlaps, but there are companies that use tools that stop the
cluster, use sstablemetadata to identify sstables we knew should be fully
expired, and manually remove them (/bin/rm) before starting cassandra
again. It works reasonably well IF (and only if) you write all data with
TTLs, and you can identify fully expired sstables based on maximum
timestamps.
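
As a rough illustration of that manual procedure (a sketch, not an official
tool - it assumes sstablemetadata prints a "Maximum timestamp" line in
microseconds, that all data was written with the same TTL, and that the node
is stopped; the path and constants are placeholders):

    # Sketch of the manual procedure described above -- not an official tool.
    # Assumptions: sstablemetadata (tools/bin) prints "Maximum timestamp: <n>"
    # in microseconds, every cell was written with the same TTL, and Cassandra
    # is stopped on the node before anything is actually removed.
    import glob
    import re
    import subprocess
    import time

    TTL_SECONDS = 15 * 24 * 3600      # the 15-day TTL from this thread
    GC_GRACE_SECONDS = 3600           # the 1-hour gc_grace from this thread
    DATA_GLOB = '/var/lib/cassandra/data/my_ks/my_cf-*/*-Data.db'  # placeholder

    def max_timestamp_micros(path):
        """Parse the 'Maximum timestamp' line out of sstablemetadata output."""
        out = subprocess.check_output(['sstablemetadata', path], text=True)
        match = re.search(r'Maximum timestamp:\s*(\d+)', out)
        return int(match.group(1)) if match else None

    def fully_expired(path, now_micros):
        """True if even the newest cell in the sstable is past TTL + gc_grace."""
        max_ts = max_timestamp_micros(path)
        if max_ts is None:
            return False
        return max_ts + (TTL_SECONDS + GC_GRACE_SECONDS) * 1_000_000 < now_micros

    now = int(time.time() * 1_000_000)
    for data_file in sorted(glob.glob(DATA_GLOB)):
        if fully_expired(data_file, now):
            print('fully-expired candidate:', data_file)  # review before any rm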






TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-08 Thread Sumanth Pasupuleti
Hi,

We use TWCS in a few of the column families that hold TTL-based time-series
data, and no explicit deletes are issued. Over time, we have observed disk
usage increasing beyond the expected levels.

The data directory on a particular node shows SSTables that are more than
16 days old, while the bucket size is configured at 12 hours, the TTL at
15 days, and GC grace at 1 hour (with those settings, an SSTable should become
fully droppable roughly 15.5 days after its oldest data was written).
Upon running sstableexpiredblockers, we got quite a few sets of blocking and
blocked SSTables. The SSTable metadata shown in its output indicates an
overlap in the MinTS-MaxTS range between the blocking SSTable and the blocked
SSTables, which is what is preventing the older SSTables from getting
dropped/deleted.
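
For context, a simplified sketch of the condition behind that report (modeled
loosely on the fully-expired check, not the actual Cassandra code): an expired
SSTable can only be dropped once its maximum timestamp is older than the
minimum timestamp of every overlapping, non-expired SSTable.

    # Simplified sketch of the condition behind sstableexpiredblockers' report,
    # modeled loosely on Cassandra's fully-expired check -- not the real code.
    from collections import namedtuple

    SSTable = namedtuple('SSTable', 'name min_ts max_ts expired')

    def blockers(candidate, others):
        """Non-expired sstables whose min timestamp is not newer than the
        candidate's max timestamp; while any exist, dropping the candidate
        whole could resurrect older data its cells currently shadow."""
        return [s for s in others
                if not s.expired and s.min_ts <= candidate.max_ts]

    old = SSTable('old-Data.db', min_ts=100, max_ts=200, expired=True)
    new = SSTable('new-Data.db', min_ts=150, max_ts=900, expired=False)

    print([b.name for b in blockers(old, [new])])  # -> ['new-Data.db']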

Following are the possible root causes we considered:

   1. Hints - old hinted data getting replayed from the coordinator node. We
      ruled this out since hints live for no more than 1 day in our
      configuration.
   2. External compactions - no user-triggered compactions were run that could
      compact SSTables across the TWCS buckets.
   3. Read repairs - ruled out as well, since we never ran external repairs,
      and read_repair_chance on the TWCS column families has been set to 0.
   4. The application team writing data with older timestamps (into newer
      SSTables). To investigate this:
      1. We wanted to identify the specific row keys with older timestamps in
         the blocking SSTable that could be causing this issue. We considered
         sstable2keys/sstable2json, but since both tools dump the entire
         content/keys of the SSTable in key order, they were not helpful here.
      2. Since we only needed the few oldest cells by timestamp, we created a
         tool called sstableslicer, based largely on sstable2json, which
         outputs the top/bottom 'n' cells of an SSTable ordered by either
         writetime or localDeletionTime (a rough sketch of the idea follows
         below). This helped us identify the specific cells in new SSTables
         carrying older timestamps, which in turn helped debugging on the
         application side. From the application team's perspective, however,
         writing data with old timestamps is not a possible scenario.
      3. Below is a sample output of sstableslicer:

[image: Inline image 2]
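
Purely for illustration, a rough sketch of the idea behind sstableslicer (the
real tool is built on sstable2json internals; this standalone version only
shows the "keep the n oldest/newest cells by write time" selection on made-up
cell tuples):

    # Rough sketch of the sstableslicer idea: keep only the n oldest (or
    # newest) cells by write time instead of dumping the whole sstable the way
    # sstable2json does.  Cells here are plain tuples; the real tool reads
    # them through Cassandra's sstable reader -- this is just the selection.
    import heapq

    def slice_cells(cells, n, newest=False):
        """cells: iterable of (partition_key, cell_name, write_time_micros).
        Returns the n cells with the smallest (or largest) write time."""
        pick = heapq.nlargest if newest else heapq.nsmallest
        return pick(n, cells, key=lambda c: c[2])

    cells = [
        ('key-a', 'col1', 1502200000000000),
        ('key-b', 'col1', 1496800000000000),  # suspiciously old timestamp
        ('key-c', 'col2', 1502300000000000),
    ]
    for pk, col, ts in slice_cells(cells, 2):
        print(pk, col, ts)  # the oldest cells come out first for inspection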


Looking for suggestions, especially around the following two things:

   1. Did we miss any other case in TWCS that could be causing such overlap?
   2. Does sstableslicer seem valuable enough to be included in Apache C*? If
      yes, I shall create a JIRA and submit a PR/patch for review.

The C* version we use is 2.1.17.

Thanks,
Sumanth