Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-08-17 Thread kurt greaves
I definitely think we should include it in 4.0. TBH I think it's reasonable
for it to get done after the feature freeze seeing as it is a bug.

On 17 August 2018 at 21:06, Anuj Wadehra 
wrote:

> Hi,
>
> I think CASSANDRA-14227 has been pending for a long time now. Though the
> data loss issue was addressed in CASSANDRA-14092, Cassandra users are still
> prevented from using long TTLs (20+ years) because the maximum expiration
> timestamp that can be represented by the storage engine is
> 2038-01-19T03:14:06+00:00 (due to the encoding of localExpirationTime as an
> int32). As per the JIRA comments, the fix seems relatively simple.
> Considering the high impact/returns and relatively small effort, are there
> any plans to prioritize this fix for upcoming releases?
>
> Thanks
> Anuj
>
>
>
>
> On Saturday, 27 January, 2018, 8:35:20 PM IST, Anuj Wadehra <
> anujw_2...@yahoo.co.in.INVALID> wrote:
>
>
>
>
>
> Hi Paulo,
>
> Thanks for coming out with the Emergency Hot Fix!!
> The patch will help many Cassandra users in saving their precious data.
> I think the criticality and urgency of this bug are very high. How can we
> make sure that as many Cassandra users as possible are alerted about the
> silent deletion problem? What are the formal ways of working for
> broadcasting such critical alerts?
> I still see that the JIRA is marked as a "Major" defect and not a
> "Blocker". What worse can happen to a database than irrecoverable, silent
> deletion of successfully inserted data? I hope you understand.
>
>
>
> Thanks
> Anuj
>
>
>
>
>   On Fri, 26 Jan 2018 at 18:57, Paulo Motta
> wrote:  > I have serious concerns regarding reducing the TTL to 15 yrs. The
> patch will immediately break all existing applications in Production which
> are using 15+ yrs TTL.
>
> In order to prevent applications from breaking I will update the patch
> to automatically set the maximum TTL to '03:14:08 UTC 19 January 2038'
> when it overflows and log a warning as an initial measure.  We will
> work on extending this limit or lifting this limitation, probably for
> the 3.0+ series due to the large scale compatibility changes required
> on lower versions, but community patches are always welcome.
>
> Companies that cannot upgrade to a version with the proper fix will
> need to work around this limitation in some other way: do a batch job
> to delete old data periodically, perform deletes with timestamps in
> the future, etc.
>
> > If it's a 32-bit timestamp, can't we just save/read localDeletionTime as
> an unsigned int?
>
> The proper fix will likely be along these lines, but this involves many
> changes throughout the codebase where localDeletionTime is consumed
> and extensive testing, reviewing, etc, so we're now looking into an
> emergency hot fix to prevent silent data loss while the permanent fix is
> not in place.
>
> 2018-01-26 6:27 GMT-02:00 Anuj Wadehra :
> > Hi Jeff,
> > One correction in my last message: "it may be more feasible to SUPPORT
> (not extend) the 20 year limit in Cassandra in 2.1/2.2".
> > I completely agree that the existing 20 years TTL support is okay for
> older versions.
> >
> > If I have understood your last message correctly, upcoming patches are
> > on the following lines:
> >
> > 1. New patches shall be released for 2.1, 2.2 and 3.x.
> > 2. The patches for 2.1 & 2.2 would support the existing 20 year TTL limit
> > and ensure that there is no data loss when a 20 year TTL is set.
> > 3. The patches for 2.1 and 2.2 are unlikely to update the sstable format.
> > 4. 3.x patches may even remove the 20 year TTL constraint (and extend
> > TTL support beyond 2038).
> > I think that the JIRA priority should be increased from "Major" to
> > "Blocker" as the JIRA may cause unexpected data loss. Also, all impacted
> > versions should be included in the JIRA. This will attract the due
> > attention of all Cassandra users.
> > Thanks
> > Anuj
> >On Friday 26 January 2018, 12:47:18 PM IST, Anuj Wadehra <
> anujw_2...@yahoo.co.in.INVALID> wrote:
> >
> >  Hi Jeff,
> >
> > Thanks for the prompt action! I agree that patching an application MAY
> have a shorter life cycle than patching Cassandra in production. But, in
> the interest of the larger Cassandra user community, we should put our best
> effort to avoid breaking all the affected applications in production. We
> should also consider that updating business logic as per the new 15 year
> TTL constraint may have business implications for many users. I have a
> limited understanding about the complexity of the code patch, but it may be
> more feasible to extend the 20 year limit in Cassandra in 2.1/2.2 rather
> than asking all impacted users to do an immediate business logic
> adaptation. Moreover, now that we officially support Cassandra 2.1 & 2.2
> until 4.0 release and provide critical fixes for 2.1, it becomes even more
> reasonable to provide this extremely critical patch for 2.1 & 2.2 (unless
> its absolutely impossible). Still, many users use Cassandra 2.1 and 2.2 in
> their most critical production systems.
> >
> > Thanks
> > Anuj
> >
> > 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-08-17 Thread Anuj Wadehra
Hi,

I think CASSANDRA-14227 has been pending for a long time now. Though the data
loss issue was addressed in CASSANDRA-14092, Cassandra users are still
prevented from using long TTLs (20+ years) because the maximum expiration
timestamp that can be represented by the storage engine is
2038-01-19T03:14:06+00:00 (due to the encoding of localExpirationTime as an
int32). As per the JIRA comments, the fix seems relatively simple. Considering
the high impact/returns and relatively small effort, are there any plans to
prioritize this fix for upcoming releases?
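
For illustration, the 2038 limit follows directly from storing the expiration
point as seconds since the Unix epoch in a signed 32-bit field. A minimal Java
sketch of the arithmetic (class and variable names are illustrative, not
Cassandra's actual code):

public class TtlOverflowDemo {
    public static void main(String[] args) {
        long nowInSeconds = System.currentTimeMillis() / 1000L;
        long ttlSeconds = 630_720_000L; // 20 years, the current maximum TTL

        long expirationPoint = nowInSeconds + ttlSeconds;
        // Integer.MAX_VALUE seconds after the epoch is 2038-01-19T03:14:07Z.
        System.out.println("expiration as long: " + expirationPoint);
        System.out.println("fits in int32: " + (expirationPoint <= Integer.MAX_VALUE));

        // Narrowing to int (what a 32-bit field effectively does) wraps around
        // to a negative value once the 2038 point is crossed.
        int localExpirationTime = (int) expirationPoint;
        System.out.println("stored int32 value: " + localExpirationTime);
    }
}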

Thanks
Anuj




On Saturday, 27 January, 2018, 8:35:20 PM IST, Anuj Wadehra 
 wrote: 





Hi Paulo,

Thanks for coming out with the Emergency Hot Fix!! 
The patch will help many Cassandra users in saving their precious data.
I think the criticality and urgency of this bug are very high. How can we make
sure that as many Cassandra users as possible are alerted about the silent
deletion problem? What are the formal ways of working for broadcasting such
critical alerts?
I still see that the JIRA is marked as a "Major" defect and not a "Blocker".
What worse can happen to a database than irrecoverable, silent deletion of
successfully inserted data? I hope you understand.



Thanks
Anuj




  On Fri, 26 Jan 2018 at 18:57, Paulo Motta wrote:  > 
I have serious concerns regarding reducing the TTL to 15 yrs. The patch will
immediately break all existing applications in Production which are using 15+ 
yrs TTL.

In order to prevent applications from breaking I will update the patch
to automatically set the maximum TTL to '03:14:08 UTC 19 January 2038'
when it overflows and log a warning as an initial measure.  We will
work on extending this limit or lifting this limitation, probably for
the 3.0+ series due to the large scale compatibility changes required
on lower versions, but community patches are always welcome.

Companies that cannot upgrade to a version with the proper fix will
need to work around this limitation in some other way: do a batch job
to delete old data periodically, perform deletes with timestamps in
the future, etc.

> If it's a 32-bit timestamp, can't we just save/read localDeletionTime as
> an unsigned int?

The proper fix will likely be along these lines, but this involves many
changes throughout the codebase where localDeletionTime is consumed
and extensive testing, reviewing, etc, so we're now looking into an
emergency hot fix to prevent silent data loss while the permanent fix is
not in place.

2018-01-26 6:27 GMT-02:00 Anuj Wadehra :
> Hi Jeff,
> One correction in my last message: "it may be more feasible to SUPPORT (not 
> extend) the 20 year limit in Cassandra in 2.1/2.2".
> I completely agree that the existing 20 years TTL support is okay for older 
> versions.
>
> If I have understood your last message correctly, upcoming patches are on
> the following lines:
>
> 1. New patches shall be released for 2.1, 2.2 and 3.x.
> 2. The patches for 2.1 & 2.2 would support the existing 20 year TTL limit
> and ensure that there is no data loss when a 20 year TTL is set.
> 3. The patches for 2.1 and 2.2 are unlikely to update the sstable format.
> 4. 3.x patches may even remove the 20 year TTL constraint (and extend TTL
> support beyond 2038).
> I think that the JIRA priority should be increased from "Major" to "Blocker"
> as the JIRA may cause unexpected data loss. Also, all impacted versions
> should be included in the JIRA. This will attract the due attention of all
> Cassandra users.
> Thanks
> Anuj
>    On Friday 26 January 2018, 12:47:18 PM IST, Anuj Wadehra 
> wrote:
>
>  Hi Jeff,
>
> Thanks for the prompt action! I agree that patching an application MAY have a 
> shorter life cycle than patching Cassandra in production. But, in the 
> interest of the larger Cassandra user community, we should put our best 
> effort to avoid breaking all the affected applications in production. We 
> should also consider that updating business logic as per the new 15 year TTL 
> constraint may have business implications for many users. I have a limited 
> understanding about the complexity of the code patch, but it may be more 
> feasible to extend the 20 year limit in Cassandra in 2.1/2.2 rather than 
> asking all impacted users to do an immediate business logic adaptation. 
> Moreover, now that we officially support Cassandra 2.1 & 2.2 until 4.0 
> release and provide critical fixes for 2.1, it becomes even more reasonable 
> to provide this extremely critical patch for 2.1 & 2.2 (unless its absolutely 
> impossible). Still, many users use Cassandra 2.1 and 2.2 in their most 
> critical production systems.
>
> Thanks
> Anuj
>
>    On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa  
>wrote:
>
>  We’ll get patches out. They almost certainly aren’t going to change the 
>sstable format for old versions (unless whoever writes the patch makes a great 
>argument for it), so there’s probably not going to be post-2038 ttl support 
>for 2.1/2.2. For those old versions, we can definitely make it 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-27 Thread Anuj Wadehra
Hi Paulo,

Thanks for coming out with the Emergency Hot Fix!! 
The patch will help many Cassandra users in saving their precious data.
I think the criticality and urgency of this bug are very high. How can we make
sure that as many Cassandra users as possible are alerted about the silent
deletion problem? What are the formal ways of working for broadcasting such
critical alerts?
I still see that the JIRA is marked as a "Major" defect and not a "Blocker".
What worse can happen to a database than irrecoverable, silent deletion of
successfully inserted data? I hope you understand.



Thanks
Anuj

 
 
  On Fri, 26 Jan 2018 at 18:57, Paulo Motta wrote:   
> I have serious concerns regarding reducing the TTL to 15 yrs. The patch will
immediately break all existing applications in Production which are using 15+ 
yrs TTL.

In order to prevent applications from breaking I will update the patch
to automatically set the maximum TTL to '03:14:08 UTC 19 January 2038'
when it overflows and log a warning as an initial measure.  We will
work on extending this limit or lifting this limitation, probably for
the 3.0+ series due to the large scale compatibility changes required
on lower versions, but community patches are always welcome.

Companies that cannot upgrade to a version with the proper fix will
need to work around this limitation in some other way: do a batch job
to delete old data periodically, perform deletes with timestamps in
the future, etc.

> If it's a 32-bit timestamp, can't we just save/read localDeletionTime as
> an unsigned int?

The proper fix will likely be along these lines, but this involves many
changes throughout the codebase where localDeletionTime is consumed
and extensive testing, reviewing, etc, so we're now looking into an
emergency hot fix to prevent silent data loss while the permanent fix is
not in place.

2018-01-26 6:27 GMT-02:00 Anuj Wadehra :
> Hi Jeff,
> One correction in my last message: "it may be more feasible to SUPPORT (not 
> extend) the 20 year limit in Cassandra in 2.1/2.2".
> I completely agree that the existing 20 years TTL support is okay for older 
> versions.
>
> If I have understood your last message correctly, upcoming patches are on
> the following lines:
>
> 1. New patches shall be released for 2.1, 2.2 and 3.x.
> 2. The patches for 2.1 & 2.2 would support the existing 20 year TTL limit
> and ensure that there is no data loss when a 20 year TTL is set.
> 3. The patches for 2.1 and 2.2 are unlikely to update the sstable format.
> 4. 3.x patches may even remove the 20 year TTL constraint (and extend TTL
> support beyond 2038).
> I think that the JIRA priority should be increased from "Major" to "Blocker"
> as the JIRA may cause unexpected data loss. Also, all impacted versions
> should be included in the JIRA. This will attract the due attention of all
> Cassandra users.
> Thanks
> Anuj
>    On Friday 26 January 2018, 12:47:18 PM IST, Anuj Wadehra 
> wrote:
>
>  Hi Jeff,
>
> Thanks for the prompt action! I agree that patching an application MAY have a 
> shorter life cycle than patching Cassandra in production. But, in the 
> interest of the larger Cassandra user community, we should put our best 
> effort to avoid breaking all the affected applications in production. We 
> should also consider that updating business logic as per the new 15 year TTL 
> constraint may have business implications for many users. I have a limited 
> understanding about the complexity of the code patch, but it may be more 
> feasible to extend the 20 year limit in Cassandra in 2.1/2.2 rather than 
> asking all impacted users to do an immediate business logic adaptation. 
> Moreover, now that we officially support Cassandra 2.1 & 2.2 until 4.0 
> release and provide critical fixes for 2.1, it becomes even more reasonable 
> to provide this extremely critical patch for 2.1 & 2.2 (unless its absolutely 
> impossible). Still, many users use Cassandra 2.1 and 2.2 in their most 
> critical production systems.
>
> Thanks
> Anuj
>
>    On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa  
>wrote:
>
>  We’ll get patches out. They almost certainly aren’t going to change the 
>sstable format for old versions (unless whoever writes the patch makes a great 
>argument for it), so there’s probably not going to be post-2038 ttl support 
>for 2.1/2.2. For those old versions, we can definitely make it not lose data, 
>but we almost certainly aren’t going to make the ttl go past 2038 in old 
>versions.
>
> More importantly, any company trying to do 20 year ttl’s that’s waiting for a 
> patched version should start by patching their app to not write invalid ttls 
> - your app release cycle is almost certainly faster than db patch / review / 
> test / release / validation, and you can avoid the data loss application side 
> by calculating the ttl explicitly. It’s not the best solution, but it beats 
> doing nothing, and 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-26 Thread Paulo Motta
> I have serious concerns regarding reducing the TTL to 15 yrs. The patch will
> immediately break all existing applications in Production which are using 15+ 
> yrs TTL.

In order to prevent applications from breaking I will update the patch
to automatically set the maximum TTL to '03:14:08 UTC 19 January 2038'
when it overflows and log a warning as an initial measure.  We will
work on extending this limit or lifting this limitation, probably for
the 3.0+ series due to the large scale compatibility changes required
on lower versions, but community patches are always welcome.
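
A rough sketch of this cap-and-warn behaviour in Java (illustrative only; the
class, method and logging here are assumptions, not the actual Cassandra patch):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class ExpirationCapSketch {
    private static final Logger logger = LoggerFactory.getLogger(ExpirationCapSketch.class);
    // Integer.MAX_VALUE seconds after the epoch == 2038-01-19T03:14:07Z.
    private static final int MAX_DELETION_TIME = Integer.MAX_VALUE;

    static int capLocalExpirationTime(int nowInSeconds, int ttlSeconds) {
        long expiration = (long) nowInSeconds + ttlSeconds;
        if (expiration > MAX_DELETION_TIME) {
            logger.warn("TTL of {} seconds overflows the 2038 limit; capping the expiration time", ttlSeconds);
            return MAX_DELETION_TIME;
        }
        return (int) expiration;
    }
}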

Companies that cannot upgrade to a version with the proper fix will
need to work around this limitation in some other way: do a batch job
to delete old data periodically, perform deletes with timestamps in
the future, etc.

> If it's a 32-bit timestamp, can't we just save/read localDeletionTime as
> an unsigned int?

The proper fix will likely be along these lines, but this involves many
changes throughout the codebase where localDeletionTime is consumed
and extensive testing, reviewing, etc, so we're now looking into an
emergency hot fix to prevent silent data loss while the permanent fix is
not in place.

2018-01-26 6:27 GMT-02:00 Anuj Wadehra :
> Hi Jeff,
> One correction in my last message: "it may be more feasible to SUPPORT (not 
> extend) the 20 year limit in Cassandra in 2.1/2.2".
> I completely agree that the existing 20 years TTL support is okay for older 
> versions.
>
> If I have understood your last message correctly, upcoming patches are on
> the following lines:
>
> 1. New patches shall be released for 2.1, 2.2 and 3.x.
> 2. The patches for 2.1 & 2.2 would support the existing 20 year TTL limit
> and ensure that there is no data loss when a 20 year TTL is set.
> 3. The patches for 2.1 and 2.2 are unlikely to update the sstable format.
> 4. 3.x patches may even remove the 20 year TTL constraint (and extend TTL
> support beyond 2038).
> I think that the JIRA priority should be increased from "Major" to "Blocker"
> as the JIRA may cause unexpected data loss. Also, all impacted versions
> should be included in the JIRA. This will attract the due attention of all
> Cassandra users.
> Thanks
> Anuj
> On Friday 26 January 2018, 12:47:18 PM IST, Anuj Wadehra 
>  wrote:
>
>   Hi Jeff,
>
> Thanks for the prompt action! I agree that patching an application MAY have a 
> shorter life cycle than patching Cassandra in production. But, in the 
> interest of the larger Cassandra user community, we should put our best 
> effort to avoid breaking all the affected applications in production. We 
> should also consider that updating business logic as per the new 15 year TTL 
> constraint may have business implications for many users. I have a limited 
> understanding about the complexity of the code patch, but it may be more 
> feasible to extend the 20 year limit in Cassandra in 2.1/2.2 rather than 
> asking all impacted users to do an immediate business logic adaptation. 
> Moreover, now that we officially support Cassandra 2.1 & 2.2 until 4.0 
> release and provide critical fixes for 2.1, it becomes even more reasonable 
> to provide this extremely critical patch for 2.1 & 2.2 (unless its absolutely 
> impossible). Still, many users use Cassandra 2.1 and 2.2 in their most 
> critical production systems.
>
> Thanks
> Anuj
>
> On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa  
> wrote:
>
>  We’ll get patches out. They almost certainly aren’t going to change the 
> sstable format for old versions (unless whoever writes the patch makes a 
> great argument for it), so there’s probably not going to be post-2038 ttl 
> support for 2.1/2.2. For those old versions, we can definitely make it not 
> lose data, but we almost certainly aren’t going to make the ttl go past 2038 
> in old versions.
>
> More importantly, any company trying to do 20 year ttl’s that’s waiting for a 
> patched version should start by patching their app to not write invalid ttls 
> - your app release cycle is almost certainly faster than db patch / review / 
> test / release / validation, and you can avoid the data loss application side 
> by calculating the ttl explicitly. It’s not the best solution, but it beats 
> doing nothing, and we’re not rushing out a release in less than a day (we 
> haven’t even started a vote, and voting window is 72 hours for members to 
> review and approve or reject the candidate).
>
>
>
> --
> Jeff Jirsa
>
>
>> On Jan 25, 2018, at 9:07 PM, Jeff Jirsa  wrote:
>>
>> Patches welcome.
>>
>> --
>> Jeff Jirsa
>>
>>
>>> On Jan 25, 2018, at 8:15 PM, Anuj Wadehra  
>>> wrote:
>>>
>>> Hi Paulo,
>>>
>>> Thanks for looking into the issue on priority. I have serious concerns 
>>> regarding reducing the TTL to 15 yrs.The patch will immediately break all 
>>> existing applications in Production which are using 15+ 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-26 Thread Anuj Wadehra
Hi Jeff,
One correction in my last message: "it may be more feasible to SUPPORT (not 
extend) the 20 year limit in Cassandra in 2.1/2.2". 
I completely agree that the existing 20 years TTL support is okay for older 
versions. 
 
If I have understood your last message correctly, upcoming patches are on
the following lines:

1. New patches shall be released for 2.1, 2.2 and 3.x.
2. The patches for 2.1 & 2.2 would support the existing 20 year TTL limit
and ensure that there is no data loss when a 20 year TTL is set.
3. The patches for 2.1 and 2.2 are unlikely to update the sstable format.
4. 3.x patches may even remove the 20 year TTL constraint (and extend TTL
support beyond 2038).
I think that the JIRA priority should be increased from "Major" to "Blocker" as
the JIRA may cause unexpected data loss. Also, all impacted versions should be
included in the JIRA. This will attract the due attention of all Cassandra
users.
Thanks
Anuj
On Friday 26 January 2018, 12:47:18 PM IST, Anuj Wadehra 
 wrote:  
 
  Hi Jeff,

Thanks for the prompt action! I agree that patching an application MAY have a 
shorter life cycle than patching Cassandra in production. But, in the interest 
of the larger Cassandra user community, we should put our best effort to avoid 
breaking all the affected applications in production. We should also consider 
that updating business logic as per the new 15 year TTL constraint may have 
business implications for many users. I have a limited understanding about the 
complexity of the code patch, but it may be more feasible to extend the 20 year 
limit in Cassandra in 2.1/2.2 rather than asking all impacted users to do an 
immediate business logic adaptation. Moreover, now that we officially support 
Cassandra 2.1 & 2.2 until 4.0 release and provide critical fixes for 2.1, it 
becomes even more reasonable to provide this extremely critical patch for 2.1 & 
2.2 (unless its absolutely impossible). Still, many users use Cassandra 2.1 and 
2.2 in their most critical production systems.

Thanks
Anuj

    On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa  
wrote:  
 
 We’ll get patches out. They almost certainly aren’t going to change the 
sstable format for old versions (unless whoever writes the patch makes a great 
argument for it), so there’s probably not going to be post-2038 ttl support for 
2.1/2.2. For those old versions, we can definitely make it not lose data, but 
we almost certainly aren’t going to make the ttl go past 2038 in old versions. 

More importantly, any company trying to do 20 year ttl’s that’s waiting for a 
patched version should start by patching their app to not write invalid ttls - 
your app release cycle is almost certainly faster than db patch / review / test 
/ release / validation, and you can avoid the data loss application side by 
calculating the ttl explicitly. It’s not the best solution, but it beats doing 
nothing, and we’re not rushing out a release in less than a day (we haven’t 
even started a vote, and voting window is 72 hours for members to review and 
approve or reject the candidate).



-- 
Jeff Jirsa


> On Jan 25, 2018, at 9:07 PM, Jeff Jirsa  wrote:
> 
> Patches welcome.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Jan 25, 2018, at 8:15 PM, Anuj Wadehra  
>> wrote:
>> 
>> Hi Paulo,
>> 
>> Thanks for looking into the issue on priority. I have serious concerns 
>> regarding reducing the TTL to 15 yrs.The patch will immediately break all 
>> existing applications in Production which are using 15+ yrs TTL. And then 
>> they would be stuck again until all the logic in Production software is 
>> modified and the software is upgraded immediately. This may take days. Such 
>> heavy downtime is generally not acceptable for any business. Yes, they will 
>> not have silent data loss but they would not be able to do any business 
>> either. I think the permanent fix must be prioritized and put on extremely 
>> fast track. This is a certain Blocker and the impact could be enormous--with 
>> and without the 15 year short-term patch.
>> 
>> And believe me --there are plenty such business use cases where you use very 
>> long TTLs such as 20 yrs for compliance and other reasons.
>> 
>> Thanks
>> Anuj
>> 
>>  On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman 
>> wrote:  
>> 
>> why are people inserting data with a 15+ year TTL? sorta curious about the 
>> actual use case for that.
>> 
>>> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
>>> 
>>> The assertion was working fine until yesterday 03:14 UTC.
>>> 
>>> The long term solution would be to work with a long instead of a int. The
>>> serialized seems to be a variable-int already, so that should be fine
>>> already.
>>> 
>>> If you change the assertion to 15 years, then applications might fail, as
>>> they might be setting a 15+ year ttl.
>>> 
>>> regards,
>>> Christian
>>> 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-26 Thread horschi
If it's a 32-bit timestamp, can't we just save/read localDeletionTime as an
unsigned int? That would give it another 68 years. I think everyone
involved here could live with that limitation :-)
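
A small Java sketch of what reinterpreting those same 32 bits as unsigned would
buy (illustrative only, not Cassandra's serialization code):

public class UnsignedDeletionTimeDemo {
    public static void main(String[] args) {
        // A post-2038 expiration written into an int32 wraps to a negative value.
        long expirationSeconds = 2_500_000_000L;              // some time in 2049
        int stored = (int) expirationSeconds;                 // negative when read as signed
        long reinterpreted = Integer.toUnsignedLong(stored);  // same 32 bits, read unsigned

        System.out.println("signed read:   " + stored);
        System.out.println("unsigned read: " + reinterpreted); // 2500000000 again
        // Unsigned 32-bit seconds run out at 2^32 - 1, i.e. in the year 2106,
        // which is roughly the extra 68 years mentioned above.
    }
}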

On Fri, Jan 26, 2018 at 8:16 AM, Anuj Wadehra <
anujw_2...@yahoo.co.in.invalid> wrote:

>  Hi Jeff,
>
> Thanks for the prompt action! I agree that patching an application MAY
> have a shorter life cycle than patching Cassandra in production. But, in
> the interest of the larger Cassandra user community, we should put our best
> effort to avoid breaking all the affected applications in production. We
> should also consider that updating business logic as per the new 15 year
> TTL constraint may have business implications for many users. I have a
> limited understanding about the complexity of the code patch, but it may be
> more feasible to extend the 20 year limit in Cassandra in 2.1/2.2 rather
> than asking all impacted users to do an immediate business logic
> adaptation. Moreover, now that we officially support Cassandra 2.1 & 2.2
> until 4.0 release and provide critical fixes for 2.1, it becomes even more
> reasonable to provide this extremely critical patch for 2.1 & 2.2 (unless
> its absolutely impossible). Still, many users use Cassandra 2.1 and 2.2 in
> their most critical production systems.
>
> Thanks
> Anuj
>
> On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa <
> jji...@gmail.com> wrote:
>
>  We’ll get patches out. They almost certainly aren’t going to change the
> sstable format for old versions (unless whoever writes the patch makes a
> great argument for it), so there’s probably not going to be post-2038 ttl
> support for 2.1/2.2. For those old versions, we can definitely make it not
> lose data, but we almost certainly aren’t going to make the ttl go past
> 2038 in old versions.
>
> More importantly, any company trying to do 20 year ttl’s that’s waiting
> for a patched version should start by patching their app to not write
> invalid ttls - your app release cycle is almost certainly faster than db
> patch / review / test / release / validation, and you can avoid the data
> loss application side by calculating the ttl explicitly. It’s not the best
> solution, but it beats doing nothing, and we’re not rushing out a release
> in less than a day (we haven’t even started a vote, and voting window is 72
> hours for members to review and approve or reject the candidate).
>
>
>
> --
> Jeff Jirsa
>
>
> > On Jan 25, 2018, at 9:07 PM, Jeff Jirsa  wrote:
> >
> > Patches welcome.
> >
> > --
> > Jeff Jirsa
> >
> >
> >> On Jan 25, 2018, at 8:15 PM, Anuj Wadehra 
> wrote:
> >>
> >> Hi Paulo,
> >>
> >> Thanks for looking into the issue on priority. I have serious concerns
> regarding reducing the TTL to 15 yrs.The patch will immediately break all
> existing applications in Production which are using 15+ yrs TTL. And then
> they would be stuck again until all the logic in Production software is
> modified and the software is upgraded immediately. This may take days. Such
> heavy downtime is generally not acceptable for any business. Yes, they will
> not have silent data loss but they would not be able to do any business
> either. I think the permanent fix must be prioritized and put on extremely
> fast track. This is a certain Blocker and the impact could be
> enormous--with and without the 15 year short-term patch.
> >>
> >> And believe me --there are plenty such business use cases where you use
> very long TTLs such as 20 yrs for compliance and other reasons.
> >>
> >> Thanks
> >> Anuj
> >>
> >>  On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman <
> kjell...@apple.com> wrote:
> >>
> >> why are people inserting data with a 15+ year TTL? sorta curious about
> the actual use case for that.
> >>
> >>> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
> >>>
> >>> The assertion was working fine until yesterday 03:14 UTC.
> >>>
> >>> The long term solution would be to work with a long instead of a int.
> The
> >>> serialized seems to be a variable-int already, so that should be fine
> >>> already.
> >>>
> >>> If you change the assertion to 15 years, then applications might fail,
> as
> >>> they might be setting a 15+ year ttl.
> >>>
> >>> regards,
> >>> Christian
> >>>
> >>> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta  >
> >>> wrote:
> >>>
>  Thanks for raising this. Agreed this is bad, when I filed
>  CASSANDRA-14092 I thought a write would fail when localDeletionTime
>  overflows (as it is with 2.1), but that doesn't seem to be the case on
>  3.0+
> 
>  I propose adding the assertion back so writes will fail, and reduce
>  the max TTL to something like 15 years for the time being while we
>  figure a long term solution.
> 
>  2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan <
> jeremiah.jor...@gmail.com>:
> > If you aren’t getting an error, then I agree, that is very bad.
> 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Anuj Wadehra
 Hi Jeff,

Thanks for the prompt action! I agree that patching an application MAY have a 
shorter life cycle than patching Cassandra in production. But, in the interest 
of the larger Cassandra user community, we should put our best effort to avoid 
breaking all the affected applications in production. We should also consider 
that updating business logic as per the new 15 year TTL constraint may have 
business implications for many users. I have a limited understanding about the 
complexity of the code patch, but it may be more feasible to extend the 20 year 
limit in Cassandra in 2.1/2.2 rather than asking all impacted users to do an 
immediate business logic adaptation. Moreover, now that we officially support 
Cassandra 2.1 & 2.2 until 4.0 release and provide critical fixes for 2.1, it 
becomes even more reasonable to provide this extremely critical patch for 2.1 & 
2.2 (unless it's absolutely impossible). Many users still use Cassandra 2.1 and
2.2 in their most critical production systems.

Thanks
Anuj

On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa  
wrote:  
 
 We’ll get patches out. They almost certainly aren’t going to change the 
sstable format for old versions (unless whoever writes the patch makes a great 
argument for it), so there’s probably not going to be post-2038 ttl support for 
2.1/2.2. For those old versions, we can definitely make it not lose data, but 
we almost certainly aren’t going to make the ttl go past 2038 in old versions. 

More importantly, any company trying to do 20 year ttl’s that’s waiting for a 
patched version should start by patching their app to not write invalid ttls - 
your app release cycle is almost certainly faster than db patch / review / test 
/ release / validation, and you can avoid the data loss application side by 
calculating the ttl explicitly. It’s not the best solution, but it beats doing 
nothing, and we’re not rushing out a release in less than a day (we haven’t 
even started a vote, and voting window is 72 hours for members to review and 
approve or reject the candidate).



-- 
Jeff Jirsa


> On Jan 25, 2018, at 9:07 PM, Jeff Jirsa  wrote:
> 
> Patches welcome.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Jan 25, 2018, at 8:15 PM, Anuj Wadehra  
>> wrote:
>> 
>> Hi Paulo,
>> 
>> Thanks for looking into the issue on priority. I have serious concerns 
>> regarding reducing the TTL to 15 yrs.The patch will immediately break all 
>> existing applications in Production which are using 15+ yrs TTL. And then 
>> they would be stuck again until all the logic in Production software is 
>> modified and the software is upgraded immediately. This may take days. Such 
>> heavy downtime is generally not acceptable for any business. Yes, they will 
>> not have silent data loss but they would not be able to do any business 
>> either. I think the permanent fix must be prioritized and put on extremely 
>> fast track. This is a certain Blocker and the impact could be enormous--with 
>> and without the 15 year short-term patch.
>> 
>> And believe me --there are plenty such business use cases where you use very 
>> long TTLs such as 20 yrs for compliance and other reasons.
>> 
>> Thanks
>> Anuj
>> 
>>  On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman 
>> wrote:  
>> 
>> why are people inserting data with a 15+ year TTL? sorta curious about the 
>> actual use case for that.
>> 
>>> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
>>> 
>>> The assertion was working fine until yesterday 03:14 UTC.
>>> 
>>> The long term solution would be to work with a long instead of a int. The
>>> serialized seems to be a variable-int already, so that should be fine
>>> already.
>>> 
>>> If you change the assertion to 15 years, then applications might fail, as
>>> they might be setting a 15+ year ttl.
>>> 
>>> regards,
>>> Christian
>>> 
>>> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
>>> wrote:
>>> 
 Thanks for raising this. Agreed this is bad, when I filed
 CASSANDRA-14092 I thought a write would fail when localDeletionTime
 overflows (as it is with 2.1), but that doesn't seem to be the case on
 3.0+
 
 I propose adding the assertion back so writes will fail, and reduce
 the max TTL to something like 15 years for the time being while we
 figure a long term solution.
 
 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan :
> If you aren’t getting an error, then I agree, that is very bad.  Looking
 at the 3.0 code it looks like the assertion checking for overflow was
 dropped somewhere along the way, I had only been looking into 2.1 where you
 get an assertion error that fails the query.
> 
> -Jeremiah
> 
>> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra 
>> 
 wrote:
>> 
>> 
>> Hi Jeremiah,
>> 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Jeff Jirsa
We’ll get patches out. They almost certainly aren’t going to change the sstable 
format for old versions (unless whoever writes the patch makes a great argument 
for it), so there’s probably not going to be post-2038 ttl support for 2.1/2.2. 
For those old versions, we can definitely make it not lose data, but we almost 
certainly aren’t going to make the ttl go past 2038 in old versions. 

More importantly, any company trying to do 20 year ttl’s that’s waiting for a 
patched version should start by patching their app to not write invalid ttls - 
your app release cycle is almost certainly faster than db patch / review / test 
/ release / validation, and you can avoid the data loss application side by 
calculating the ttl explicitly. It’s not the best solution, but it beats doing 
nothing, and we’re not rushing out a release in less than a day (we haven’t 
even started a vote, and voting window is 72 hours for members to review and 
approve or reject the candidate).
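
A minimal Java sketch of that application-side workaround, capping the
requested TTL so the expiration never crosses the 2038 cutoff (the helper and
constant names are hypothetical, not a Cassandra or driver API):

import java.time.Instant;

public class AppSideTtlCap {
    // Integer.MAX_VALUE seconds after the epoch == 2038-01-19T03:14:07Z.
    private static final long MAX_EXPIRATION_EPOCH_SECONDS = Integer.MAX_VALUE;

    static int safeTtlSeconds(long desiredTtlSeconds) {
        long now = Instant.now().getEpochSecond();
        long secondsUntilCutoff = MAX_EXPIRATION_EPOCH_SECONDS - now;
        // Never ask for an expiration past the cutoff; use the capped value
        // in the USING TTL clause instead of a hard-coded 20-year TTL.
        return (int) Math.min(desiredTtlSeconds, secondsUntilCutoff);
    }

    public static void main(String[] args) {
        System.out.println(safeTtlSeconds(630_720_000L)); // 20-year request, capped
    }
}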



-- 
Jeff Jirsa


> On Jan 25, 2018, at 9:07 PM, Jeff Jirsa  wrote:
> 
> Patches welcome.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Jan 25, 2018, at 8:15 PM, Anuj Wadehra  
>> wrote:
>> 
>> Hi Paulo,
>> 
>> Thanks for looking into the issue on priority. I have serious concerns 
>> regarding reducing the TTL to 15 yrs.The patch will immediately break all 
>> existing applications in Production which are using 15+ yrs TTL. And then 
>> they would be stuck again until all the logic in Production software is 
>> modified and the software is upgraded immediately. This may take days. Such 
>> heavy downtime is generally not acceptable for any business. Yes, they will 
>> not have silent data loss but they would not be able to do any business 
>> either. I think the permanent fix must be prioritized and put on extremely 
>> fast track. This is a certain Blocker and the impact could be enormous--with 
>> and without the 15 year short-term patch.
>> 
>> And believe me --there are plenty such business use cases where you use very 
>> long TTLs such as 20 yrs for compliance and other reasons.
>> 
>> Thanks
>> Anuj
>> 
>>   On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman 
>>  wrote:  
>> 
>> why are people inserting data with a 15+ year TTL? sorta curious about the 
>> actual use case for that.
>> 
>>> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
>>> 
>>> The assertion was working fine until yesterday 03:14 UTC.
>>> 
>>> The long term solution would be to work with a long instead of a int. The
>>> serialized seems to be a variable-int already, so that should be fine
>>> already.
>>> 
>>> If you change the assertion to 15 years, then applications might fail, as
>>> they might be setting a 15+ year ttl.
>>> 
>>> regards,
>>> Christian
>>> 
>>> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
>>> wrote:
>>> 
 Thanks for raising this. Agreed this is bad, when I filed
 CASSANDRA-14092 I thought a write would fail when localDeletionTime
 overflows (as it is with 2.1), but that doesn't seem to be the case on
 3.0+
 
 I propose adding the assertion back so writes will fail, and reduce
 the max TTL to something like 15 years for the time being while we
 figure a long term solution.
 
 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan :
> If you aren’t getting an error, then I agree, that is very bad.  Looking
 at the 3.0 code it looks like the assertion checking for overflow was
 dropped somewhere along the way, I had only been looking into 2.1 where you
 get an assertion error that fails the query.
> 
> -Jeremiah
> 
>> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra 
>> 
 wrote:
>> 
>> 
>> Hi Jeremiah,
>> Validation is on TTL value not on (system_time+ TTL). You can test it
 with below example. Insert is successful, overflow happens silently and
 data is lost:
>> create table test(name text primary key,age int);
>> insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
>> select * from test where name='test_20yrs';
>> 
>> name | age
>> --+-
>> 
>> (0 rows)
>> 
>> insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 630720001;
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="ttl is too large. requested (630720001) maximum (630720000)"
>> Thanks
>> Anuj
>>  On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
 jeremiah.jor...@gmail.com> wrote:
>> 
>> Where is the dataloss?  Does the INSERT operation return successfully
 to the client in this case?  From reading the linked issues it sounds like
 you get an error client side.
>> 
>> -Jeremiah
>> 
>>> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra 
>>> 
 wrote:

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Jeff Jirsa
Patches welcome.

-- 
Jeff Jirsa


> On Jan 25, 2018, at 8:15 PM, Anuj Wadehra  
> wrote:
> 
> Hi Paulo,
> 
> Thanks for looking into the issue on priority. I have serious concerns 
> regarding reducing the TTL to 15 yrs.The patch will immediately break all 
> existing applications in Production which are using 15+ yrs TTL. And then 
> they would be stuck again until all the logic in Production software is 
> modified and the software is upgraded immediately. This may take days. Such 
> heavy downtime is generally not acceptable for any business. Yes, they will 
> not have silent data loss but they would not be able to do any business 
> either. I think the permanent fix must be prioritized and put on extremely 
> fast track. This is a certain Blocker and the impact could be enormous--with 
> and without the 15 year short-term patch.
> 
> And believe me --there are plenty such business use cases where you use very 
> long TTLs such as 20 yrs for compliance and other reasons.
> 
> Thanks
> Anuj
> 
>On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman 
>  wrote:  
> 
> why are people inserting data with a 15+ year TTL? sorta curious about the 
> actual use case for that.
> 
>> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
>> 
>> The assertion was working fine until yesterday 03:14 UTC.
>> 
>> The long term solution would be to work with a long instead of a int. The
>> serialized seems to be a variable-int already, so that should be fine
>> already.
>> 
>> If you change the assertion to 15 years, then applications might fail, as
>> they might be setting a 15+ year ttl.
>> 
>> regards,
>> Christian
>> 
>> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
>> wrote:
>> 
>>> Thanks for raising this. Agreed this is bad, when I filed
>>> CASSANDRA-14092 I thought a write would fail when localDeletionTime
>>> overflows (as it is with 2.1), but that doesn't seem to be the case on
>>> 3.0+
>>> 
>>> I propose adding the assertion back so writes will fail, and reduce
>>> the max TTL to something like 15 years for the time being while we
>>> figure a long term solution.
>>> 
>>> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan :
 If you aren’t getting an error, then I agree, that is very bad.  Looking
>>> at the 3.0 code it looks like the assertion checking for overflow was
>>> dropped somewhere along the way, I had only been looking into 2.1 where you
>>> get an assertion error that fails the query.
 
 -Jeremiah
 
> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra 
>>> wrote:
> 
> 
> Hi Jeremiah,
> Validation is on TTL value not on (system_time+ TTL). You can test it
>>> with below example. Insert is successful, overflow happens silently and
>>> data is lost:
> create table test(name text primary key,age int);
> insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
> select * from test where name='test_20yrs';
> 
> name | age
> --+-
> 
> (0 rows)
> 
> insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 630720001;
>>> InvalidRequest: Error from server: code=2200 [Invalid query]
>>> message="ttl is too large. requested (630720001) maximum (630720000)"
> Thanks
> Anuj
>   On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
>>> jeremiah.jor...@gmail.com> wrote:
> 
> Where is the dataloss?  Does the INSERT operation return successfully
>>> to the client in this case?  From reading the linked issues it sounds like
>>> you get an error client side.
> 
> -Jeremiah
> 
>> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra 
>> 
>>> wrote:
>> 
>> Hi,
>> 
>> For all those people who use MAX TTL=20 years for inserting/updating
>>> data in production, https://issues.apache.org/jira/browse/CASSANDRA-14092
>>> can silently cause irrecoverable Data Loss. This seems like a certain TOP
>>> MOST BLOCKER to me. I think the category of the JIRA must be raised to
>>> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and no
>>> one seems to be actively working on it. Just like any other critical
>>> vulnerability, this vulnerability demands immediate attention from some
>>> very experienced folks to bring out an Urgent Fast Track Patch for all
>>> currently Supported Cassandra versions 2.1,2.2 and 3.x. As per my
>>> understanding of the JIRA comments, the changes may not be that trivial for
>>> older releases. So, community support on the patch is very much appreciated.
>> 
>> Thanks
>> Anuj
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
 
 
 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Anuj Wadehra
 Hi Paulo,

Thanks for looking into the issue on priority. I have serious concerns 
regarding reducing the TTL to 15 yrs. The patch will immediately break all
existing applications in Production which are using 15+ yrs TTL. And then they 
would be stuck again until all the logic in Production software is modified and 
the software is upgraded immediately. This may take days. Such heavy downtime 
is generally not acceptable for any business. Yes, they will not have silent 
data loss but they would not be able to do any business either. I think the 
permanent fix must be prioritized and put on extremely fast track. This is a 
certain Blocker and the impact could be enormous--with and without the 15 year 
short-term patch.

And believe me -- there are plenty of such business use cases where you use very
long TTLs such as 20 yrs for compliance and other reasons.

Thanks
Anuj

On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman 
 wrote:  
 
 why are people inserting data with a 15+ year TTL? sorta curious about the 
actual use case for that.

> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
> 
> The assertion was working fine until yesterday 03:14 UTC.
> 
> The long term solution would be to work with a long instead of a int. The
> serialized seems to be a variable-int already, so that should be fine
> already.
> 
> If you change the assertion to 15 years, then applications might fail, as
> they might be setting a 15+ year ttl.
> 
> regards,
> Christian
> 
> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
> wrote:
> 
>> Thanks for raising this. Agreed this is bad, when I filed
>> CASSANDRA-14092 I thought a write would fail when localDeletionTime
>> overflows (as it is with 2.1), but that doesn't seem to be the case on
>> 3.0+
>> 
>> I propose adding the assertion back so writes will fail, and reduce
>> the max TTL to something like 15 years for the time being while we
>> figure a long term solution.
>> 
>> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan :
>>> If you aren’t getting an error, then I agree, that is very bad.  Looking
>> at the 3.0 code it looks like the assertion checking for overflow was
>> dropped somewhere along the way, I had only been looking into 2.1 where you
>> get an assertion error that fails the query.
>>> 
>>> -Jeremiah
>>> 
 On Jan 25, 2018, at 2:21 PM, Anuj Wadehra 
>> wrote:
 
 
 Hi Jeremiah,
 Validation is on TTL value not on (system_time+ TTL). You can test it
>> with below example. Insert is successful, overflow happens silently and
>> data is lost:
 create table test(name text primary key,age int);
 insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
 select * from test where name='test_20yrs';
 
 name | age
 --+-
 
 (0 rows)
 
 insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 630720001;
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="ttl is too large. requested (630720001) maximum (630720000)"
 Thanks
 Anuj
  On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
>> jeremiah.jor...@gmail.com> wrote:
 
 Where is the dataloss?  Does the INSERT operation return successfully
>> to the client in this case?  From reading the linked issues it sounds like
>> you get an error client side.
 
 -Jeremiah
 
> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra 
>> wrote:
> 
> Hi,
> 
> For all those people who use MAX TTL=20 years for inserting/updating
>> data in production, https://issues.apache.org/jira/browse/CASSANDRA-14092
>> can silently cause irrecoverable Data Loss. This seems like a certain TOP
>> MOST BLOCKER to me. I think the category of the JIRA must be raised to
>> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and no
>> one seems to be actively working on it. Just like any other critical
>> vulnerability, this vulnerability demands immediate attention from some
>> very experienced folks to bring out an Urgent Fast Track Patch for all
>> currently Supported Cassandra versions 2.1,2.2 and 3.x. As per my
>> understanding of the JIRA comments, the changes may not be that trivial for
>> older releases. So, community support on the patch is very much appreciated.
> 
> Thanks
> Anuj
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
 For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>> 
>> -
>> To unsubscribe, e-mail: 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Robert Stupp
localDeletionTime is serialized as a 32-bit int in 2.1 and 2.2 - _not_ as a 
vint. Those versions need a fix as well and that fix should conceptually be the 
same for 3.0/3.x/trunk IMO.
Reducing the max TTL for now to something less than 20 years is currently the
only viable approach to mitigate the issue soon.
Applications that use a TTL of (nearly) 20yrs already have to reduce the TTL.

How a long-term fix might look is a separate topic, and that should be handled
with care, not rushed.

> On 25. Jan 2018, at 22:17, horschi  wrote:
> 
> Paulo:
> Is readUnsignedVInt() limited to 32 bits? I would expect it to be of
> variable size. That would mean that the format would be fine. Correct me
> if I'm wrong!
> 
> 
> Brandon:
> Some applications might set the TTL dynamically. Of course the TTL could be
> capped and/or removed in the application. But it might not be so obvious as
> you make it sound.
> 
> 
> On Thu, Jan 25, 2018 at 9:49 PM, Paulo Motta 
> wrote:
> 
>>> The long term solution would be to work with a long instead of a int. The
>> serialized seems to be a variable-int already, so that should be fine
>> already.
>> 
>> Agreed but apparently it needs a new sstable format as well as
>> mentioned on CASSANDRA-14092.
>> 
>>> If you change the assertion to 15 years, then applications might fail, as
>> they might be setting a 15+ year ttl.
>> 
>> This is an emergency measure while we provide a longer term fix. Any
> application using TTL ~= 20 years will need to lower the TTL anyway
>> to prevent data loss.
>> 
>> 2018-01-25 18:40 GMT-02:00 Brandon Williams :
>>> My guess is they don't know how to NOT set a TTL (perhaps with a default
>> in
>>> the schema), so they chose max value.  Someone else's problem by then.
>>> 
>>> On Thu, Jan 25, 2018 at 2:38 PM, Michael Kjellman 
>>> wrote:
>>> 
 why are people inserting data with a 15+ year TTL? sorta curious about
>> the
 actual use case for that.
 
> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
> 
> The assertion was working fine until yesterday 03:14 UTC.
> 
> The long term solution would be to work with a long instead of a int.
>> The
> serialized seems to be a variable-int already, so that should be fine
> already.
> 
> If you change the assertion to 15 years, then applications might
>> fail, as
> they might be setting a 15+ year ttl.
> 
> regards,
> Christian
> 
> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta <
>> pauloricard...@gmail.com>
> wrote:
> 
>> Thanks for raising this. Agreed this is bad, when I filed
>> CASSANDRA-14092 I thought a write would fail when localDeletionTime
>> overflows (as it is with 2.1), but that doesn't seem to be the case
>> on
>> 3.0+
>> 
>> I propose adding the assertion back so writes will fail, and reduce
>> the max TTL to something like 15 years for the time being while we
>> figure a long term solution.
>> 
>> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan <
>> jeremiah.jor...@gmail.com
> :
>>> If you aren’t getting an error, then I agree, that is very bad.
 Looking
>> at the 3.0 code it looks like the assertion checking for overflow was
>> dropped somewhere along the way, I had only been looking into 2.1
>> where
 you
>> get an assertion error that fails the query.
>>> 
>>> -Jeremiah
>>> 
 On Jan 25, 2018, at 2:21 PM, Anuj Wadehra 
>> wrote:
 
 
 Hi Jeremiah,
 Validation is on TTL value not on (system_time+ TTL). You can test
>> it
>> with below example. Insert is successful, overflow happens silently
>> and
>> data is lost:
 create table test(name text primary key,age int);
 insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
 select * from test where name='test_20yrs';
 
 name | age
 --+-
 
 (0 rows)
 
 insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 630720001;
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="ttl is too large. requested (630720001) maximum (630720000)"
 Thanks
 Anuj
  On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
>> jeremiah.jor...@gmail.com> wrote:
 
 Where is the dataloss?  Does the INSERT operation return
>> successfully
>> to the client in this case?  From reading the linked issues it sounds
 like
>> you get an error client side.
 
 -Jeremiah
 
> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra > .
 INVALID>
>> wrote:
> 
> Hi,
> 
> For all those people who use MAX TTL=20 years for
>> inserting/updating
>> data in production, 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread horschi
Paulo:
Is readUnsignedVInt() limited to 32 bits? I would expect it to be of
variable size. That would mean that the format would be fine. Correct me
if I'm wrong!


Brandon:
Some applications might set the TTL dynamically. Of course the TTL could be
capped and/or removed in the application. But it might not be so obvious as
you make it sound.


On Thu, Jan 25, 2018 at 9:49 PM, Paulo Motta 
wrote:

> > The long term solution would be to work with a long instead of a int. The
> serialized seems to be a variable-int already, so that should be fine
> already.
>
> Agreed but apparently it needs a new sstable format as well as
> mentioned on CASSANDRA-14092.
>
> > If you change the assertion to 15 years, then applications might fail, as
> they might be setting a 15+ year ttl.
>
> This is an emergency measure while we provide a longer term fix. Any
> application using TTL ~= 20 years will need to lower the TTL anyway
> to prevent data loss.
>
> 2018-01-25 18:40 GMT-02:00 Brandon Williams :
> > My guess is they don't know how to NOT set a TTL (perhaps with a default
> in
> > the schema), so they chose max value.  Someone else's problem by then.
> >
> > On Thu, Jan 25, 2018 at 2:38 PM, Michael Kjellman 
> > wrote:
> >
> >> why are people inserting data with a 15+ year TTL? sorta curious about
> the
> >> actual use case for that.
> >>
> >> > On Jan 25, 2018, at 12:36 PM, horschi  wrote:
> >> >
> >> > The assertion was working fine until yesterday 03:14 UTC.
> >> >
> >> > The long term solution would be to work with a long instead of a int.
> The
> >> > serialized seems to be a variable-int already, so that should be fine
> >> > already.
> >> >
> >> > If you change the assertion to 15 years, then applications might
> fail, as
> >> > they might be setting a 15+ year ttl.
> >> >
> >> > regards,
> >> > Christian
> >> >
> >> > On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta <
> pauloricard...@gmail.com>
> >> > wrote:
> >> >
> >> >> Thanks for raising this. Agreed this is bad, when I filed
> >> >> CASSANDRA-14092 I thought a write would fail when localDeletionTime
> >> >> overflows (as it is with 2.1), but that doesn't seem to be the case
> on
> >> >> 3.0+
> >> >>
> >> >> I propose adding the assertion back so writes will fail, and reduce
> >> >> the max TTL to something like 15 years for the time being while we
> >> >> figure a long term solution.
> >> >>
> >> >> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan <
> jeremiah.jor...@gmail.com
> >> >:
> >> >>> If you aren’t getting an error, then I agree, that is very bad.
> >> Looking
> >> >> at the 3.0 code it looks like the assertion checking for overflow was
> >> >> dropped somewhere along the way, I had only been looking into 2.1
> where
> >> you
> >> >> get an assertion error that fails the query.
> >> >>>
> >> >>> -Jeremiah
> >> >>>
> >>  On Jan 25, 2018, at 2:21 PM, Anuj Wadehra  >> INVALID>
> >> >> wrote:
> >> 
> >> 
> >>  Hi Jeremiah,
> >>  Validation is on TTL value not on (system_time+ TTL). You can test
> it
> >> >> with below example. Insert is successful, overflow happens silently
> and
> >> >> data is lost:
> >>  create table test(name text primary key,age int);
> >>  insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
> >>  select * from test where name='test_20yrs';
> >> 
> >>  name | age
> >>  --+-
> >> 
> >>  (0 rows)
> >> 
> >>  insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 630720001;
> >> >> InvalidRequest: Error from server: code=2200 [Invalid query]
> >> >> message="ttl is too large. requested (630720001) maximum (630720000)"
> >>  Thanks
> >>  Anuj
> >>    On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
> >> >> jeremiah.jor...@gmail.com> wrote:
> >> 
> >>  Where is the dataloss?  Does the INSERT operation return
> successfully
> >> >> to the client in this case?  From reading the linked issues it sounds
> >> like
> >> >> you get an error client side.
> >> 
> >>  -Jeremiah
> >> 
> >> > On Jan 25, 2018, at 1:24 PM, Anuj Wadehra  .
> >> INVALID>
> >> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> > For all those people who use MAX TTL=20 years for
> inserting/updating
> >> >> data in production, https://issues.apache.org/
> >> jira/browse/CASSANDRA-14092
> >> >> can silently cause irrecoverable Data Loss. This seems like a certain
> >> TOP
> >> >> MOST BLOCKER to me. I think the category of the JIRA must be raised
> to
> >> >> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned"
> and no
> >> >> one seems to be actively working on it. Just like any other critical
> >> >> vulnerability, this vulnerability demands immediate attention from
> some
> >> >> very experienced folks to bring out an Urgent Fast Track Patch for
> all
> >> >> currently Supported 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Paulo Motta
> The long-term solution would be to work with a long instead of an int. The
serialized value seems to be a variable-int already, so that should already
be fine.

Agreed, but apparently it also needs a new sstable format, as
mentioned on CASSANDRA-14092.

> If you change the assertion to 15 years, then applications might fail, as
they might be setting a 15+ year ttl.

This is an emergency measure while we provide a longer-term fix. Any
application using TTL ~= 20 years will need to lower the TTL anyway
to prevent data loss.
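
To make the overflow concrete, here is a minimal sketch of the arithmetic
(plain Java, not Cassandra source; the localExpirationTime name is used only
for illustration): the expiration point is stored as a signed 32-bit count of
seconds since the epoch, so once "now + ~20 years" exceeds 2^31 - 1 (early
2038) the stored value wraps and the cell looks as if it expired in the past.

    import java.time.Instant;

    // Illustrative only: shows why a ~20-year TTL overflows a signed
    // 32-bit expiration timestamp for writes made after late January 2018.
    public class TtlOverflowSketch {
        static final int MAX_TTL_SECONDS = 20 * 365 * 86400; // 630,720,000 seconds

        public static void main(String[] args) {
            int nowInSeconds = (int) (System.currentTimeMillis() / 1000);
            int localExpirationTime = nowInSeconds + MAX_TTL_SECONDS;      // wraps negative
            long intendedExpiration = (long) nowInSeconds + MAX_TTL_SECONDS;

            System.out.println("now                 = " + Instant.ofEpochSecond(nowInSeconds));
            System.out.println("intended expiration = " + Instant.ofEpochSecond(intendedExpiration));
            System.out.println("stored int value    = " + localExpirationTime);
            if (localExpirationTime < nowInSeconds)
                System.out.println("overflow: the stored expiration is before the write time");
        }
    }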

2018-01-25 18:40 GMT-02:00 Brandon Williams :
> My guess is they don't know how to NOT set a TTL (perhaps with a default in
> the schema), so they chose max value.  Someone else's problem by then.
>
> On Thu, Jan 25, 2018 at 2:38 PM, Michael Kjellman 
> wrote:
>
>> why are people inserting data with a 15+ year TTL? sorta curious about the
>> actual use case for that.
>>
>> > On Jan 25, 2018, at 12:36 PM, horschi  wrote:
>> >
>> > The assertion was working fine until yesterday 03:14 UTC.
>> >
>> > The long term solution would be to work with a long instead of a int. The
>> > serialized seems to be a variable-int already, so that should be fine
>> > already.
>> >
>> > If you change the assertion to 15 years, then applications might fail, as
>> > they might be setting a 15+ year ttl.
>> >
>> > regards,
>> > Christian
>> >
>> > On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
>> > wrote:
>> >
>> >> Thanks for raising this. Agreed this is bad, when I filed
>> >> CASSANDRA-14092 I thought a write would fail when localDeletionTime
>> >> overflows (as it is with 2.1), but that doesn't seem to be the case on
>> >> 3.0+
>> >>
>> >> I propose adding the assertion back so writes will fail, and reduce
>> >> the max TTL to something like 15 years for the time being while we
>> >> figure a long term solution.
>> >>
>> >> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan > >:
>> >>> If you aren’t getting an error, then I agree, that is very bad.
>> Looking
>> >> at the 3.0 code it looks like the assertion checking for overflow was
>> >> dropped somewhere along the way, I had only been looking into 2.1 where
>> you
>> >> get an assertion error that fails the query.
>> >>>
>> >>> -Jeremiah
>> >>>
>>  On Jan 25, 2018, at 2:21 PM, Anuj Wadehra > INVALID>
>> >> wrote:
>> 
>> 
>>  Hi Jeremiah,
>>  Validation is on TTL value not on (system_time+ TTL). You can test it
>> >> with below example. Insert is successful, overflow happens silently and
>> >> data is lost:
>>  create table test(name text primary key,age int);
>>  insert into test(name,age) values('test_20yrs',30) USING TTL
>> 630720000;
>>  select * from test where name='test_20yrs';
>> 
>>  name | age
>>  --+-
>> 
>>  (0 rows)
>> 
>>  insert into test(name,age) values('test_20yr_plus_1',30) USING TTL
>> >> 630720001;InvalidRequest: Error from server: code=2200 [Invalid query]
>> >> message="ttl is too large. requested (630720001) maximum (630720000)"
>>  Thanks
>>  Anuj
>>    On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
>> >> jeremiah.jor...@gmail.com> wrote:
>> 
>>  Where is the dataloss?  Does the INSERT operation return successfully
>> >> to the client in this case?  From reading the linked issues it sounds
>> like
>> >> you get an error client side.
>> 
>>  -Jeremiah
>> 
>> > On Jan 25, 2018, at 1:24 PM, Anuj Wadehra > INVALID>
>> >> wrote:
>> >
>> > Hi,
>> >
>> > For all those people who use MAX TTL=20 years for inserting/updating
>> >> data in production, https://issues.apache.org/
>> jira/browse/CASSANDRA-14092
>> >> can silently cause irrecoverable Data Loss. This seems like a certain
>> TOP
>> >> MOST BLOCKER to me. I think the category of the JIRA must be raised to
>> >> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and no
>> >> one seems to be actively working on it. Just like any other critical
>> >> vulnerability, this vulnerability demands immediate attention from some
>> >> very experienced folks to bring out an Urgent Fast Track Patch for all
>> >> currently Supported Cassandra versions 2.1,2.2 and 3.x. As per my
>> >> understanding of the JIRA comments, the changes may not be that trivial
>> for
>> >> older releases. So, community support on the patch is very much
>> appreciated.
>> >
>> > Thanks
>> > Anuj
>> 
>>  -
>>  To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>  For additional commands, e-mail: dev-h...@cassandra.apache.org
>> >>>
>> >>>
>> >>> -
>> >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> >>> For additional commands, e-mail: 

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Brandon Williams
My guess is they don't know how to NOT set a TTL (perhaps with a default in
the schema), so they chose the max value. Someone else's problem by then.

On Thu, Jan 25, 2018 at 2:38 PM, Michael Kjellman 
wrote:

> why are people inserting data with a 15+ year TTL? sorta curious about the
> actual use case for that.
>
> > On Jan 25, 2018, at 12:36 PM, horschi  wrote:
> >
> > The assertion was working fine until yesterday 03:14 UTC.
> >
> > The long term solution would be to work with a long instead of a int. The
> > serialized seems to be a variable-int already, so that should be fine
> > already.
> >
> > If you change the assertion to 15 years, then applications might fail, as
> > they might be setting a 15+ year ttl.
> >
> > regards,
> > Christian
> >
> > On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
> > wrote:
> >
> >> Thanks for raising this. Agreed this is bad, when I filed
> >> CASSANDRA-14092 I thought a write would fail when localDeletionTime
> >> overflows (as it is with 2.1), but that doesn't seem to be the case on
> >> 3.0+
> >>
> >> I propose adding the assertion back so writes will fail, and reduce
> >> the max TTL to something like 15 years for the time being while we
> >> figure a long term solution.
> >>
> >> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan  >:
> >>> If you aren’t getting an error, then I agree, that is very bad.
> Looking
> >> at the 3.0 code it looks like the assertion checking for overflow was
> >> dropped somewhere along the way, I had only been looking into 2.1 where
> you
> >> get an assertion error that fails the query.
> >>>
> >>> -Jeremiah
> >>>
>  On Jan 25, 2018, at 2:21 PM, Anuj Wadehra  INVALID>
> >> wrote:
> 
> 
>  Hi Jeremiah,
>  Validation is on TTL value not on (system_time+ TTL). You can test it
> >> with below example. Insert is successful, overflow happens silently and
> >> data is lost:
>  create table test(name text primary key,age int);
>  insert into test(name,age) values('test_20yrs',30) USING TTL
> 630720000;
>  select * from test where name='test_20yrs';
> 
>  name | age
>  --+-
> 
>  (0 rows)
> 
>  insert into test(name,age) values('test_20yr_plus_1',30) USING TTL
> >> 630720001;InvalidRequest: Error from server: code=2200 [Invalid query]
> >> message="ttl is too large. requested (630720001) maximum (630720000)"
>  Thanks
>  Anuj
>    On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
> >> jeremiah.jor...@gmail.com> wrote:
> 
>  Where is the dataloss?  Does the INSERT operation return successfully
> >> to the client in this case?  From reading the linked issues it sounds
> like
> >> you get an error client side.
> 
>  -Jeremiah
> 
> > On Jan 25, 2018, at 1:24 PM, Anuj Wadehra  INVALID>
> >> wrote:
> >
> > Hi,
> >
> > For all those people who use MAX TTL=20 years for inserting/updating
> >> data in production, https://issues.apache.org/
> jira/browse/CASSANDRA-14092
> >> can silently cause irrecoverable Data Loss. This seems like a certain
> TOP
> >> MOST BLOCKER to me. I think the category of the JIRA must be raised to
> >> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and no
> >> one seems to be actively working on it. Just like any other critical
> >> vulnerability, this vulnerability demands immediate attention from some
> >> very experienced folks to bring out an Urgent Fast Track Patch for all
> >> currently Supported Cassandra versions 2.1,2.2 and 3.x. As per my
> >> understanding of the JIRA comments, the changes may not be that trivial
> for
> >> older releases. So, community support on the patch is very much
> appreciated.
> >
> > Thanks
> > Anuj
> 
>  -
>  To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>  For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>
>
>


Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Michael Kjellman
why are people inserting data with a 15+ year TTL? sorta curious about the 
actual use case for that.

> On Jan 25, 2018, at 12:36 PM, horschi  wrote:
> 
> The assertion was working fine until yesterday 03:14 UTC.
> 
> The long term solution would be to work with a long instead of a int. The
> serialized seems to be a variable-int already, so that should be fine
> already.
> 
> If you change the assertion to 15 years, then applications might fail, as
> they might be setting a 15+ year ttl.
> 
> regards,
> Christian
> 
> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
> wrote:
> 
>> Thanks for raising this. Agreed this is bad, when I filed
>> CASSANDRA-14092 I thought a write would fail when localDeletionTime
>> overflows (as it is with 2.1), but that doesn't seem to be the case on
>> 3.0+
>> 
>> I propose adding the assertion back so writes will fail, and reduce
>> the max TTL to something like 15 years for the time being while we
>> figure a long term solution.
>> 
>> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan :
>>> If you aren’t getting an error, then I agree, that is very bad.  Looking
>> at the 3.0 code it looks like the assertion checking for overflow was
>> dropped somewhere along the way, I had only been looking into 2.1 where you
>> get an assertion error that fails the query.
>>> 
>>> -Jeremiah
>>> 
 On Jan 25, 2018, at 2:21 PM, Anuj Wadehra 
>> wrote:
 
 
 Hi Jeremiah,
 Validation is on TTL value not on (system_time+ TTL). You can test it
>> with below example. Insert is successful, overflow happens silently and
>> data is lost:
 create table test(name text primary key,age int);
 insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
 select * from test where name='test_20yrs';
 
 name | age
 --+-
 
 (0 rows)
 
 insert into test(name,age) values('test_20yr_plus_1',30) USING TTL
>> 630720001;InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="ttl is too large. requested (630720001) maximum (630720000)"
 Thanks
 Anuj
   On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
>> jeremiah.jor...@gmail.com> wrote:
 
 Where is the dataloss?  Does the INSERT operation return successfully
>> to the client in this case?  From reading the linked issues it sounds like
>> you get an error client side.
 
 -Jeremiah
 
> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra 
>> wrote:
> 
> Hi,
> 
> For all those people who use MAX TTL=20 years for inserting/updating
>> data in production, https://issues.apache.org/jira/browse/CASSANDRA-14092
>> can silently cause irrecoverable Data Loss. This seems like a certain TOP
>> MOST BLOCKER to me. I think the category of the JIRA must be raised to
>> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and no
>> one seems to be actively working on it. Just like any other critical
>> vulnerability, this vulnerability demands immediate attention from some
>> very experienced folks to bring out an Urgent Fast Track Patch for all
>> currently Supported Cassandra versions 2.1,2.2 and 3.x. As per my
>> understanding of the JIRA comments, the changes may not be that trivial for
>> older releases. So, community support on the patch is very much appreciated.
> 
> Thanks
> Anuj
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
 For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 



Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread horschi
The assertion was working fine until yesterday 03:14 UTC.

The long-term solution would be to work with a long instead of an int. The
serialized value seems to be a variable-int already, so that should already
be fine.

If you change the assertion to 15 years, then applications might fail, as
they might be setting a 15+ year ttl.
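
For context, the 03:14 UTC cutoff follows directly from the arithmetic; a
rough sketch (assuming the 20-year maximum TTL of 630,720,000 seconds and a
signed 32-bit expiration timestamp):

    import java.time.Instant;

    // Last moment at which "now + 20-year max TTL" still fits in a signed
    // 32-bit seconds-since-epoch value.
    public class OverflowCutoff {
        public static void main(String[] args) {
            long maxInt = Integer.MAX_VALUE;      // 2,147,483,647
            long maxTtl = 20L * 365 * 86400;      // 630,720,000
            long cutoff = maxInt - maxTtl;        // 1,516,763,647
            System.out.println(Instant.ofEpochSecond(cutoff)); // 2018-01-24T03:14:07Z
        }
    }

Writes using the maximum TTL therefore start overflowing around 03:14 UTC on
24 January 2018, which matches when the assertion stopped holding.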

regards,
Christian

On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta 
wrote:

> Thanks for raising this. Agreed this is bad, when I filed
> CASSANDRA-14092 I thought a write would fail when localDeletionTime
> overflows (as it is with 2.1), but that doesn't seem to be the case on
> 3.0+
>
> I propose adding the assertion back so writes will fail, and reduce
> the max TTL to something like 15 years for the time being while we
> figure a long term solution.
>
> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan :
> > If you aren’t getting an error, then I agree, that is very bad.  Looking
> at the 3.0 code it looks like the assertion checking for overflow was
> dropped somewhere along the way, I had only been looking into 2.1 where you
> get an assertion error that fails the query.
> >
> > -Jeremiah
> >
> >> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra 
> wrote:
> >>
> >>
> >> Hi Jeremiah,
> >> Validation is on TTL value not on (system_time+ TTL). You can test it
> with below example. Insert is successful, overflow happens silently and
> data is lost:
> >> create table test(name text primary key,age int);
> >> insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
> >> select * from test where name='test_20yrs';
> >>
> >>  name | age
> >> --+-
> >>
> >> (0 rows)
> >>
> >> insert into test(name,age) values('test_20yr_plus_1',30) USING TTL
> 630720001;InvalidRequest: Error from server: code=2200 [Invalid query]
> message="ttl is too large. requested (630720001) maximum (630720000)"
> >> Thanks
> >> Anuj
> >>On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan <
> jeremiah.jor...@gmail.com> wrote:
> >>
> >> Where is the dataloss?  Does the INSERT operation return successfully
> to the client in this case?  From reading the linked issues it sounds like
> you get an error client side.
> >>
> >> -Jeremiah
> >>
> >>> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra 
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> For all those people who use MAX TTL=20 years for inserting/updating
> data in production, https://issues.apache.org/jira/browse/CASSANDRA-14092
> can silently cause irrecoverable Data Loss. This seems like a certain TOP
> MOST BLOCKER to me. I think the category of the JIRA must be raised to
> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and no
> one seems to be actively working on it. Just like any other critical
> vulnerability, this vulnerability demands immediate attention from some
> very experienced folks to bring out an Urgent Fast Track Patch for all
> currently Supported Cassandra versions 2.1,2.2 and 3.x. As per my
> understanding of the JIRA comments, the changes may not be that trivial for
> older releases. So, community support on the patch is very much appreciated.
> >>>
> >>> Thanks
> >>> Anuj
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Paulo Motta
Thanks for raising this. Agreed, this is bad. When I filed
CASSANDRA-14092 I thought a write would fail when localDeletionTime
overflows (as it does with 2.1), but that doesn't seem to be the case on
3.0+.

I propose adding the assertion back so writes will fail, and reducing
the max TTL to something like 15 years for the time being while we
figure out a long-term solution.
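
A minimal sketch of the kind of guard being proposed (illustrative Java only,
with placeholder names; not the actual Cassandra patch): compute the
expiration in 64 bits and reject the write up front when it cannot be
represented, instead of letting it wrap silently.

    // Placeholder names; not the Cassandra codebase.
    final class ExpirationGuard {
        private ExpirationGuard() {}

        /** Fails the write if nowInSeconds + ttlSeconds would overflow a signed 32-bit timestamp. */
        static int checkedLocalExpirationTime(int nowInSeconds, int ttlSeconds) {
            long expiration = (long) nowInSeconds + ttlSeconds;
            if (expiration > Integer.MAX_VALUE)
                throw new IllegalArgumentException(
                    "ttl of " + ttlSeconds + "s overflows the maximum expiration timestamp");
            return (int) expiration;
        }
    }

Failing the request (as the 2.1 assertion effectively did) is strictly better
than acknowledging the write and then treating the data as already expired.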

2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan :
> If you aren’t getting an error, then I agree, that is very bad.  Looking at 
> the 3.0 code it looks like the assertion checking for overflow was dropped 
> somewhere along the way, I had only been looking into 2.1 where you get an 
> assertion error that fails the query.
>
> -Jeremiah
>
>> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra  
>> wrote:
>>
>>
>> Hi Jeremiah,
>> Validation is on TTL value not on (system_time+ TTL). You can test it with 
>> below example. Insert is successful, overflow happens silently and data is 
>> lost:
>> create table test(name text primary key,age int);
>> insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
>> select * from test where name='test_20yrs';
>>
>>  name | age
>> --+-
>>
>> (0 rows)
>>
>> insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 
>> 630720001;InvalidRequest: Error from server: code=2200 [Invalid query] 
>> message="ttl is too large. requested (630720001) maximum (630720000)"
>> Thanks
>> Anuj
>>On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan 
>>  wrote:
>>
>> Where is the dataloss?  Does the INSERT operation return successfully to the 
>> client in this case?  From reading the linked issues it sounds like you get 
>> an error client side.
>>
>> -Jeremiah
>>
>>> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra  
>>> wrote:
>>>
>>> Hi,
>>>
>>> For all those people who use MAX TTL=20 years for inserting/updating data 
>>> in production, https://issues.apache.org/jira/browse/CASSANDRA-14092 can 
>>> silently cause irrecoverable Data Loss. This seems like a certain TOP MOST 
>>> BLOCKER to me. I think the category of the JIRA must be raised to BLOCKER 
>>> from Major. Unfortunately, the JIRA is still "Unassigned" and no one seems 
>>> to be actively working on it. Just like any other critical vulnerability, 
>>> this vulnerability demands immediate attention from some very experienced 
>>> folks to bring out an Urgent Fast Track Patch for all currently Supported 
>>> Cassandra versions 2.1,2.2 and 3.x. As per my understanding of the JIRA 
>>> comments, the changes may not be that trivial for older releases. So, 
>>> community support on the patch is very much appreciated.
>>>
>>> Thanks
>>> Anuj
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Jeremiah D Jordan
If you aren’t getting an error, then I agree, that is very bad. Looking at the
3.0 code, it looks like the assertion checking for overflow was dropped
somewhere along the way; I had only been looking at 2.1, where you get an
assertion error that fails the query.

-Jeremiah

> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra  
> wrote:
> 
> 
> Hi Jeremiah,
> Validation is on TTL value not on (system_time+ TTL). You can test it with 
> below example. Insert is successful, overflow happens silently and data is 
> lost:
> create table test(name text primary key,age int);
> insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
> select * from test where name='test_20yrs';
> 
>  name | age
> --+-
> 
> (0 rows)
> 
> insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 
> 630720001;InvalidRequest: Error from server: code=2200 [Invalid query] 
> message="ttl is too large. requested (630720001) maximum (630720000)"
> Thanks
> Anuj
>On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan 
>  wrote:  
> 
> Where is the dataloss?  Does the INSERT operation return successfully to the 
> client in this case?  From reading the linked issues it sounds like you get 
> an error client side.
> 
> -Jeremiah
> 
>> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra  
>> wrote:
>> 
>> Hi,
>> 
>> For all those people who use MAX TTL=20 years for inserting/updating data in 
>> production, https://issues.apache.org/jira/browse/CASSANDRA-14092 can 
>> silently cause irrecoverable Data Loss. This seems like a certain TOP MOST 
>> BLOCKER to me. I think the category of the JIRA must be raised to BLOCKER 
>> from Major. Unfortunately, the JIRA is still "Unassigned" and no one seems 
>> to be actively working on it. Just like any other critical vulnerability, 
>> this vulnerability demands immediate attention from some very experienced 
>> folks to bring out an Urgent Fast Track Patch for all currently Supported 
>> Cassandra versions 2.1,2.2 and 3.x. As per my understanding of the JIRA 
>> comments, the changes may not be that trivial for older releases. So, 
>> community support on the patch is very much appreciated. 
>> 
>> Thanks
>> Anuj
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Anuj Wadehra
 
Hi Jeremiah,
Validation is on the TTL value, not on (system_time + TTL). You can test it with
the example below. The insert succeeds, the overflow happens silently, and the
data is lost:
create table test(name text primary key,age int);
insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
select * from test where name='test_20yrs';

 name | age
--+-

(0 rows)

insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 630720001;
InvalidRequest: Error from server: code=2200 [Invalid query]
message="ttl is too large. requested (630720001) maximum (630720000)"
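
Until a fixed version is available, one application-side stop-gap is to check,
before issuing the INSERT, whether the requested TTL would push the expiration
past the 32-bit limit, and cap or reject it. A rough sketch (illustrative Java,
assuming TTLs are chosen client-side):

    import java.time.Instant;

    // Application-side stop-gap: largest TTL that is still safe to send right now.
    public class SafeTtl {
        static long maxSafeTtlSeconds() {
            long now = Instant.now().getEpochSecond();
            return Integer.MAX_VALUE - now;   // shrinks as 2038 approaches
        }

        public static void main(String[] args) {
            long requested = 630_720_000L;    // the 20-year TTL from the example above
            long safe = maxSafeTtlSeconds();
            if (requested > safe)
                System.out.println("TTL " + requested + "s would overflow; cap it to " + safe + "s");
            else
                System.out.println("TTL " + requested + "s is safe for now");
        }
    }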

Thanks
Anuj
On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan 
 wrote:  
 
 Where is the dataloss?  Does the INSERT operation return successfully to the 
client in this case?  From reading the linked issues it sounds like you get an 
error client side.

-Jeremiah

> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra  
> wrote:
> 
> Hi,
> 
> For all those people who use MAX TTL=20 years for inserting/updating data in 
> production, https://issues.apache.org/jira/browse/CASSANDRA-14092 can 
> silently cause irrecoverable Data Loss. This seems like a certain TOP MOST 
> BLOCKER to me. I think the category of the JIRA must be raised to BLOCKER 
> from Major. Unfortunately, the JIRA is still "Unassigned" and no one seems to 
> be actively working on it. Just like any other critical vulnerability, this 
> vulnerability demands immediate attention from some very experienced folks to 
> bring out an Urgent Fast Track Patch for all currently Supported Cassandra 
> versions 2.1,2.2 and 3.x. As per my understanding of the JIRA comments, the 
> changes may not be that trivial for older releases. So, community support on 
> the patch is very much appreciated. 
> 
> Thanks
> Anuj

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
  

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread J. D. Jordan
Where is the data loss? Does the INSERT operation return successfully to the
client in this case? From reading the linked issues it sounds like you get an
error client-side.

-Jeremiah

> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra  
> wrote:
> 
> Hi,
> 
> For all those people who use MAX TTL=20 years for inserting/updating data in 
> production, https://issues.apache.org/jira/browse/CASSANDRA-14092 can 
> silently cause irrecoverable Data Loss. This seems like a certain TOP MOST 
> BLOCKER to me. I think the category of the JIRA must be raised to BLOCKER 
> from Major. Unfortunately, the JIRA is still "Unassigned" and no one seems to 
> be actively working on it. Just like any other critical vulnerability, this 
> vulnerability demands immediate attention from some very experienced folks to 
> bring out an Urgent Fast Track Patch for all currently Supported Cassandra 
> versions 2.1,2.2 and 3.x. As per my understanding of the JIRA comments, the 
> changes may not be that trivial for older releases. So, community support on 
> the patch is very much appreciated. 
> 
> Thanks
> Anuj

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org