Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Jeff Jirsa
Patches welcome. -- Jeff Jirsa > On Jan 25, 2018, at 8:15 PM, Anuj Wadehra > wrote: > > Hi Paulo, > > Thanks for looking into the issue on priority. I have serious concerns > regarding reducing the TTL to 15 yrs. The patch will immediately break all >

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Anuj Wadehra
Hi Paulo, Thanks for looking into the issue on priority. I have serious concerns regarding reducing the TTL to 15 yrs. The patch will immediately break all existing applications in Production which are using 15+ yrs TTL. And then they would be stuck again until all the logic in Production

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Jeff Jirsa
We’ll get patches out. They almost certainly aren’t going to change the sstable format for old versions (unless whoever writes the patch makes a great argument for it), so there’s probably not going to be post-2038 ttl support for 2.1/2.2. For those old versions, we can definitely make it not

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Anuj Wadehra
Hi Jeff, Thanks for the prompt action! I agree that patching an application MAY have a shorter life cycle than patching Cassandra in production. But, in the interest of the larger Cassandra user community, we should put our best effort to avoid breaking all the affected applications in

URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Anuj Wadehra
Hi, For all those people who use MAX TTL=20 years for inserting/updating data in production, https://issues.apache.org/jira/browse/CASSANDRA-14092 can silently cause irrecoverable Data Loss. This seems like a certain TOP MOST BLOCKER to me. I think the category of the JIRA must be raised to

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread J. D. Jordan
Where is the data loss? Does the INSERT operation return successfully to the client in this case? From reading the linked issues it sounds like you get an error client side. -Jeremiah > On Jan 25, 2018, at 1:24 PM, Anuj Wadehra > wrote: > > Hi, > > For all

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Anuj Wadehra
Hi Jeremiah, Validation is on TTL value not on (system_time+ TTL). You can test it with below example. Insert is successful, overflow happens silently and data is lost: create table test(name text primary key,age int); insert into test(name,age) values('test_20yrs',30) USING TTL 63072;
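The silent overflow described in this message can be illustrated with a small sketch. It assumes that the expiry is computed as (system time + TTL) in seconds and stored in a signed 32-bit integer, and that 20 years is counted as 20 x 365 days; the helper names to_int32 and local_deletion_time are hypothetical and are not taken from the Cassandra code base.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1

# Assumption: 20 years counted as 20 * 365 days, in seconds.
TTL_20_YEARS = 20 * 365 * 24 * 60 * 60


def to_int32(value: int) -> int:
    """Wrap an integer to signed 32-bit two's complement, like a Java int."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def local_deletion_time(now_seconds: int, ttl_seconds: int) -> int:
    """Hypothetical expiry computation: now + TTL, truncated to 32 bits."""
    return to_int32(now_seconds + ttl_seconds)


# Wall-clock time on the day of this thread (2018-01-25 00:00:00 UTC).
now = int(datetime(2018, 1, 25, tzinfo=timezone.utc).timestamp())

expiry = local_deletion_time(now, TTL_20_YEARS)

# The TTL itself is a valid 32-bit value, so a check on the TTL alone passes,
# yet the sum wraps to a negative number, i.e. a time far in the past.
print(TTL_20_YEARS <= INT32_MAX, expiry < now)
```

A wrapped expiry that lies before the current time would make the row look already expired, which is consistent with the insert succeeding while the data disappears.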

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Jeremiah D Jordan
If you aren’t getting an error, then I agree, that is very bad. Looking at the 3.0 code, it looks like the assertion checking for overflow was dropped somewhere along the way; I had only been looking into 2.1, where you get an assertion error that fails the query. -Jeremiah > On Jan 25, 2018,

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Paulo Motta
Thanks for raising this. Agreed, this is bad. When I filed CASSANDRA-14092 I thought a write would fail when localDeletionTime overflows (as it does with 2.1), but that doesn't seem to be the case on 3.0+. I propose adding the assertion back so writes will fail, and reduce the max TTL to something
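A minimal sketch of the "fail the write" behaviour proposed here, assuming the check is applied to (now + TTL) rather than to the TTL alone. The function name checked_local_deletion_time and the use of OverflowError are illustrative choices, not the actual Cassandra assertion.

```python
INT32_MAX = 2**31 - 1

# Assumption: the current maximum TTL of 20 years, counted as 20 * 365 days.
MAX_TTL_SECONDS = 20 * 365 * 24 * 60 * 60


def checked_local_deletion_time(now_seconds: int, ttl_seconds: int) -> int:
    """Return now + TTL, or raise instead of silently overflowing 32 bits."""
    if not 0 <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError(f"TTL {ttl_seconds} is outside 0..{MAX_TTL_SECONDS}")
    expiry = now_seconds + ttl_seconds
    if expiry > INT32_MAX:
        # Reject the write so the client sees an error, not lost data.
        raise OverflowError(
            f"expiry {expiry} does not fit in a signed 32-bit integer"
        )
    return expiry


# 2018-01-23 00:00:00 UTC: a 20-year TTL still fits.
print(checked_local_deletion_time(1516665600, MAX_TTL_SECONDS))

# 2018-01-25 00:00:00 UTC: the same TTL no longer fits, so the write fails.
try:
    checked_local_deletion_time(1516838400, MAX_TTL_SECONDS)
except OverflowError as err:
    print("rejected:", err)
```

With a check of this kind, the client gets an error at write time and can lower the TTL, instead of discovering later that the row is gone.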

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread horschi
The assertion was working fine until yesterday 03:14 UTC. The long term solution would be to work with a long instead of an int. The serialized seems to be a variable-int already, so that should be fine already. If you change the assertion to 15 years, then applications might fail, as they might
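The "yesterday 03:14 UTC" cutoff can be reproduced with a short calculation, assuming a signed 32-bit expiry in seconds since the Unix epoch and a maximum TTL of 20 x 365 days (both are assumptions of this sketch, and the variable names are illustrative).

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1

# Assumption: maximum TTL of 20 years, counted as 20 * 365 days.
MAX_TTL_SECONDS = 20 * 365 * 24 * 60 * 60

# Latest instant a signed 32-bit seconds counter can represent.
last_representable = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)

# Latest write time at which (now + max TTL) still fits in 32 bits.
last_safe_write = last_representable - timedelta(seconds=MAX_TTL_SECONDS)

print(last_representable.isoformat())  # 2038-01-19T03:14:07+00:00
print(last_safe_write.isoformat())     # 2018-01-24T03:14:07+00:00
```

Any write with the maximum TTL issued after last_safe_write produces an expiry beyond the 32-bit limit, which matches the date given in this message.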

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Michael Kjellman
why are people inserting data with a 15+ year TTL? sorta curious about the actual use case for that. > On Jan 25, 2018, at 12:36 PM, horschi wrote: > > The assertion was working fine until yesterday 03:14 UTC. > > The long term solution would be to work with a long instead

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Brandon Williams
My guess is they don't know how to NOT set a TTL (perhaps with a default in the schema), so they chose max value. Someone else's problem by then. On Thu, Jan 25, 2018 at 2:38 PM, Michael Kjellman wrote: > why are people inserting data with a 15+ year TTL? sorta curious

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Paulo Motta
> The long term solution would be to work with a long instead of an int. The serialized seems to be a variable-int already, so that should be fine already. Agreed, but apparently it needs a new sstable format as well, as mentioned on CASSANDRA-14092. > If you change the assertion to 15 years, then

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread horschi
Paulo: Is readUnsignedVInt() limited to 32 bits? I would expect it to be of variable size. That would mean that the format would be fine. Correct me if I'm wrong! Brandon: Some applications might set the TTL dynamically. Of course the TTL could be capped and/or removed in the application. But it

Re: URGENT: CASSANDRA-14092 causes Data Loss

2018-01-25 Thread Robert Stupp
localDeletionTime is serialized as a 32-bit int in 2.1 and 2.2 - _not_ as a vint. Those versions need a fix as well and that fix should conceptually be the same for 3.0/3.x/trunk IMO. Reducing the max TTL for now to something less than 20 years is currently the only viable approach to mitigate
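Why a fixed 32-bit field forces either a cap or a format change can be shown with Python's struct module. This is only an illustration of fixed-width encoding in general: the function names are hypothetical, big-endian byte order is assumed, and nothing here reproduces the actual sstable layout.

```python
import struct

INT32_MAX = 2**31 - 1


def serialize_int32(value: int) -> bytes:
    """Pack a value into a fixed 4-byte signed field (big-endian assumed)."""
    return struct.pack(">i", value)


def serialize_int64(value: int) -> bytes:
    """Pack a value into a fixed 8-byte signed field (big-endian assumed)."""
    return struct.pack(">q", value)


# 2018-01-25 00:00:00 UTC plus 20 * 365 days, in seconds.
expiry_past_2038 = 1516838400 + 20 * 365 * 24 * 60 * 60

# The largest 32-bit value still fits in 4 bytes.
print(len(serialize_int32(INT32_MAX)))

# An expiry past 2038 cannot be stored in the 4-byte field at all.
try:
    serialize_int32(expiry_past_2038)
except struct.error as err:
    print("does not fit in 32 bits:", err)

# A wider field holds it, but changes the on-disk size of the value.
print(len(serialize_int64(expiry_past_2038)))
```

Widening the field changes how many bytes each value occupies, which is one way to see why post-2038 expiry support is tied to a new on-disk format in this discussion.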