Re: CASSANDRA-14227 removing the 2038 limit

2023-04-02 Thread Berenguer Blasi

Hi all,

assuming lazy consensus here

Regards

On 22/3/23 15:55, Berenguer Blasi wrote:


Hi all,

14227 has undergone review and perf numbers look ok. Now I have to 
tackle the downgradability issue and hopefully then merge. This is 
what I have gathered from the many conversations; please let me know
if this is correct or if I am missing something:


- Everything will be based off a feature flag. I will add a transient 
feature flag while waiting for CASSANDRA-18301 to land. I will merge 
to trunk and when CASSANDRA-18301 lands it should replace it. That 
makes CASSANDRA-18301 a release blocker (think multiple feature flags, 
avoid future feature flag deprecations,...). If the effort for the TTL 
feature flag is comparable to implementing CASSANDRA-18301 I might 
just do that (TBD).


- My code will have to behave as it always has and produce sstables 
_not_ in the new format. Once that feature flag toggles I can write 
sstables in the _new_ format with the new behavior. I will add testing 
for both behaviors and synthetically emulate the flag toggle.


- Providing a tool to downgrade sstables already written in the _new_ 
format in the _previous_ format is not in scope for 14227. That would 
be CASSANDRA-8928 in any case.


Is this correct?

Thx in advance.

On 3/2/23 15:24, Henrik Ingo wrote:
In that case I agree that increasing from 20 years is an interesting 
opportunity but clearly out of scope for your current ticket.


On Fri, Feb 3, 2023 at 3:48 PM Berenguer Blasi 
 wrote:


Hi,

20y is the current and historic value. 68y is what an integer can
accommodate hence the current 2038 limit since the 1970 Unix
epoch. I wouldn't make it a configurable value, off the top of my
head it would make for some interesting bugs and debugging
sessions when nodes had different values. Food for another ticket
in any case imo.

Regards

On 3/2/23 14:18, Henrik Ingo wrote:

Naive PHB questions to follow...

Why are 68y and 20y special? Could you pick any value? Could we
allow it to be configurable? (Last one probably overkill, just
asking to understand...)

If we can pick any values we want, instinctively I would
personally suggest to have TTL higher than 20 years, but also
kicking the can further than 2035, which is only 13 years from
now. Just to suggest a specific number, why not 35y and 2071?

henrik

On Fri, Feb 3, 2023 at 12:32 PM Berenguer Blasi
 wrote:

Hi All,

a version using Uints, 20y max TTL and kicking the can down
the road until 2086 has been put up for review #justfyi

Regards

On 15/11/22 7:06, Berenguer Blasi wrote:


Hi all,

thanks for your answers!

To Benedict's point: in terms of the uvint encoding of
deletionTime, it is true that it happens here

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170.
But we also have a DeletionTime serializer here

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166
that is writing an int and a long that would now write 2 longs.

TTL itself (the delta) remains an int in the new PR so it
should have no effect in size.

Did I reference the correct parts of the codebase? No
sstable expert here.

On 14/11/22 19:28, Josh McKenzie wrote:

in 2035 we'd hit the same problem again.

In terms of "kicking a can down the road", this would be a
pretty vigorous kick. I wouldn't push back against this
deferral. :)

On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:


I’m confused why we see *any* increase in sstable size -
TTLs and deletion times are already written as unsigned
vints as offsets from an sstable epoch for each value.

I would dig in more carefully to explore why you’re
seeing this increase? For the same data there should be
no change to size on disk.



On 14 Nov 2022, at 06:36, C. Scott Andreas
  wrote:
A 2-3% increase in storage volume is roughly equivalent
to giving up the gain from LZ4 -> LZ4HC, or a one to
two-level bump in Zstandard compression levels. This
regression could be very expensive for storage-bound use
cases.

From the perspective of storage overhead, the unsigned
int approach sounds preferable.


On Nov 13, 2022, at 10:13 PM, Berenguer Blasi

 wrote:


Hi all,

We have done some more research on c14227. The current
patch for CASSANDRA-14227 solves the TTL limit issue by
switching TTL to long instead of int. This approach
does not have a negative impact on memtable memory
usage, as C* controls the memory used 

Re: CASSANDRA-14227 removing the 2038 limit

2023-03-22 Thread Berenguer Blasi

Hi all,

14227 has undergone review and perf numbers look ok. Now I have to 
tackle the downgradability issue and hopefully then merge. This is what 
I have gathered from the many conversations; please let me know
if this is correct or if I am missing something:


- Everything will be based off a feature flag. I will add a transient 
feature flag while waiting for CASSANDRA-18301 to land. I will merge to 
trunk and when CASSANDRA-18301 lands it should replace it. That makes 
CASSANDRA-18301 a release blocker (think multiple feature flags, avoid 
future feature flag deprecations,...). If the effort for the TTL feature 
flag is comparable to implementing CASSANDRA-18301 I might just do that 
(TBD).


- My code will have to behave as it always has and produce sstables 
_not_ in the new format. Once that feature flag toggles I can write 
sstables in the _new_ format with the new behavior. I will add testing 
for both behaviors and synthetically emulate the flag toggle.


- Providing a tool to downgrade sstables already written in the _new_ 
format in the _previous_ format is not in scope for 14227. That would be 
CASSANDRA-8928 in any case.
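
To make the second point concrete, here is a minimal sketch of how a write path could pick its behavior from a flag. Everything in it is hypothetical: the class, enum and method names are invented for illustration and are not taken from the 14227 patch or from CASSANDRA-18301.

```java
public class SSTableFormatSelector {
    // Hypothetical labels for the two on-disk behaviors discussed above.
    enum Format { PREVIOUS, NEW }

    // While the flag is off, keep writing sstables in the previous format.
    static Format formatFor(boolean newFormatEnabled) {
        return newFormatEnabled ? Format.NEW : Format.PREVIOUS;
    }

    // Assumption: the previous format stores deletion times in a signed int,
    // the new one treats the same 32 bits as unsigned.
    static long maxLocalDeletionTime(boolean newFormatEnabled) {
        return formatFor(newFormatEnabled) == Format.NEW ? 0xFFFFFFFFL : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Emulate the flag toggle: first the old behavior, then the new one.
        System.out.println(formatFor(false) + " max=" + maxLocalDeletionTime(false));
        System.out.println(formatFor(true) + " max=" + maxLocalDeletionTime(true));
    }
}
```

A test can then flip the boolean to emulate the toggle and check both behaviors without touching real node state.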


Is this correct?

Thx in advance.

On 3/2/23 15:24, Henrik Ingo wrote:
In that case I agree that increasing from 20 years is an interesting 
opportunity but clearly out of scope for your current ticket.


On Fri, Feb 3, 2023 at 3:48 PM Berenguer Blasi 
 wrote:


Hi,

20y is the current and historic value. 68y is what an integer can
accommodate hence the current 2038 limit since the 1970 Unix
epoch. I wouldn't make it a configurable value, off the top of my
head it would make for some interesting bugs and debugging
sessions when nodes had different values. Food for another ticket
in any case imo.

Regards

On 3/2/23 14:18, Henrik Ingo wrote:

Naive PHB questions to follow...

Why are 68y and 20y special? Could you pick any value? Could we
allow it to be configurable? (Last one probably overkill, just
asking to understand...)

If we can pick any values we want, instinctively I would
personally suggest to have TTL higher than 20 years, but also
kicking the can further than 2035, which is only 13 years from
now. Just to suggest a specific number, why not 35y and 2071?

henrik

On Fri, Feb 3, 2023 at 12:32 PM Berenguer Blasi
 wrote:

Hi All,

a version using Uints, 20y max TTL and kicking the can down
the road until 2086 has been put up for review #justfyi

Regards

On 15/11/22 7:06, Berenguer Blasi wrote:


Hi all,

thanks for your answers!

To Benedict's point: in terms of the uvint encoding of
deletionTime, it is true that it happens here

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170.
But we also have a DeletionTime serializer here

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166
that is writing an int and a long that would now write 2 longs.

TTL itself (the delta) remains an int in the new PR so it
should have no effect in size.

Did I reference the correct parts of the codebase? No
sstable expert here.

On 14/11/22 19:28, Josh McKenzie wrote:

in 2035 we'd hit the same problem again.

In terms of "kicking a can down the road", this would be a
pretty vigorous kick. I wouldn't push back against this
deferral. :)

On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:


I’m confused why we see *any* increase in sstable size -
TTLs and deletion times are already written as unsigned
vints as offsets from an sstable epoch for each value.

I would dig in more carefully to explore why you’re seeing
this increase? For the same data there should be no change
to size on disk.



On 14 Nov 2022, at 06:36, C. Scott Andreas
  wrote:
A 2-3% increase in storage volume is roughly equivalent
to giving up the gain from LZ4 -> LZ4HC, or a one to
two-level bump in Zstandard compression levels. This
regression could be very expensive for storage-bound use
cases.

From the perspective of storage overhead, the unsigned
int approach sounds preferable.


On Nov 13, 2022, at 10:13 PM, Berenguer Blasi

 wrote:


Hi all,

We have done some more research on c14227. The current
patch for CASSANDRA-14227 solves the TTL limit issue by
switching TTL to long instead of int. This approach does
not have a negative impact on memtable memory usage, as
C* controls the memory used by the Memtable, but based
on our testing it increases the bytes flushed by 4 to 7%

Re: CASSANDRA-14227 removing the 2038 limit

2023-02-03 Thread Henrik Ingo
In that case I agree that increasing from 20 years is an interesting
opportunity but clearly out of scope for your current ticket.

On Fri, Feb 3, 2023 at 3:48 PM Berenguer Blasi 
wrote:

> Hi,
>
> 20y is the current and historic value. 68y is what an integer can
> accommodate hence the current 2038 limit since the 1970 Unix epoch. I
> wouldn't make it a configurable value, off the top of my head it would make
> for some interesting bugs and debugging sessions when nodes had different
> values. Food for another ticket in any case imo.
>
> Regards
> On 3/2/23 14:18, Henrik Ingo wrote:
>
> Naive PHB questions to follow...
>
> Why are 68y and 20y special? Could you pick any value? Could we allow it
> to be configurable? (Last one probably overkill, just asking to
> understand...)
>
> If we can pick any values we want, instinctively I would personally
> suggest to have TTL higher than 20 years, but also kicking the can further
> than 2035, which is only 13 years from now. Just to suggest a specific
> number, why not 35y and 2071?
>
> henrik
>
> On Fri, Feb 3, 2023 at 12:32 PM Berenguer Blasi 
> wrote:
>
>> Hi All,
>>
>> a version using Uints, 20y max TTL and kicking the can down the road
>> until 2086 has been put up for review #justfyi
>>
>> Regards
>> On 15/11/22 7:06, Berenguer Blasi wrote:
>>
>> Hi all,
>>
>> thanks for your answers!
>>
>> To Benedict's point: in terms of the uvint encoding of deletionTime,
>> it is true that it happens here
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170.
>> But we also have a DeletionTime serializer here
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166
>> that is writing an int and a long that would now write 2 longs.
>>
>> TTL itself (the delta) remains an int in the new PR so it should have no
>> effect in size.
>>
>> Did I reference the correct parts of the codebase? No sstable expert here.
>> On 14/11/22 19:28, Josh McKenzie wrote:
>>
>> in 2035 we'd hit the same problem again.
>>
>> In terms of "kicking a can down the road", this would be a pretty
>> vigorous kick. I wouldn't push back against this deferral. :)
>>
>> On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:
>>
>>
>> I’m confused why we see *any* increase in sstable size - TTLs and
>> deletion times are already written as unsigned vints as offsets from an
>> sstable epoch for each value.
>>
>> I would dig in more carefully to explore why you’re seeing this increase?
>> For the same data there should be no change to size on disk.
>>
>>
>> On 14 Nov 2022, at 06:36, C. Scott Andreas 
>>  wrote:
>>
>> A 2-3% increase in storage volume is roughly equivalent to giving up the
>> gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression
>> levels. This regression could be very expensive for storage-bound use cases.
>>
>> From the perspective of storage overhead, the unsigned int approach
>> sounds preferable.
>>
>> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi 
>>  wrote:
>>
>> 
>>
>> Hi all,
>>
>> We have done some more research on c14227. The current patch for
>> CASSANDRA-14227 solves the TTL limit issue by switching TTL to long instead
>> of int. This approach does not have a negative impact on memtable memory
>> usage, as C* controls the memory used by the Memtable, but based on our
>> testing it increases the bytes flushed by 4 to 7% and the bytes on disk by 2
>> to 3%.
>>
>> As a mitigation to this problem it is possible to encode
>> *localDeletionTime* as a vint. It results in a 1% improvement but might
>> cause additional computations during compaction or some other operations.
>>
>> Benedict's proposal to keep on using ints for TTL but as a delta to
>> nowInSecond would work for memtables but not in the SSTable, where
>> nowInSecond does not exist. As a consequence we would still suffer from the
>> impact on bytes flushed and bytes on disk.
>>
>> Another approach that was suggested is the use of unsigned integer. Java
>> 8 has an unsigned integer API that would allow us to use unsigned int for
>> TTLs. Based on computation unsigned ints would give us a maximum time of
>> 136 years since the Unix Epoch and therefore a maximum expiration timestamp
>> in 2106. We would have to keep TTL at 20y instead of 68y to give us enough
>> breathing room though, otherwise in 2035 we'd hit the same problem again.
>>
>> Happy to hear opinions.
>> On 18/10/22 10:56, Berenguer Blasi wrote:
>>
>> Hi,
>>
>> apologies for the late reply as I have been OOO. I have done some
>> profiling and results look virtually identical on trunk and 14227. I have
>> attached some screenshots to the ticket
>> https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes
>> are fooling me everything in the jfrs look the same.
>>
>> Regards
>> On 30/9/22 9:44, Berenguer Blasi wrote:
>>
>> Hi Benedict,
>>
>> thanks for the reply! Yes some profiling is probably needed, then we can
>> see if 

Re: CASSANDRA-14227 removing the 2038 limit

2023-02-03 Thread Berenguer Blasi

Hi,

20y is the current and historic value. 68y is what an integer can 
accommodate hence the current 2038 limit since the 1970 Unix epoch. I 
wouldn't make it a configurable value, off the top of my head it would 
make for some interesting bugs and debugging sessions when nodes had 
different values. Food for another ticket in any case imo.
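
For anyone who wants to check the 68y / 2038 figures, they can be reproduced with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and a 365.25-day year is an approximation.

```java
import java.time.Instant;

public class EpochLimits {
    // Instant reached when a signed 32-bit count of seconds since 1970 is at its maximum.
    static Instant signedIntLimit() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE);
    }

    // Whole years spanned by the given number of seconds (365.25-day years).
    static long approxYears(long seconds) {
        return (long) (seconds / (365.25 * 24 * 3600));
    }

    public static void main(String[] args) {
        System.out.println(signedIntLimit());               // 2038-01-19T03:14:07Z
        System.out.println(approxYears(Integer.MAX_VALUE)); // 68
    }
}
```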


Regards

On 3/2/23 14:18, Henrik Ingo wrote:

Naive PHB questions to follow...

Why are 68y and 20y special? Could you pick any value? Could we allow 
it to be configurable? (Last one probably overkill, just asking to 
understand...)


If we can pick any values we want, instinctively I would personally 
suggest to have TTL higher than 20 years, but also kicking the can 
further than 2035, which is only 13 years from now. Just to suggest a 
specific number, why not 35y and 2071?


henrik

On Fri, Feb 3, 2023 at 12:32 PM Berenguer Blasi 
 wrote:


Hi All,

a version using Uints, 20y max TTL and kicking the can down the
road until 2086 has been put up for review #justfyi

Regards

On 15/11/22 7:06, Berenguer Blasi wrote:


Hi all,

thanks for your answers!

To Benedict's point: in terms of the uvint encoding of
deletionTime, it is true that it happens here

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170.
But we also have a DeletionTime serializer here

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166
that is writing an int and a long that would now write 2 longs.

TTL itself (the delta) remains an int in the new PR so it should
have no effect in size.

Did I reference the correct parts of the codebase? No sstable
expert here.

On 14/11/22 19:28, Josh McKenzie wrote:

in 2035 we'd hit the same problem again.

In terms of "kicking a can down the road", this would be a
pretty vigorous kick. I wouldn't push back against this deferral. :)

On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:


I’m confused why we see *any* increase in sstable size - TTLs
and deletion times are already written as unsigned vints as
offsets from an sstable epoch for each value.

I would dig in more carefully to explore why you’re seeing this
increase? For the same data there should be no change to size
on disk.



On 14 Nov 2022, at 06:36, C. Scott Andreas
  wrote:
A 2-3% increase in storage volume is roughly equivalent to
giving up the gain from LZ4 -> LZ4HC, or a one to two-level
bump in Zstandard compression levels. This regression could be
very expensive for storage-bound use cases.

From the perspective of storage overhead, the unsigned int
approach sounds preferable.


On Nov 13, 2022, at 10:13 PM, Berenguer Blasi
 
wrote:


Hi all,

We have done some more research on c14227. The current patch
for CASSANDRA-14227 solves the TTL limit issue by switching
TTL to long instead of int. This approach does not have a
negative impact on memtable memory usage, as C* controls the
memory used by the Memtable, but based on our testing it
increases the bytes flushed by 4 to 7% and the bytes on disk
by 2 to 3%.

As a mitigation to this problem it is possible to encode
/localDeletionTime/ as a vint. It results in a 1% improvement
but might cause additional computations during compaction or
some other operations.

Benedict's proposal to keep on using ints for TTL but as a
delta to nowInSecond would work for memtables but not in
the SSTable, where nowInSecond does not exist. As a
consequence we would still suffer from the impact on bytes
flushed and bytes on disk.

Another approach that was suggested is the use of unsigned
integer. Java 8 has an unsigned integer API that would allow
us to use unsigned int for TTLs. Based on computation
unsigned ints would give us a maximum time of 136 years since
the Unix Epoch and therefore a maximum expiration timestamp
in 2106. We would have to keep TTL at 20y instead of 68y to
give us enough breathing room though, otherwise in 2035 we'd
hit the same problem again.

Happy to hear opinions.

On 18/10/22 10:56, Berenguer Blasi wrote:


Hi,

apologies for the late reply as I have been OOO. I have done
some profiling and results look virtually identical on trunk
and 14227. I have attached some screenshots to the ticket
https://issues.apache.org/jira/browse/CASSANDRA-14227.
Unless my eyes are fooling me everything in the jfrs look
the same.

Regards

On 30/9/22 9:44, Berenguer Blasi wrote:


Hi Benedict,

thanks for the reply! Yes some profiling is probably
needed, then we can see if going down the delta encoding
big refactor rabbit hole is worth it?

Let's see what other 

Re: CASSANDRA-14227 removing the 2038 limit

2023-02-03 Thread Henrik Ingo
Naive PHB questions to follow...

Why are 68y and 20y special? Could you pick any value? Could we allow it to
be configurable? (Last one probably overkill, just asking to understand...)

If we can pick any values we want, instinctively I would personally suggest
to have TTL higher than 20 years, but also kicking the can further than
2035, which is only 13 years from now. Just to suggest a specific number,
why not 35y and 2071?

henrik

On Fri, Feb 3, 2023 at 12:32 PM Berenguer Blasi 
wrote:

> Hi All,
>
> a version using Uints, 20y max TTL and kicking the can down the road until
> 2086 has been put up for review #justfyi
>
> Regards
> On 15/11/22 7:06, Berenguer Blasi wrote:
>
> Hi all,
>
> thanks for your answers!
>
> To Benedict's point: in terms of the uvint encoding of deletionTime,
> it is true that it happens here
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170.
> But we also have a DeletionTime serializer here
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166
> that is writing an int and a long that would now write 2 longs.
>
> TTL itself (the delta) remains an int in the new PR so it should have no
> effect in size.
>
> Did I reference the correct parts of the codebase? No sstable expert here.
> On 14/11/22 19:28, Josh McKenzie wrote:
>
> in 2035 we'd hit the same problem again.
>
> In terms of "kicking a can down the road", this would be a pretty vigorous
> kick. I wouldn't push back against this deferral. :)
>
> On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:
>
>
> I’m confused why we see *any* increase in sstable size - TTLs and deletion
> times are already written as unsigned vints as offsets from an sstable
> epoch for each value.
>
> I would dig in more carefully to explore why you’re seeing this increase?
> For the same data there should be no change to size on disk.
>
>
> On 14 Nov 2022, at 06:36, C. Scott Andreas 
>  wrote:
>
> A 2-3% increase in storage volume is roughly equivalent to giving up the
> gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression
> levels. This regression could be very expensive for storage-bound use cases.
>
> From the perspective of storage overhead, the unsigned int approach sounds
> preferable.
>
> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi 
>  wrote:
>
> 
>
> Hi all,
>
> We have done some more research on c14227. The current patch for
> CASSANDRA-14227 solves the TTL limit issue by switching TTL to long instead
> of int. This approach does not have a negative impact on memtable memory
> usage, as C* controls the memory used by the Memtable, but based on our
> testing it increases the bytes flushed by 4 to 7% and the bytes on disk by 2
> to 3%.
>
> As a mitigation to this problem it is possible to encode
> *localDeletionTime* as a vint. It results in a 1% improvement but might
> cause additional computations during compaction or some other operations.
>
> Benedict's proposal to keep on using ints for TTL but as a delta to
> nowInSecond would work for memtables but not in the SSTable, where
> nowInSecond does not exist. As a consequence we would still suffer from the
> impact on bytes flushed and bytes on disk.
>
> Another approach that was suggested is the use of unsigned integer. Java 8
> has an unsigned integer API that would allow us to use unsigned int for
> TTLs. Based on computation unsigned ints would give us a maximum time of
> 136 years since the Unix Epoch and therefore a maximum expiration timestamp
> in 2106. We would have to keep TTL at 20y instead of 68y to give us enough
> breathing room though, otherwise in 2035 we'd hit the same problem again.
>
> Happy to hear opinions.
> On 18/10/22 10:56, Berenguer Blasi wrote:
>
> Hi,
>
> apologies for the late reply as I have been OOO. I have done some
> profiling and results look virtually identical on trunk and 14227. I have
> attached some screenshots to the ticket
> https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes are
> fooling me everything in the jfrs look the same.
>
> Regards
> On 30/9/22 9:44, Berenguer Blasi wrote:
>
> Hi Benedict,
>
> thanks for the reply! Yes some profiling is probably needed, then we can
> see if going down the delta encoding big refactor rabbit hole is worth it?
>
> Let's see what other concerns people bring up.
>
> Thx.
> On 29/9/22 11:12, Benedict Elliott Smith wrote:
>
> My only slight concern with this approach is the additional memory
> pressure. Since 64yrs should be plenty at any moment in time, I wonder if
> it wouldn’t be better to represent these times as deltas from the nowInSec
> being used to process the query. So, long math would only be used to
> normalise the times to this nowInSec (from whatever is stored in the
> sstable) within a method, and ints would be stored in memtables and any
> objects used for processing.
>
> This might admittedly be more work, but I don’t believe it should 

Re: CASSANDRA-14227 removing the 2038 limit

2023-02-03 Thread Berenguer Blasi

Hi All,

a version using Uints, 20y max TTL and kicking the can down the road 
until 2086 has been put up for review #justfyi
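
The 2086 figure follows from subtracting the maximum TTL from the latest representable expiration time. A rough sketch of that arithmetic is below; it assumes a 20y max TTL counted as 20 * 365 days, and the class and method names are illustrative only.

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class WriteDeadline {
    // Assumption: 20 years counted as 20 * 365 days, expressed in seconds.
    static final long MAX_TTL_SECONDS = 20L * 365 * 24 * 60 * 60;

    // Year of the last moment at which a write with the maximum TTL still fits below the limit.
    static int lastSafeWriteYear(long maxExpirationEpochSecond) {
        return Instant.ofEpochSecond(maxExpirationEpochSecond - MAX_TTL_SECONDS)
                      .atZone(ZoneOffset.UTC)
                      .getYear();
    }

    public static void main(String[] args) {
        System.out.println(lastSafeWriteYear(Integer.MAX_VALUE)); // signed int limit: 2018
        System.out.println(lastSafeWriteYear(0xFFFFFFFFL));       // unsigned int limit: 2086
    }
}
```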


Regards

On 15/11/22 7:06, Berenguer Blasi wrote:


Hi all,

thanks for your answers!

To Benedict's point: in terms of the uvint encoding of deletionTime,
it is true that it happens here 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170. 
But we also have a DeletionTime serializer here 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166 
that is writing an int and a long that would now write 2 longs.


TTL itself (the delta) remains an int in the new PR so it should have 
no effect in size.


Did I reference the correct parts of the codebase? No sstable expert here.

On 14/11/22 19:28, Josh McKenzie wrote:

in 2035 we'd hit the same problem again.
In terms of "kicking a can down the road", this would be a pretty 
vigorous kick. I wouldn't push back against this deferral. :)


On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:


I’m confused why we see *any* increase in sstable size - TTLs and 
deletion times are already written as unsigned vints as offsets from 
an sstable epoch for each value.


I would dig in more carefully to explore why you’re seeing this 
increase? For the same data there should be no change to size on disk.



On 14 Nov 2022, at 06:36, C. Scott Andreas  
wrote:
A 2-3% increase in storage volume is roughly equivalent to giving 
up the gain from LZ4 -> LZ4HC, or a one to two-level bump in 
Zstandard compression levels. This regression could be very 
expensive for storage-bound use cases.


From the perspective of storage overhead, the unsigned int approach 
sounds preferable.


On Nov 13, 2022, at 10:13 PM, Berenguer Blasi 
 wrote:



Hi all,

We have done some more research on c14227. The current patch for 
CASSANDRA-14227 solves the TTL limit issue by switching TTL to 
long instead of int. This approach does not have a negative impact 
on memtable memory usage, as C* controls the memory used by the 
Memtable, but based on our testing it increases the bytes flushed 
by 4 to 7% and the bytes on disk by 2 to 3%.


As a mitigation to this problem it is possible to encode 
/localDeletionTime/ as a vint. It results in a 1% improvement but 
might cause additional computations during compaction or some 
other operations.


Benedict's proposal to keep on using ints for TTL but as a delta 
to nowInSecond would work for memtables but not in the 
SSTable, where nowInSecond does not exist. As a consequence we would 
still suffer from the impact on bytes flushed and bytes on disk.


Another approach that was suggested is the use of unsigned 
integer. Java 8 has an unsigned integer API that would allow us to 
use unsigned int for TTLs. Based on computation unsigned ints 
would give us a maximum time of 136 years since the Unix Epoch and 
therefore a maximum expiration timestamp in 2106. We would have to 
keep TTL at 20y instead of 68y to give us enough breathing room 
though, otherwise in 2035 we'd hit the same problem again.


Happy to hear opinions.

On 18/10/22 10:56, Berenguer Blasi wrote:


Hi,

apologies for the late reply as I have been OOO. I have done some 
profiling and results look virtually identical on trunk and 
14227. I have attached some screenshots to the ticket 
https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless 
my eyes are fooling me everything in the jfrs look the same.


Regards

On 30/9/22 9:44, Berenguer Blasi wrote:


Hi Benedict,

thanks for the reply! Yes some profiling is probably needed, 
then we can see if going down the delta encoding big refactor 
rabbit hole is worth it?


Let's see what other concerns people bring up.

Thx.

On 29/9/22 11:12, Benedict Elliott Smith wrote:
My only slight concern with this approach is the additional 
memory pressure. Since 64yrs should be plenty at any moment in 
time, I wonder if it wouldn’t be better to represent these 
times as deltas from the nowInSec being used to process the 
query. So, long math would only be used to normalise the times 
to this nowInSec (from whatever is stored in the sstable) 
within a method, and ints would be stored in memtables and any 
objects used for processing.


This might admittedly be more work, but I don’t believe it 
should be too challenging - we can introduce a method 
deletionTime(int nowInSec) that returns a long value by adding 
nowInSec to the deletionTime, and make the underlying value 
private, refactoring call sites?
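
A minimal sketch of what this delta idea could look like, under stated assumptions: the class name and the fromAbsolute factory are hypothetical, and nowInSec is taken as a long here (rather than the int in the suggestion above) so the example also works past 2038.

```java
public class DeltaDeletionTime {
    // Stored value: deletion time as an offset in seconds from a reference nowInSec.
    private final int deltaSeconds;

    private DeltaDeletionTime(int deltaSeconds) {
        this.deltaSeconds = deltaSeconds;
    }

    // Long math is used only here, to normalise an absolute time against nowInSec.
    // Math.toIntExact throws ArithmeticException if the delta does not fit in an int.
    static DeltaDeletionTime fromAbsolute(long absoluteSeconds, long nowInSec) {
        return new DeltaDeletionTime(Math.toIntExact(absoluteSeconds - nowInSec));
    }

    // Recover the absolute deletion time by adding the same nowInSec back.
    long deletionTime(long nowInSec) {
        return nowInSec + deltaSeconds;
    }

    public static void main(String[] args) {
        long nowInSec = 4_900_000_000L;      // a moment well past 2038
        long absolute = 5_000_000_000L;      // does not fit in a signed int
        DeltaDeletionTime dt = fromAbsolute(absolute, nowInSec);
        System.out.println(dt.deletionTime(nowInSec)); // 5000000000
    }
}
```

The in-memory object only holds an int, which is the memory-pressure point being made; the catch is that every reader has to supply the same nowInSec the delta was computed against.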


On 29 Sep 2022, at 09:37, Berenguer Blasi 
  
wrote:


Hi all,

I have taken a stab in a PR you can find attached in the 
ticket. Mainly:


- I have moved deletion times, gc and nowInSec timestamps to 
long. That should get us past the 2038 limit.


- TTL is maxed now to 68y. Think CQL API compatibility and a 
sort of a 

Re: CASSANDRA-14227 removing the 2038 limit

2022-11-14 Thread Berenguer Blasi

Hi all,

thanks for your answers!

To Benedict's point: in terms of the uvint encoding of deletionTime,
it is true that it happens here 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170. 
But we also have a DeletionTime serializer here 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166 
that is writing an int and a long that would now write 2 longs.
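
To illustrate the size difference of that fixed-width path, here is a small stand-alone sketch. It does not use the Cassandra serializer; it only counts the bytes of an int plus a long versus two longs. The parameter name markedForDeleteAt is an assumption on my side and does not come from this thread.

```java
import java.nio.ByteBuffer;

public class DeletionTimeSize {
    // Bytes used when writing a 4-byte int followed by an 8-byte long.
    static int intPlusLong(int localDeletionTime, long markedForDeleteAt) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(localDeletionTime).putLong(markedForDeleteAt);
        return buf.position();
    }

    // Bytes used when both values are written as 8-byte longs.
    static int twoLongs(long localDeletionTime, long markedForDeleteAt) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(localDeletionTime).putLong(markedForDeleteAt);
        return buf.position();
    }

    public static void main(String[] args) {
        System.out.println(intPlusLong(1_668_400_000, 1_668_400_000_000_000L)); // 12
        System.out.println(twoLongs(1_668_400_000L, 1_668_400_000_000_000L));   // 16
    }
}
```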


TTL itself (the delta) remains an int in the new PR so it should have no 
effect in size.


Did I reference the correct parts of the codebase? No sstable expert here.

On 14/11/22 19:28, Josh McKenzie wrote:

in 2035 we'd hit the same problem again.
In terms of "kicking a can down the road", this would be a pretty 
vigorous kick. I wouldn't push back against this deferral. :)


On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:


I’m confused why we see *any* increase in sstable size - TTLs and 
deletion times are already written as unsigned vints as offsets from 
an sstable epoch for each value.


I would dig in more carefully to explore why you’re seeing this 
increase? For the same data there should be no change to size on disk.
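
As a rough illustration of why offsets written as variable-length integers should not grow with the in-memory type, the sketch below counts the bytes of a generic base-128 varint (7 payload bits per byte). This is not Cassandra's actual vint encoding, and the class and method names are hypothetical; it only shows that the encoded size depends on the magnitude of the offset.

```java
public class VarintSize {
    // Bytes needed by a generic base-128 varint for a non-negative value.
    static int varintSize(long value) {
        int bytes = 1;
        while ((value >>>= 7) != 0) {
            bytes++;
        }
        return bytes;
    }

    // Size of a time value when stored as an offset from an sstable-wide epoch.
    static int offsetSize(long timeSeconds, long epochSeconds) {
        return varintSize(timeSeconds - epochSeconds);
    }

    public static void main(String[] args) {
        long epoch = 1_668_400_000L;                          // illustrative sstable epoch
        System.out.println(varintSize(epoch + 3600));         // absolute value: 5 bytes
        System.out.println(offsetSize(epoch + 3600, epoch));  // offset of one hour: 2 bytes
    }
}
```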




On 14 Nov 2022, at 06:36, C. Scott Andreas  wrote:
A 2-3% increase in storage volume is roughly equivalent to giving 
up the gain from LZ4 -> LZ4HC, or a one to two-level bump in 
Zstandard compression levels. This regression could be very 
expensive for storage-bound use cases.


From the perspective of storage overhead, the unsigned int approach 
sounds preferable.


On Nov 13, 2022, at 10:13 PM, Berenguer Blasi 
 wrote:



Hi all,

We have done some more research on c14227. The current patch for 
CASSANDRA-14227 solves the TTL limit issue by switching TTL to long 
instead of int. This approach does not have a negative impact on 
memtable memory usage, as C* controls the memory used by the 
Memtable, but based on our testing it increases the bytes flushed 
by 4 to 7% and the bytes on disk by 2 to 3%.


As a mitigation to this problem it is possible to encode 
/localDeletionTime/ as a vint. It results in a 1% improvement but 
might cause additional computations during compaction or some other 
operations.


Benedict's proposal to keep on using ints for TTL but as a delta to 
nowInSecond would work for memtables but not in the 
SSTable, where nowInSecond does not exist. As a consequence we would 
still suffer from the impact on bytes flushed and bytes on disk.


Another approach that was suggested is the use of unsigned integer. 
Java 8 has an unsigned integer API that would allow us to use 
unsigned int for TTLs. Based on computation unsigned ints would 
give us a maximum time of 136 years since the Unix Epoch and 
therefore a maximum expiration timestamp in 2106. We would have to 
keep TTL at 20y instead of 68y to give us enough breathing room 
though, otherwise in 2035 we'd hit the same problem again.
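
For reference, the unsigned integer API mentioned above consists of static helpers on java.lang.Integer (toUnsignedLong, compareUnsigned and friends). The sketch below shows how the same 32 bits could be read as seconds up to 2106; the wrapper class and its method names are illustrative assumptions, not code from the patch.

```java
import java.time.Instant;

public class UnsignedSeconds {
    // Store a second count in the range 0 .. 2^32-1 in an int; large values wrap to negative.
    static int fromEpochSecond(long epochSecond) {
        if (epochSecond < 0 || epochSecond > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("out of unsigned int range: " + epochSecond);
        }
        return (int) epochSecond;
    }

    // Reinterpret the int's 32 bits as an unsigned count of seconds since the Unix epoch.
    static Instant toInstant(int unsignedSeconds) {
        return Instant.ofEpochSecond(Integer.toUnsignedLong(unsignedSeconds));
    }

    // Ordering must use the unsigned comparison, since late timestamps are negative as ints.
    static boolean isBefore(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    public static void main(String[] args) {
        int max = -1; // all 32 bits set, i.e. 4294967295 when read as unsigned
        System.out.println(toInstant(max)); // 2106-02-07T06:28:15Z
        System.out.println(isBefore(Integer.MAX_VALUE, fromEpochSecond(2_147_483_648L))); // true
    }
}
```

The main risk with this approach is any call site that still compares or subtracts the raw ints with the signed operators.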


Happy to hear opinions.

On 18/10/22 10:56, Berenguer Blasi wrote:


Hi,

apologies for the late reply as I have been OOO. I have done some 
profiling and results look virtually identical on trunk and 14227. 
I have attached some screenshots to the ticket 
https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my 
eyes are fooling me everything in the jfrs look the same.


Regards

On 30/9/22 9:44, Berenguer Blasi wrote:


Hi Benedict,

thanks for the reply! Yes some profiling is probably needed, then 
we can see if going down the delta encoding big refactor rabbit 
hole is worth it?


Let's see what other concerns people bring up.

Thx.

On 29/9/22 11:12, Benedict Elliott Smith wrote:
My only slight concern with this approach is the additional 
memory pressure. Since 64yrs should be plenty at any moment in 
time, I wonder if it wouldn’t be better to represent these times 
as deltas from the nowInSec being used to process the query. So, 
long math would only be used to normalise the times to this 
nowInSec (from whatever is stored in the sstable) within a 
method, and ints would be stored in memtables and any objects 
used for processing.


This might admittedly be more work, but I don’t believe it 
should be too challenging - we can introduce a method 
deletionTime(int nowInSec) that returns a long value by adding 
nowInSec to the deletionTime, and make the underlying value 
private, refactoring call sites?


On 29 Sep 2022, at 09:37, Berenguer Blasi 
  wrote:


Hi all,

I have taken a stab in a PR you can find attached in the 
ticket. Mainly:


- I have moved deletion times, gc and nowInSec timestamps to 
long. That should get us past the 2038 limit.


- TTL is maxed now to 68y. Think CQL API compatibility and a 
sort of a 'free' guardrail.


- A new NONE overflow policy is the default but everything is 
backwards compatible by keeping the previous ones in place. 
Think upgrade scenarios or apps relying on the previous behavior.

Re: CASSANDRA-14227 removing the 2038 limit

2022-11-14 Thread Josh McKenzie
> in 2035 we'd hit the same problem again.
In terms of "kicking a can down the road", this would be a pretty vigorous 
kick. I wouldn't push back against this deferral. :)

On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:
> 
> I’m confused why we see *any* increase in sstable size - TTLs and deletion 
> times are already written as unsigned vints as offsets from an sstable epoch 
> for each value.
> 
> I would dig in more carefully to explore why you’re seeing this increase? For 
> the same data there should be no change to size on disk.
> 
> 
>> On 14 Nov 2022, at 06:36, C. Scott Andreas  wrote:
>> A 2-3% increase in storage volume is roughly equivalent to giving up the 
>> gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression 
>> levels. This regression could be very expensive for storage-bound use cases.
>> 
>> From the perspective of storage overhead, the unsigned int approach sounds 
>> preferable.
>> 
>>> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi  
>>> wrote:
>>>  
>>> Hi all,
>>> 
>>> We have done some more research on c14227. The current patch for 
>>> CASSANDRA-14227 solves the TTL limit issue by switching TTL to long instead 
>>> of int. This approach does not have a negative impact on memtable memory 
>>> usage, as C* controls the memory used by the Memtable, but based on our 
>>> testing it increases the bytes flushed by 4 to 7% and the bytes on disk by 2 
>>> to 3%.
>>> 
>>> As a mitigation to this problem it is possible to encode 
>>> *localDeletionTime* as a vint. It results in a 1% improvement but might 
>>> cause additional computations during compaction or some other operations.
>>> 
>>> Benedict's proposal to keep on using ints for TTL but as a delta to 
>>> nowInSecond would work for memtables but not in the SSTable, where 
>>> nowInSecond does not exist. As a consequence we would still suffer from the 
>>> impact on bytes flushed and bytes on disk.
>>> 
>>> Another approach that was suggested is the use of unsigned integer. Java 8 
>>> has an unsigned integer API that would allow us to use unsigned int for 
>>> TTLs. Based on computation unsigned ints would give us a maximum time of 
>>> 136 years since the Unix Epoch and therefore a maximum expiration timestamp 
>>> in 2106. We would have to keep TTL at 20y instead of 68y to give us enough 
>>> breathing room though, otherwise in 2035 we'd hit the same problem again.
>>> 
>>> Happy to hear opinions.
>>> 
>>> On 18/10/22 10:56, Berenguer Blasi wrote:
>>>> Hi,
>>>> 
>>>> apologies for the late reply as I have been OOO. I have done some 
>>>> profiling and results look virtually identical on trunk and 14227. I have 
>>>> attached some screenshots to the ticket 
>>>> https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes are 
>>>> fooling me everything in the jfrs look the same.
>>>> 
>>>> Regards
>>>> 
>>>> On 30/9/22 9:44, Berenguer Blasi wrote:
>>>>> Hi Benedict,
>>>>> 
>>>>> thanks for the reply! Yes some profiling is probably needed, then we can 
>>>>> see if going down the delta encoding big refactor rabbit hole is worth it?
>>>>> 
>>>>> Let's see what other concerns people bring up.
>>>>> 
>>>>> Thx.
>>>>> 
>>>>> On 29/9/22 11:12, Benedict Elliott Smith wrote:
>>>>>> My only slight concern with this approach is the additional memory 
>>>>>> pressure. Since 64yrs should be plenty at any moment in time, I wonder 
>>>>>> if it wouldn’t be better to represent these times as deltas from the 
>>>>>> nowInSec being used to process the query. So, long math would only be 
>>>>>> used to normalise the times to this nowInSec (from whatever is stored in 
>>>>>> the sstable) within a method, and ints would be stored in memtables and 
>>>>>> any objects used for processing. 
>>>>>> 
>>>>>> This might admittedly be more work, but I don’t believe it should be too 
>>>>>> challenging - we can introduce a method deletionTime(int nowInSec) that 
>>>>>> returns a long value by adding nowInSec to the deletionTime, and make 
>>>>>> the underlying value private, refactoring call sites?
>>>>>> 
>>>>>>> On 29 Sep 2022, at 09:37, Berenguer Blasi  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I have taken a stab in a PR you can find attached in the ticket. Mainly:
>>>>>>> 
>>>>>>> - I have moved deletion times, gc and nowInSec timestamps to long. That 
>>>>>>> should get us past the 2038 limit.
>>>>>>> 
>>>>>>> - TTL is maxed now to 68y. Think CQL API compatibility and a sort of a 
>>>>>>> 'free' guardrail.
>>>>>>> 
>>>>>>> - A new NONE overflow policy is the default but everything is backwards 
>>>>>>> compatible by keeping the previous ones in place. Think upgrade 
>>>>>>> scenarios or apps relying on the previous behavior.
>>>>>>> 
>>>>>>> - The new limit is around year 292,471,208,677 which sounds ok given 
>>>>>>> the Sun will start collapsing in 3 to 5 billion years :-)
>>>>>>> 
>>>>>>> - Please feel free to drop by the ticket and take a look at the PR even 
>>>>>>> if it's cursory
>>>>>>> 

Re: CASSANDRA-14227 removing the 2038 limit

2022-11-14 Thread Benedict
I’m confused why we see *any* increase in sstable size - TTLs and deletion 
times are already written as unsigned vints as offsets from an sstable epoch 
for each value.

I would dig in more carefully to explore why you’re seeing this increase? For 
the same data there should be no change to size on disk.
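
To illustrate the point about vints and per-sstable offsets, here is a minimal sketch of how the size of a variable-length integer depends on the magnitude of the value written, not on the width of the in-memory type. It uses a generic 7-bits-per-byte scheme and a hypothetical class name (VarintSizeSketch); it is not Cassandra's actual vint coding, and the epoch value is made up for the example.

```java
final class VarintSizeSketch {
    // Number of bytes a generic unsigned varint (7 payload bits per byte) needs.
    // This is an illustrative scheme, not Cassandra's on-disk vint format.
    static int encodedSize(long value) {
        int size = 1;
        while ((value >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        // Hypothetical sstable epoch: the smallest deletion time in the file.
        long sstableEpoch = 1_668_000_000L;
        // A deletion time one day after that epoch.
        long deletionTime = sstableEpoch + 86_400L;

        // Written as an absolute value, the timestamp needs 5 bytes here.
        System.out.println(encodedSize(deletionTime));
        // Written as an offset from the epoch, it needs 3 bytes,
        // whether the in-memory field was an int or a long.
        System.out.println(encodedSize(deletionTime - sstableEpoch));
    }
}
```

Under this scheme, widening the in-memory field from int to long leaves the encoded offset unchanged, which is the reason given above for expecting no change in size on disk for the same data.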

> On 14 Nov 2022, at 06:36, C. Scott Andreas  wrote:
> 
> A 2-3% increase in storage volume is roughly equivalent to giving up the 
> gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression 
> levels. This regression could be very expensive for storage-bound use cases.
> 
> From the perspective of storage overhead, the unsigned int approach sounds 
> preferable.
> 
>>> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi  
>>> wrote:
>>> 
>> 
>> Hi all,
>> 
>> We have done some more research on c14227. The current patch for 
>> CASSANDRA-14227 solves the TTL limit issue by switching TTL to long instead 
>> of int. This approach does not have a negative impact on memtable memory 
>> usage, as C* controls the memory used by the Memtable, but based on our 
>> testing it increases the bytes flushed by 4 to 7% and the bytes on disk by 2 
>> to 3%.
>> 
>> As a mitigation to this problem it is possible to encode localDeletionTime 
>> as a vint. It results in a 1% improvement but might cause additional 
>> computations during compaction or some other operations.
>> 
>> Benedict's proposal to keep on using ints for TTL but as a delta to 
>> nowInSecond would work for memtables but not in the SSTable, where 
>> nowInSecond does not exist. As a consequence we would still suffer from the 
>> impact on bytes flushed and bytes on disk.
>> 
>> Another approach that was suggested is the use of unsigned integer. Java 8 
>> has an unsigned integer API that would allow us to use unsigned int for 
>> TTLs. Based on computation unsigned ints would give us a maximum time of 136 
>> years since the Unix Epoch and therefore a maximum expiration timestamp in 
>> 2106. We would have to keep TTL at 20y instead of 68y to give us enough 
>> breathing room though, otherwise in 2035 we'd hit the same problem again.
>> 
>> Happy to hear opinions.
>> 
>> On 18/10/22 10:56, Berenguer Blasi wrote:
>>> Hi,
>>> 
>>> apologies for the late reply as I have been OOO. I have done some profiling 
>>> and results look virtually identical on trunk and 14227. I have attached 
>>> some screenshots to the ticket 
>>> https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes are 
>>> fooling me everything in the jfrs look the same.
>>> 
>>> Regards
>>> 
>>> On 30/9/22 9:44, Berenguer Blasi wrote:
>>>> Hi Benedict,
>>>> 
>>>> thanks for the reply! Yes some profiling is probably needed, then we can 
>>>> see if going down the delta encoding big refactor rabbit hole is worth it?
>>>> 
>>>> Let's see what other concerns people bring up.
>>>> 
>>>> Thx.
>>>> 
>>>> On 29/9/22 11:12, Benedict Elliott Smith wrote:
>>>>> My only slight concern with this approach is the additional memory 
>>>>> pressure. Since 64yrs should be plenty at any moment in time, I wonder if 
>>>>> it wouldn’t be better to represent these times as deltas from the 
>>>>> nowInSec being used to process the query. So, long math would only be 
>>>>> used to normalise the times to this nowInSec (from whatever is stored in 
>>>>> the sstable) within a method, and ints would be stored in memtables and 
>>>>> any objects used for processing.
>>>>> 
>>>>> This might admittedly be more work, but I don’t believe it should be too 
>>>>> challenging - we can introduce a method deletionTime(int nowInSec) that 
>>>>> returns a long value by adding nowInSec to the deletionTime, and make the 
>>>>> underlying value private, refactoring call sites?
>>>>> 
>>>>>> On 29 Sep 2022, at 09:37, Berenguer Blasi  
>>>>>> wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> I have taken a stab in a PR you can find attached in the ticket. Mainly:
>>>>>> 
>>>>>> - I have moved deletion times, gc and nowInSec timestamps to long. That 
>>>>>> should get us past the 2038 limit.
>>>>>> 
>>>>>> - TTL is maxed now to 68y. Think CQL API compatibility and a sort of a 
>>>>>> 'free' guardrail.
>>>>>> 
>>>>>> - A new NONE overflow policy is the default but everything is backwards 
>>>>>> compatible by keeping the previous ones in place. Think upgrade 
>>>>>> scenarios or apps relying on the previous behavior.
>>>>>> 
>>>>>> - The new limit is around year 292,471,208,677 which sounds ok given the 
>>>>>> Sun will start collapsing in 3 to 5 billion years :-)
>>>>>> 
>>>>>> - Please feel free to drop by the ticket and take a look at the PR even 
>>>>>> if it's cursory
>>>>>> 
>>>>>> Thx in advance.
>>>>>> 
>>>>> 


Re: CASSANDRA-14227 removing the 2038 limit

2022-11-13 Thread C. Scott Andreas
A 2-3% increase in storage volume is roughly equivalent to giving up the gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression levels. This regression could be very expensive for storage-bound use cases.

From the perspective of storage overhead, the unsigned int approach sounds preferable.

On Nov 13, 2022, at 10:13 PM, Berenguer Blasi  wrote:
  

  
  
Hi all,

We have done some more research on c14227. The current patch for
  CASSANDRA-14227 solves the TTL limit issue by switching TTL to
  long instead of int. This approach does not have a negative impact
  on memtable memory usage, as C* controls the memory used by the
  Memtable, but based on our testing it increases the bytes flushed
  by 4 to 7% and the bytes on disk by 2 to 3%.

As a mitigation to this problem it is possible to encode localDeletionTime as a vint. It
  results in a 1% improvement but might cause additional
  computations during compaction or some other operations.

Benedict's proposal to keep on using ints for TTL but as a delta
  to nowInSecond would work for memtables but not in the SSTable,
  where nowInSecond does not exist. As a consequence we would
  still suffer from the impact on bytes flushed and bytes on disk.

Another approach that was suggested is the use of unsigned
  integer. Java 8 has an unsigned integer API that would allow us to
  use unsigned int for TTLs. Based on computation unsigned ints
  would give us a maximum time of 136 years since the Unix Epoch and
  therefore a maximum expiration timestamp in 2106. We would have to
  keep TTL at 20y instead of 68y to give us enough breathing room
  though, otherwise in 2035 we'd hit the same problem again.

Happy to hear opinions.

On 18/10/22 10:56, Berenguer Blasi wrote:


  
  Hi,
  apologies for the late reply as I have been OOO. I have done
some profiling and results look virtually identical on trunk and
14227. I have attached some screenshots to the ticket https://issues.apache.org/jira/browse/CASSANDRA-14227.
Unless my eyes are fooling me everything in the jfrs look the
same.
  Regards
  
  On 30/9/22 9:44, Berenguer Blasi
wrote:
  
  

Hi Benedict,
thanks for the reply! Yes some profiling is probably needed,
  then we can see if going down the delta encoding big refactor
  rabbit hole is worth it?

Let's see what other concerns people bring up.
Thx.

On 29/9/22 11:12, Benedict Elliott
  Smith wrote:


  
  My only slight concern with this approach is
the additional memory pressure. Since 64yrs should be plenty
at any moment in time, I wonder if it wouldn’t be better to
represent these times as deltas from the nowInSec being used
to process the query. So, long math would only be used to
normalise the times to this nowInSec (from whatever is
stored in the sstable) within a method, and ints would be
stored in memtables and any objects used for processing.


This
  might admittedly be more work, but I don’t believe it
  should be too challenging - we can introduce a method
  deletionTime(int nowInSec) that returns a long value by
  adding nowInSec to the deletionTime, and make the
  underlying value private, refactoring call sites?

  
On 29 Sep 2022, at 09:37, Berenguer Blasi 
  wrote:


  Hi all,

I have taken a stab in a PR you can find attached in
the ticket. Mainly:

- I have moved deletion times, gc and nowInSec
timestamps to long. That should get us past the 2038
limit.

- TTL is maxed now to 68y. Think CQL API
compatibility and a sort of a 'free' guardrail.

- A new NONE overflow policy is the default but
everything is backwards compatible by keeping the
previous ones in place. Think upgrade scenarios or
apps relying on the previous behavior.

- The new limit is around year 292,471,208,677 which
sounds ok given the Sun will start collapsing in 3
to 5 billion years :-)

- Please feel free to drop by the ticket and take a
look at the PR even if it's cursory

Thx in advance.
  

Re: CASSANDRA-14227 removing the 2038 limit

2022-11-13 Thread Berenguer Blasi

Hi all,

We have done some more research on c14227. The current patch for 
CASSANDRA-14227 solves the TTL limit issue by switching TTL to long 
instead of int. This approach does not have a negative impact on 
memtable memory usage, as C* controls the memory used by the Memtable, 
but based on our testing it increases the bytes flushed by 4 to 7% and 
the bytes on disk by 2 to 3%.


As a mitigation to this problem it is possible to encode 
/localDeletionTime/ as a vint. It results in a 1% improvement but might 
cause additional computations during compaction or some other operations.


Benedict's proposal to keep on using ints for TTL but as a delta to 
nowInSecond would work for memtables but not in the SSTable, where 
nowInSecond does not exist. As a consequence we would still suffer 
from the impact on bytes flushed and bytes on disk.


Another approach that was suggested is the use of unsigned integer. Java 
8 has an unsigned integer API that would allow us to use unsigned int 
for TTLs. Based on computation unsigned ints would give us a maximum 
time of 136 years since the Unix Epoch and therefore a maximum 
expiration timestamp in 2106. We would have to keep TTL at 20y instead 
of 68y to give us enough breathing room though, otherwise in 2035 we'd 
hit the same problem again.
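
The arithmetic behind the unsigned int option can be checked with a short sketch. It uses the Java 8 helpers Integer.toUnsignedLong and Integer.compareUnsigned; the class and method names (ExpirationHorizon, yearOf, lastSafeWrite) are hypothetical, and a year is assumed to be 365 days for simplicity.

```java
import java.time.Instant;
import java.time.ZoneOffset;

final class ExpirationHorizon {
    // Largest epoch second a signed 32-bit int can hold (2^31 - 1).
    static final long SIGNED_MAX = Integer.MAX_VALUE;
    // Largest epoch second when the same 32 bits are read as unsigned (2^32 - 1).
    static final long UNSIGNED_MAX = Integer.toUnsignedLong(-1);
    // Assumption: a "year" of 365 days, in seconds (leap days ignored).
    static final long YEAR_SECONDS = 365L * 24 * 60 * 60;

    // Calendar year (UTC) that an epoch second falls in.
    static int yearOf(long epochSecond) {
        return Instant.ofEpochSecond(epochSecond).atZone(ZoneOffset.UTC).getYear();
    }

    // Latest write time for which writeTime + maxTtl still fits under the limit.
    static long lastSafeWrite(long maxExpiration, long maxTtlSeconds) {
        return maxExpiration - maxTtlSeconds;
    }

    public static void main(String[] args) {
        System.out.println("signed limit:       " + yearOf(SIGNED_MAX));
        System.out.println("unsigned limit:     " + yearOf(UNSIGNED_MAX));
        System.out.println("20y TTL safe until: "
                + yearOf(lastSafeWrite(UNSIGNED_MAX, 20 * YEAR_SECONDS)));
        System.out.println("68y TTL safe until: "
                + yearOf(lastSafeWrite(UNSIGNED_MAX, 68 * YEAR_SECONDS)));

        // A post-2038 timestamp stored in an int looks negative when read as signed,
        // so widening and comparisons have to go through the unsigned helpers.
        int stored = (int) 3_000_000_000L;
        System.out.println(stored < 0);
        System.out.println(Integer.toUnsignedLong(stored));
        System.out.println(Integer.compareUnsigned(stored, Integer.MAX_VALUE) > 0);
    }
}
```

Under these assumptions the unsigned limit lands in 2106, a 20-year TTL cap stays safe for writes until 2086, and a 68-year cap runs out of headroom in early 2038, which is in the same ballpark as the 2035 mentioned above.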


Happy to hear opinions.

On 18/10/22 10:56, Berenguer Blasi wrote:


Hi,

apologies for the late reply as I have been OOO. I have done some 
profiling and results look virtually identical on trunk and 14227. I 
have attached some screenshots to the ticket 
https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes 
are fooling me everything in the jfrs look the same.


Regards

On 30/9/22 9:44, Berenguer Blasi wrote:


Hi Benedict,

thanks for the reply! Yes some profiling is probably needed, then we 
can see if going down the delta encoding big refactor rabbit hole is 
worth it?


Let's see what other concerns people bring up.

Thx.

On 29/9/22 11:12, Benedict Elliott Smith wrote:
My only slight concern with this approach is the additional memory 
pressure. Since 64yrs should be plenty at any moment in time, I 
wonder if it wouldn’t be better to represent these times as deltas 
from the nowInSec being used to process the query. So, long math 
would only be used to normalise the times to this nowInSec (from 
whatever is stored in the sstable) within a method, and ints would 
be stored in memtables and any objects used for processing.


This might admittedly be more work, but I don’t believe it should be 
too challenging - we can introduce a method deletionTime(int 
nowInSec) that returns a long value by adding nowInSec to the 
deletionTime, and make the underlying value private, refactoring 
call sites?


On 29 Sep 2022, at 09:37, Berenguer Blasi 
 wrote:


Hi all,

I have taken a stab in a PR you can find attached in the ticket. 
Mainly:


- I have moved deletion times, gc and nowInSec timestamps to long. 
That should get us past the 2038 limit.


- TTL is maxed now to 68y. Think CQL API compatibility and a sort 
of a 'free' guardrail.


- A new NONE overflow policy is the default but everything is 
backwards compatible by keeping the previous ones in place. Think 
upgrade scenarios or apps relying on the previous behavior.


- The new limit is around year 292,471,208,677 which sounds ok 
given the Sun will start collapsing in 3 to 5 billion years :-)


- Please feel free to drop by the ticket and take a look at the PR 
even if it's cursory


Thx in advance.



Re: CASSANDRA-14227 removing the 2038 limit

2022-10-18 Thread Berenguer Blasi

Hi,

apologies for the late reply as I have been OOO. I have done some 
profiling and results look virtually identical on trunk and 14227. I 
have attached some screenshots to the ticket 
https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes 
are fooling me everything in the jfrs look the same.


Regards

On 30/9/22 9:44, Berenguer Blasi wrote:


Hi Benedict,

thanks for the reply! Yes some profiling is probably needed, then we 
can see if going down the delta encoding big refactor rabbit hole is 
worth it?


Let's see what other concerns people bring up.

Thx.

On 29/9/22 11:12, Benedict Elliott Smith wrote:
My only slight concern with this approach is the additional memory 
pressure. Since 64yrs should be plenty at any moment in time, I 
wonder if it wouldn’t be better to represent these times as deltas 
from the nowInSec being used to process the query. So, long math 
would only be used to normalise the times to this nowInSec (from 
whatever is stored in the sstable) within a method, and ints would be 
stored in memtables and any objects used for processing.


This might admittedly be more work, but I don’t believe it should be 
too challenging - we can introduce a method deletionTime(int 
nowInSec) that returns a long value by adding nowInSec to the 
deletionTime, and make the underlying value private, refactoring call 
sites?


On 29 Sep 2022, at 09:37, Berenguer Blasi  
wrote:


Hi all,

I have taken a stab in a PR you can find attached in the ticket. Mainly:

- I have moved deletion times, gc and nowInSec timestamps to long. 
That should get us past the 2038 limit.


- TTL is maxed now to 68y. Think CQL API compatibility and a sort of 
a 'free' guardrail.


- A new NONE overflow policy is the default but everything is 
backwards compatible by keeping the previous ones in place. Think 
upgrade scenarios or apps relying on the previous behavior.


- The new limit is around year 292,471,208,677 which sounds ok given 
the Sun will start collapsing in 3 to 5 billion years :-)


- Please feel free to drop by the ticket and take a look at the PR 
even if it's cursory


Thx in advance.



Re: CASSANDRA-14227 removing the 2038 limit

2022-09-30 Thread Berenguer Blasi

Hi Benedict,

thanks for the reply! Yes some profiling is probably needed, then we can 
see if going down the delta encoding big refactor rabbit hole is worth it?


Let's see what other concerns people bring up.

Thx.

On 29/9/22 11:12, Benedict Elliott Smith wrote:
My only slight concern with this approach is the additional memory 
pressure. Since 64yrs should be plenty at any moment in time, I wonder 
if it wouldn’t be better to represent these times as deltas from the 
nowInSec being used to process the query. So, long math would only be 
used to normalise the times to this nowInSec (from whatever is stored 
in the sstable) within a method, and ints would be stored in memtables 
and any objects used for processing.


This might admittedly be more work, but I don’t believe it should be 
too challenging - we can introduce a method deletionTime(int nowInSec) 
that returns a long value by adding nowInSec to the deletionTime, and 
make the underlying value private, refactoring call sites?


On 29 Sep 2022, at 09:37, Berenguer Blasi  
wrote:


Hi all,

I have taken a stab in a PR you can find attached in the ticket. Mainly:

- I have moved deletion times, gc and nowInSec timestamps to long. 
That should get us past the 2038 limit.


- TTL is maxed now to 68y. Think CQL API compatibility and a sort of 
a 'free' guardrail.


- A new NONE overflow policy is the default but everything is 
backwards compatible by keeping the previous ones in place. Think 
upgrade scenarios or apps relying on the previous behavior.


- The new limit is around year 292,471,208,677 which sounds ok given 
the Sun will start collapsing in 3 to 5 billion years :-)


- Please feel free to drop by the ticket and take a look at the PR 
even if it's cursory


Thx in advance.



Re: CASSANDRA-14227 removing the 2038 limit

2022-09-29 Thread Benedict Elliott Smith
My only slight concern with this approach is the additional memory pressure. 
Since 64yrs should be plenty at any moment in time, I wonder if it wouldn’t be 
better to represent these times as deltas from the nowInSec being used to 
process the query. So, long math would only be used to normalise the times to 
this nowInSec (from whatever is stored in the sstable) within a method, and 
ints would be stored in memtables and any objects used for processing.

This might admittedly be more work, but I don’t believe it should be too 
challenging - we can introduce a method deletionTime(int nowInSec) that returns 
a long value by adding nowInSec to the deletionTime, and make the underlying 
value private, refactoring call sites?
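
A minimal sketch of what such a delta representation might look like, under stated assumptions: the class name DeltaDeletionTime and the factory fromAbsolute are hypothetical, and the nowInSec parameter is widened to long here (the text above says int) so that the example also works for reference times past 2038. It is not taken from the patch.

```java
final class DeltaDeletionTime {
    // Offset in seconds from the nowInSec of the query being processed.
    // Kept private so call sites must go through deletionTime(nowInSec).
    private final int deltaSeconds;

    private DeltaDeletionTime(int deltaSeconds) {
        this.deltaSeconds = deltaSeconds;
    }

    // Normalise an absolute time (as read from an sstable, for example) to a
    // delta. Math.toIntExact throws ArithmeticException if the delta does not
    // fit in an int, i.e. if it is more than about 68 years from nowInSec.
    static DeltaDeletionTime fromAbsolute(long absoluteSeconds, long nowInSec) {
        return new DeltaDeletionTime(Math.toIntExact(absoluteSeconds - nowInSec));
    }

    // Long math is used only here, to recover the absolute time.
    // The caller must pass the same nowInSec the delta was built against.
    long deletionTime(long nowInSec) {
        return nowInSec + deltaSeconds;
    }

    public static void main(String[] args) {
        long nowInSec = 4_000_000_000L;   // a reference time after 2038
        long absolute = 5_000_000_000L;   // deletion time about 31 years later
        DeltaDeletionTime t = fromAbsolute(absolute, nowInSec);
        System.out.println(t.deletionTime(nowInSec) == absolute);
    }
}
```

The in-memory footprint stays at four bytes per value; the cost is that every delta is only meaningful together with the nowInSec it was computed against.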

> On 29 Sep 2022, at 09:37, Berenguer Blasi  wrote:
> 
> Hi all,
> 
> I have taken a stab in a PR you can find attached in the ticket. Mainly:
> 
> - I have moved deletion times, gc and nowInSec timestamps to long. That 
> should get us past the 2038 limit.
> 
> - TTL is maxed now to 68y. Think CQL API compatibility and a sort of a 'free' 
> guardrail.
> 
> - A new NONE overflow policy is the default but everything is backwards 
> compatible by keeping the previous ones in place. Think upgrade scenarios or 
> apps relying on the previous behavior.
> 
> - The new limit is around year 292,471,208,677 which sounds ok given the Sun 
> will start collapsing in 3 to 5 billion years :-)
> 
> - Please feel free to drop by the ticket and take a look at the PR even if 
> it's cursory
> 
> Thx in advance.



CASSANDRA-14227 removing the 2038 limit

2022-09-29 Thread Berenguer Blasi

Hi all,

I have taken a stab in a PR you can find attached in the ticket. Mainly:

- I have moved deletion times, gc and nowInSec timestamps to long. That 
should get us past the 2038 limit.


- TTL is maxed now to 68y. Think CQL API compatibility and a sort of a 
'free' guardrail.


- A new NONE overflow policy is the default but everything is backwards 
compatible by keeping the previous ones in place. Think upgrade 
scenarios or apps relying on the previous behavior.


- The new limit is around year 292,471,208,677 which sounds ok given the 
Sun will start collapsing in 3 to 5 billion years :-)


- Please feel free to drop by the ticket and take a look at the PR even 
if it's cursory


Thx in advance.
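
As a quick check of the figures in the list above, the 68y and 292,471,208,677 values are consistent with dividing the largest signed 32-bit and 64-bit values by the number of seconds in a 365-day year. The sketch below assumes that year length; the class and method names are hypothetical.

```java
final class TimestampRange {
    // Assumption: a year of 365 days, i.e. 31,536,000 seconds.
    static final long YEAR_SECONDS = 365L * 24 * 60 * 60;

    // Whole 365-day years contained in a number of seconds.
    static long wholeYears(long seconds) {
        return seconds / YEAR_SECONDS;
    }

    public static void main(String[] args) {
        // Range of a signed 32-bit seconds counter: 68 years.
        System.out.println(wholeYears(Integer.MAX_VALUE));
        // Range of a signed 64-bit seconds counter: 292471208677 years.
        System.out.println(wholeYears(Long.MAX_VALUE));
    }
}
```

Both values are spans counted from the 1970 Unix epoch, so the corresponding calendar years are 1970 later: 2038 for the int case.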