Re: CASSANDRA-14227 removing the 2038 limit

2022-11-14 Thread Berenguer Blasi

Hi all,

Thanks for your answers!

To Benedict's point on the uvint encoding of deletionTime: it is 
true that it happens here 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170. 
But we also have a DeletionTime serializer here 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166 
that writes an int and a long, and would now write two longs.


TTL itself (the delta) remains an int in the new PR, so it should have 
no effect on size.
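For intuition on why widening a fixed-width field changes on-disk size while a delta-encoded vint need not, here is a toy illustration. The varint below is a generic LEB128-style encoding, not Cassandra's actual VIntCoding, but the size behavior is analogous:

```java
public class VIntSizeDemo {
    // Illustrative LEB128-style unsigned varint size; Cassandra's real
    // VIntCoding lays bits out differently, but sizes grow the same way.
    public static int unsignedVIntSize(long value) {
        int size = 1;
        while ((value >>>= 7) != 0) size++;
        return size;
    }

    public static void main(String[] args) {
        // Fixed-width DeletionTime today: a long plus an int = 12 bytes;
        // two longs would be 16 bytes.
        System.out.println(Integer.BYTES + Long.BYTES); // 12
        System.out.println(2 * Long.BYTES);             // 16

        // A small delta from an sstable epoch stays tiny as a varint,
        // while an absolute seconds-since-epoch timestamp needs ~5 bytes.
        System.out.println(unsignedVIntSize(3600L));          // 2
        System.out.println(unsignedVIntSize(1_668_400_000L)); // 5
    }
}
```

This is why a serializer that writes fixed-width fields grows when a field widens, while the header path that writes vint deltas does not.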


Did I reference the correct parts of the codebase? No sstable expert here.

On 14/11/22 19:28, Josh McKenzie wrote:

in 2035 we'd hit the same problem again.
In terms of "kicking a can down the road", this would be a pretty 
vigorous kick. I wouldn't push back against this deferral. :)


On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:


I’m confused why we see *any* increase in sstable size - TTLs and 
deletion times are already written as unsigned vints as offsets from 
an sstable epoch for each value.


I would dig in more carefully to explore why you’re seeing this 
increase? For the same data there should be no change to size on disk.




On 14 Nov 2022, at 06:36, C. Scott Andreas  wrote:
A 2-3% increase in storage volume is roughly equivalent to giving 
up the gain from LZ4 -> LZ4HC, or a one to two-level bump in 
Zstandard compression levels. This regression could be very 
expensive for storage-bound use cases.


From the perspective of storage overhead, the unsigned int approach 
sounds preferable.


On Nov 13, 2022, at 10:13 PM, Berenguer Blasi 
 wrote:



Hi all,

We have done some more research on c14227. The current patch for 
CASSANDRA-14227 solves the TTL limit issue by switching TTL to long 
instead of int. This approach does not have a negative impact on 
memtable memory usage, as C* controls the memory used by the 
memtable, but based on our testing it increases the bytes flushed 
by 4 to 7% and the bytes on disk by 2 to 3%.


As a mitigation to this problem, it is possible to encode 
/localDeletionTime/ as a vint. It results in a 1% improvement but 
might cause additional computation during compaction or some other 
operations.


Benedict's proposal to keep using ints for TTL, but as a delta to 
nowInSecond, would work for memtables but not for the SSTable, 
where nowInSecond does not exist. As a consequence we would still 
suffer the impact on bytes flushed and bytes on disk.


Another approach that was suggested is the use of unsigned integers. 
Java 8 has an unsigned integer API that would allow us to use 
unsigned ints for TTLs. Based on our computations, unsigned ints 
would give us a maximum time of 136 years since the Unix epoch and 
therefore a maximum expiration timestamp in 2106. We would have to 
cap TTL at 20y instead of 68y to give us enough breathing room, 
though; otherwise in 2035 we'd hit the same problem again.
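A minimal sketch of the unsigned reinterpretation, using the Java 8 unsigned API on Integer (the 136- and 68-year figures follow directly; 31,557,600 is the seconds in a Julian year):

```java
public class UnsignedTtlDemo {
    // Reinterpret a stored 32-bit seconds-since-epoch value as unsigned.
    public static long toUnsignedSeconds(int stored) {
        return Integer.toUnsignedLong(stored);
    }

    public static void main(String[] args) {
        long max = toUnsignedSeconds(-1);       // all bits set = 0xFFFFFFFF
        System.out.println(max);                // 4294967295 seconds
        System.out.println(max / 31_557_600L);  // ~136 years -> year 2106
        // For comparison, a signed int tops out at ~68 years -> year 2038.
        System.out.println(Integer.MAX_VALUE / 31_557_600L); // 68
    }
}
```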


Happy to hear opinions.

On 18/10/22 10:56, Berenguer Blasi wrote:


Hi,

Apologies for the late reply as I have been OOO. I have done some 
profiling and the results look virtually identical on trunk and 14227. 
I have attached some screenshots to the ticket 
https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my 
eyes are fooling me, everything in the JFRs looks the same.


Regards

On 30/9/22 9:44, Berenguer Blasi wrote:


Hi Benedict,

thanks for the reply! Yes some profiling is probably needed, then 
we can see if going down the delta encoding big refactor rabbit 
hole is worth it?


Let's see what other concerns people bring up.

Thx.

On 29/9/22 11:12, Benedict Elliott Smith wrote:
My only slight concern with this approach is the additional 
memory pressure. Since 64yrs should be plenty at any moment in 
time, I wonder if it wouldn’t be better to represent these times 
as deltas from the nowInSec being used to process the query. So, 
long math would only be used to normalise the times to this 
nowInSec (from whatever is stored in the sstable) within a 
method, and ints would be stored in memtables and any objects 
used for processing.


This might admittedly be more work, but I don’t believe it 
should be too challenging - we can introduce a method 
deletionTime(int nowInSec) that returns a long value by adding 
nowInSec to the deletionTime, and make the underlying value 
private, refactoring call sites?
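The proposal above might look roughly like this. All names here are hypothetical illustrations, not the actual Cassandra types or the patch:

```java
// Hypothetical sketch of the delta representation described above;
// the real Cassandra classes and call sites differ.
public class ExpiringValue {
    // Stored as an int delta from the query's nowInSec: 4 bytes in
    // memtables instead of 8.
    private final int deletionDelta;

    public ExpiringValue(int deletionDelta) {
        this.deletionDelta = deletionDelta;
    }

    // Long math is used only when normalising back to an absolute time,
    // so the sum cannot overflow even past 2038.
    public long deletionTime(int nowInSec) {
        return (long) nowInSec + deletionDelta;
    }
}
```

The long return type is the point: the stored field stays an int, and only the normalised absolute time escapes the 32-bit range.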


On 29 Sep 2022, at 09:37, Berenguer Blasi 
  wrote:


Hi all,

I have taken a stab in a PR you can find attached in the 
ticket. Mainly:


- I have moved deletion times, gc and nowInSec timestamps to 
long. That should get us past the 2038 limit.


- TTL is maxed now to 68y. Think CQL API compatibility and a 
sort of a 'free' guardrail.


- A new NONE overflow policy is the default but everything is 
backwards compatible by keeping the previous ones in place. Think 
upgrade scenarios or apps relying on the previous behavior.

Re: Some tests are never executed in CI due to their name

2022-11-14 Thread Berenguer Blasi

+1 to waiver

On 15/11/22 2:07, Josh McKenzie wrote:

+1 to waiver.

We still don't have some kind of @flaky annotation that sequesters 
tests do we? :)


On Mon, Nov 14, 2022, at 5:58 PM, Ekaterina Dimitrova wrote:

+1

On Mon, 14 Nov 2022 at 17:55, Brandon Williams  wrote:

+1 to waiving these.

On Mon, Nov 14, 2022, 4:49 PM Miklosovic, Stefan
 wrote:

Tickets for the flaky tests are here

https://issues.apache.org/jira/browse/CASSANDRA-18047
https://issues.apache.org/jira/browse/CASSANDRA-18048


From: Mick Semb Wever 
Sent: Monday, November 14, 2022 23:28
To: dev@cassandra.apache.org
Subject: Re: Some tests are never executed in CI due to their
name

NetApp Security WARNING: This is an external email. Do not
click links or open attachments unless you recognize the
sender and know the content is safe.




in CASSANDRA-18029, two flaky tests were committed by mistake
due to my misunderstanding. We agreed on this thread that we
should not commit flaky tests right before rc. So now the rc
is technically blocked by them. To unblock it, what is needed
is to have a waiver on them. If there is not a waiver, I need
to go back to that test and remove the two test methods which
are flaky. (In practice they will be probably just @Ignore-ed
with comment about flakiness so we can fix them later).

Flaky tests are


org.apache.cassandra.distributed.test.PaxosRepair2Test.paxosRepairHistoryIsntUpdatedInForcedRepair

org.apache.cassandra.distributed.test.PaxosRepair2Test.legacyPurgeRepairLoop


+1 to a waiver on these two 4.1 flaky regressions to the RC
and GA releases.

Thanks for bringing it back to dev@ Stefan. Waivers should be
done on dev@ (build/release managers can't be keeping up with
every ticket), and dev threads and tickets should be kept
(reasonably) in-sync, for the sake of inclusiveness.

I believe there will be follow up tickets to address these
flakies in 4.1.x ?



Cassandra 3.0.27 - Tombstone disappeared during node replacement

2022-11-14 Thread Jaydeep Chovatia
Hi,

I am running Cassandra 3.0.27 in my production. In some corner case
scenarios, we have tombstones disappearing during bootstrap/decommission.
I've outlined a possible theory with the root cause in this ticket:
https://issues.apache.org/jira/browse/CASSANDRA-17991

Could someone please help validate this?

Jaydeep


Re: Some tests are never executed in CI due to their name

2022-11-14 Thread Josh McKenzie
+1 to waiver.

We still don't have some kind of @flaky annotation that sequesters tests do we? 
:)

On Mon, Nov 14, 2022, at 5:58 PM, Ekaterina Dimitrova wrote:
> +1
> 
> On Mon, 14 Nov 2022 at 17:55, Brandon Williams  wrote:
>> +1 to waiving these.
>> 
>> On Mon, Nov 14, 2022, 4:49 PM Miklosovic, Stefan 
>>  wrote:
>>> Tickets for the flaky tests are here
>>> 
>>> https://issues.apache.org/jira/browse/CASSANDRA-18047
>>> https://issues.apache.org/jira/browse/CASSANDRA-18048
>>> 
>>> 
>>> From: Mick Semb Wever 
>>> Sent: Monday, November 14, 2022 23:28
>>> To: dev@cassandra.apache.org
>>> Subject: Re: Some tests are never executed in CI due to their name
>>> 
>>> in CASSANDRA-18029, two flaky tests were committed by mistake due to my 
>>> misunderstanding. We agreed on this thread that we should not commit flaky 
>>> tests right before rc. So now the rc is technically blocked by them. To 
>>> unblock it, what is needed is to have a waiver on them. If there is not a 
>>> waiver, I need to go back to that test and remove the two test methods 
>>> which are flaky. (In practice they will be probably just @Ignore-ed with 
>>> comment about flakiness so we can fix them later).
>>> 
>>> Flaky tests are
>>> 
>>> org.apache.cassandra.distributed.test.PaxosRepair2Test.paxosRepairHistoryIsntUpdatedInForcedRepair
>>> org.apache.cassandra.distributed.test.PaxosRepair2Test.legacyPurgeRepairLoop
>>> 
>>> 
>>> +1 to a waiver on these two 4.1 flaky regressions to the RC and GA releases.
>>> 
>>> Thanks for bringing it back to dev@ Stefan. Waivers should be done on dev@ 
>>> (build/release managers can't be keeping up with every ticket), and dev 
>>> threads and tickets should be kept (reasonably) in-sync, for the sake of 
>>> inclusiveness.
>>> 
>>> I believe there will be follow up tickets to address these flakies in 4.1.x 
>>> ?


Re: Some tests are never executed in CI due to their name

2022-11-14 Thread Ekaterina Dimitrova
+1

On Mon, 14 Nov 2022 at 17:55, Brandon Williams  wrote:

> +1 to waiving these.
>
> On Mon, Nov 14, 2022, 4:49 PM Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
>
>> Tickets for the flaky tests are here
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-18047
>> https://issues.apache.org/jira/browse/CASSANDRA-18048
>>
>> 
>> From: Mick Semb Wever 
>> Sent: Monday, November 14, 2022 23:28
>> To: dev@cassandra.apache.org
>> Subject: Re: Some tests are never executed in CI due to their name
>>
>> in CASSANDRA-18029, two flaky tests were committed by mistake due to my
>> misunderstanding. We agreed on this thread that we should not commit flaky
>> tests right before rc. So now the rc is technically blocked by them. To
>> unblock it, what is needed is to have a waiver on them. If there is not a
>> waiver, I need to go back to that test and remove the two test methods
>> which are flaky. (In practice they will be probably just @Ignore-ed with
>> comment about flakiness so we can fix them later).
>>
>> Flaky tests are
>>
>>
>> org.apache.cassandra.distributed.test.PaxosRepair2Test.paxosRepairHistoryIsntUpdatedInForcedRepair
>>
>> org.apache.cassandra.distributed.test.PaxosRepair2Test.legacyPurgeRepairLoop
>>
>>
>> +1 to a waiver on these two 4.1 flaky regressions to the RC and GA
>> releases.
>>
>> Thanks for bringing it back to dev@ Stefan. Waivers should be done on
>> dev@ (build/release managers can't be keeping up with every ticket), and
>> dev threads and tickets should be kept (reasonably) in-sync, for the sake
>> of inclusiveness.
>>
>> I believe there will be follow up tickets to address these flakies in
>> 4.1.x ?
>>
>


Re: Some tests are never executed in CI due to their name

2022-11-14 Thread Brandon Williams
+1 to waiving these.

On Mon, Nov 14, 2022, 4:49 PM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Tickets for the flaky tests are here
>
> https://issues.apache.org/jira/browse/CASSANDRA-18047
> https://issues.apache.org/jira/browse/CASSANDRA-18048
>
> 
> From: Mick Semb Wever 
> Sent: Monday, November 14, 2022 23:28
> To: dev@cassandra.apache.org
> Subject: Re: Some tests are never executed in CI due to their name
>
> in CASSANDRA-18029, two flaky tests were committed by mistake due to my
> misunderstanding. We agreed on this thread that we should not commit flaky
> tests right before rc. So now the rc is technically blocked by them. To
> unblock it, what is needed is to have a waiver on them. If there is not a
> waiver, I need to go back to that test and remove the two test methods
> which are flaky. (In practice they will be probably just @Ignore-ed with
> comment about flakiness so we can fix them later).
>
> Flaky tests are
>
>
> org.apache.cassandra.distributed.test.PaxosRepair2Test.paxosRepairHistoryIsntUpdatedInForcedRepair
>
> org.apache.cassandra.distributed.test.PaxosRepair2Test.legacyPurgeRepairLoop
>
>
> +1 to a waiver on these two 4.1 flaky regressions to the RC and GA
> releases.
>
> Thanks for bringing it back to dev@ Stefan. Waivers should be done on dev@
> (build/release managers can't be keeping up with every ticket), and dev
> threads and tickets should be kept (reasonably) in-sync, for the sake of
> inclusiveness.
>
> I believe there will be follow up tickets to address these flakies in
> 4.1.x ?
>


Re: Some tests are never executed in CI due to their name

2022-11-14 Thread Miklosovic, Stefan
Tickets for the flaky tests are here

https://issues.apache.org/jira/browse/CASSANDRA-18047
https://issues.apache.org/jira/browse/CASSANDRA-18048


From: Mick Semb Wever 
Sent: Monday, November 14, 2022 23:28
To: dev@cassandra.apache.org
Subject: Re: Some tests are never executed in CI due to their name

in CASSANDRA-18029, two flaky tests were committed by mistake due to my 
misunderstanding. We agreed on this thread that we should not commit flaky 
tests right before rc. So now the rc is technically blocked by them. To unblock 
it, what is needed is to have a waiver on them. If there is not a waiver, I 
need to go back to that test and remove the two test methods which are flaky. 
(In practice they will be probably just @Ignore-ed with comment about flakiness 
so we can fix them later).

Flaky tests are

org.apache.cassandra.distributed.test.PaxosRepair2Test.paxosRepairHistoryIsntUpdatedInForcedRepair
org.apache.cassandra.distributed.test.PaxosRepair2Test.legacyPurgeRepairLoop


+1 to a waiver on these two 4.1 flaky regressions to the RC and GA releases.

Thanks for bringing it back to dev@ Stefan. Waivers should be done on dev@ 
(build/release managers can't be keeping up with every ticket), and dev threads 
and tickets should be kept (reasonably) in-sync, for the sake of inclusiveness.

I believe there will be follow up tickets to address these flakies in 4.1.x ?


Re: Some tests are never executed in CI due to their name

2022-11-14 Thread Mick Semb Wever
> in CASSANDRA-18029, two flaky tests were committed by mistake due to my
> misunderstanding. We agreed on this thread that we should not commit flaky
> tests right before rc. So now the rc is technically blocked by them. To
> unblock it, what is needed is to have a waiver on them. If there is not a
> waiver, I need to go back to that test and remove the two test methods
> which are flaky. (In practice they will be probably just @Ignore-ed with
> comment about flakiness so we can fix them later).
>
> Flaky tests are
>
>
> org.apache.cassandra.distributed.test.PaxosRepair2Test.paxosRepairHistoryIsntUpdatedInForcedRepair
>
> org.apache.cassandra.distributed.test.PaxosRepair2Test.legacyPurgeRepairLoop
>


+1 to a waiver on these two 4.1 flaky regressions to the RC and GA releases.

Thanks for bringing it back to dev@ Stefan. Waivers should be done on dev@
(build/release managers can't be keeping up with every ticket), and dev
threads and tickets should be kept (reasonably) in-sync, for the sake of
inclusiveness.

I believe there will be follow up tickets to address these flakies in
4.1.x ?


Re: Some tests are never executed in CI due to their name

2022-11-14 Thread Miklosovic, Stefan
Hi list,

in CASSANDRA-18029, two flaky tests were committed by mistake due to my 
misunderstanding. We agreed on this thread that we should not commit flaky 
tests right before rc. So now the rc is technically blocked by them. To unblock 
it, what is needed is to have a waiver on them. If there is not a waiver, I 
need to go back to that test and remove the two test methods which are flaky. 
(In practice they will be probably just @Ignore-ed with comment about flakiness 
so we can fix them later).

Flaky tests are

org.apache.cassandra.distributed.test.PaxosRepair2Test.paxosRepairHistoryIsntUpdatedInForcedRepair
org.apache.cassandra.distributed.test.PaxosRepair2Test.legacyPurgeRepairLoop

I sincerely apologize for messing things up: I got drawn into the 
ticket and forgot about the ML thread, which has ultimate precedence 
in this matter. This should ideally have been requested before the 
commit; it was simply an oversight in keeping the ticket and the ML 
communication in sync.

Regards


From: Brandon Williams 
Sent: Thursday, November 10, 2022 13:20
To: dev@cassandra.apache.org
Subject: Re: Some tests are never executed in CI due to their name

+1 to treating as new unvetted tests.

Kind Regards,
Brandon

On Thu, Nov 10, 2022 at 5:41 AM Mick Semb Wever  wrote:
>
>
>
> On Thu, 10 Nov 2022 at 10:18, Miklosovic, Stefan 
>  wrote:
>>
>> I want to ask if people are comfortable with merging CASSANDRA-17964 before 
>> 4.1 is out. We clean up a lot of things and rename tests which we should take 
>> care of (like PaxosRepairTest2).
>>
>> We might still fix it without merging 17964 as we know what should be fixed 
>> and we can wait until 4.1 is out but I personally do not see any reason for 
>> this.
>
>
>
> Unless these are runtime failures that will hurt users running 4.1, let's 
> treat them as new incomplete tests. So please let's not add new broken tests 
> right before the rc.


Re: [PROPOSAL] Add docker-based "Hello World" tutorial in-tree for new developers

2022-11-14 Thread Jon Haddad
I like this suggestion and I'd take it a step further.  I think the project 
should be publishing docker images.  If we manage to migrate to Gradle this 
would be pretty trivial as the jib Gradle plugin is excellent and makes 
creating and publishing Docker images really easy.

https://github.com/GoogleContainerTools/jib
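For illustration, a jib configuration can be as small as the sketch below, in Gradle Kotlin DSL. The plugin version, base image, and registry here are placeholders I picked for the example, not project decisions:

```kotlin
// build.gradle.kts -- illustrative config fragment only; version and
// image names are placeholders, not actual project choices.
plugins {
    id("com.google.cloud.tools.jib") version "3.3.1"
}

jib {
    from {
        image = "eclipse-temurin:11-jre"      // base JRE image (placeholder)
    }
    to {
        image = "example.org/cassandra-dev"   // target registry (placeholder)
        tags = setOf("latest")
    }
}
```

With something like this in place, `./gradlew jib` builds and pushes an image without a Dockerfile or a local Docker daemon.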

I think there's an improvement over a local Dockerfile as we'd probably ditch 
that as soon as we started publishing our own Docker images. 

My 2 cents.  What do you think?

Jon

On 2022/11/10 22:14:15 Paulo Motta wrote:
> Hi everyone,
> 
> For the Grace Hopper Open Source Day mentoring hackathon [1] on Sept. 16 I
> created a short guide to hand out to participants interested in contributing
> to Cassandra so they could quickly get started working on LHF tickets.
> There were about 5 participants and most of them were able to quickly set
> up their environments and have a simple "Hello World" patch running in a
> local Cassandra container with the help of this guide.
> 
> While no hackathon participant got their patches committed, I considered it
> successful that most participants with no prior experience got started
> really fast and were able to have a look-and-feel of their patches running
> locally.
> 
> I would like to propose adding this docker-based quick start guide
> on CASSANDRA-18035 [2] and would like to hear your opinions and feedback. A
> preliminary patch is available if anyone is interested in reviewing it.
> 
> Even though we have extensive development instructions available on [3], I
> think these can be quite daunting for newcomers that just want to quickly
> hack a simple patch, so I think there is value in providing a more
> succinct and hands-on docker-based tutorial in-tree.
> 
> I think this guide will be particularly useful to new users that want to
> contribute non-distributed changes like vtables and configuration, since
> they can easily play around with their patches in a local container
> environment.
> 
> While I don't think anyone will oppose providing nice instructions to
> newcomers, a couple of contention points I can think of in this initiative
> are:
> a) Shipping a new QUICKSTART.md guide in-tree.
> b) Shipping a vanilla Dockerfile in-tree, for local testing purposes only.
> 
> Regarding a) if there's any objection to adding a new file, perhaps we
> could merge these instructions in the "CONTRIBUTING.md" guide, or
> alternatively add them to the website documentation.
> 
> Regarding b), while we already maintain a docker image in cassandra-builds
> [4], that is more targeted to automated testing. I don't expect a
> significant maintenance burden for this in-tree image since it's mostly
> targeted at manual local testing, but we should make sure we warn users
> that this Dockerfile has no guarantees and should not be used in production.
> 
> Let me know what you think,
> 
> Paulo
> 
> [1] - https://ghc.anitab.org/programs-and-awards/open-source-day/
> [2] - https://issues.apache.org/jira/browse/CASSANDRA-18035
> [3] - https://cassandra.apache.org/_/development/index.html
> [4] - https://github.com/apache/cassandra-builds/tree/trunk/docker
> 


Re: CASSANDRA-14227 removing the 2038 limit

2022-11-14 Thread Josh McKenzie
> in 2035 we'd hit the same problem again.
In terms of "kicking a can down the road", this would be a pretty vigorous 
kick. I wouldn't push back against this deferral. :)

On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:
> 
> I’m confused why we see *any* increase in sstable size - TTLs and deletion 
> times are already written as unsigned vints as offsets from an sstable epoch 
> for each value.
> 
> I would dig in more carefully to explore why you’re seeing this increase? For 
> the same data there should be no change to size on disk.
> 
> 
>> On 14 Nov 2022, at 06:36, C. Scott Andreas  wrote:
>> A 2-3% increase in storage volume is roughly equivalent to giving up the 
>> gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression 
>> levels. This regression could be very expensive for storage-bound use cases.
>> 
>> From the perspective of storage overhead, the unsigned int approach sounds 
>> preferable.
>> 
>>> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi  
>>> wrote:
>>>  
>>> Hi all,
>>> 
>>> We have done some more research on c14227. The current patch for 
>>> CASSANDRA-14227 solves the TTL limit issue by switching TTL to long instead 
>>> of int. This approach does not have a negative impact on memtable memory 
>>> usage, as C* controls the memory used by the Memtable, but based on our 
>>> testing it increases the bytes flushed by 4 to 7% and the bytes on disk by 2 
>>> to 3%.
>>> 
>>> As a mitigation to this problem it is possible to encode 
>>> *localDeletionTime* as a vint. It results in a 1% improvement but might 
>>> cause additional computations during compaction or some other operations.
>>> 
>>> Benedict's proposal to keep on using ints for TTL but as a delta to 
>>> nowInSecond would work for memtables but not for work in the SSTable, where 
>>> nowInSecond does not exist. As a consequence we would still suffer from the 
>>> impact on bytes flushed and bytes on disk.
>>> 
>>> Another approach that was suggested is the use of unsigned integer. Java 8 
>>> has an unsigned integer API that would allow us to use unsigned int for 
>>> TTLs. Based on computation unsigned ints would give us a maximum time of 
>>> 136 years since the Unix Epoch and therefore a maximum expiration timestamp 
>>> in 2106. We would have to keep TTL at 20y instead of 68y to give us enough 
>>> breathing room though, otherwise in 2035 we'd hit the same problem again.
>>> 
>>> Happy to hear opinions.
>>> 
>>> On 18/10/22 10:56, Berenguer Blasi wrote:
 Hi,
 
 apologies for the late reply as I have been OOO. I have done some 
 profiling and results look virtually identical on trunk and 14227. I have 
 attached some screenshots to the ticket 
 https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes are 
 fooling me everything in the jfrs look the same.
 
 Regards
 
 On 30/9/22 9:44, Berenguer Blasi wrote:
> Hi Benedict,
> 
> thanks for the reply! Yes some profiling is probably needed, then we can 
> see if going down the delta encoding big refactor rabbit hole is worth it?
> 
> Let's see what other concerns people bring up.
> 
> Thx.
> 
> On 29/9/22 11:12, Benedict Elliott Smith wrote:
>> My only slight concern with this approach is the additional memory 
>> pressure. Since 64yrs should be plenty at any moment in time, I wonder 
>> if it wouldn’t be better to represent these times as deltas from the 
>> nowInSec being used to process the query. So, long math would only be 
>> used to normalise the times to this nowInSec (from whatever is stored in 
>> the sstable) within a method, and ints would be stored in memtables and 
>> any objects used for processing. 
>> 
>> This might admittedly be more work, but I don’t believe it should be too 
>> challenging - we can introduce a method deletionTime(int nowInSec) that 
>> returns a long value by adding nowInSec to the deletionTime, and make 
>> the underlying value private, refactoring call sites?
>> 
>>> On 29 Sep 2022, at 09:37, Berenguer Blasi  
>>> wrote:
>>> 
>>> Hi all,
>>> 
>>> I have taken a stab in a PR you can find attached in the ticket. Mainly:
>>> 
>>> - I have moved deletion times, gc and nowInSec timestamps to long. That 
>>> should get us past the 2038 limit.
>>> 
>>> - TTL is maxed now to 68y. Think CQL API compatibility and a sort of a 
>>> 'free' guardrail.
>>> 
>>> - A new NONE overflow policy is the default but everything is backwards 
>>> compatible by keeping the previous ones in place. Think upgrade 
>>> scenarios or apps relying on the previous behavior.
>>> 
>>> - The new limit is around year 292,471,208,677 which sounds ok given 
>>> the Sun will start collapsing in 3 to 5 billion years :-)
>>> 
>>> - Please feel free to drop by the ticket and take a look at the PR even 
>>> if it's cursory
>>> 

Re: CASSANDRA-14227 removing the 2038 limit

2022-11-14 Thread Benedict
I’m confused why we see *any* increase in sstable size - TTLs and deletion 
times are already written as unsigned vints as offsets from an sstable epoch 
for each value.

I would dig in more carefully to explore why you’re seeing this increase? For 
the same data there should be no change to size on disk.

> On 14 Nov 2022, at 06:36, C. Scott Andreas  wrote:
> 
> A 2-3% increase in storage volume is roughly equivalent to giving up the 
> gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression 
> levels. This regression could be very expensive for storage-bound use cases.
> 
> From the perspective of storage overhead, the unsigned int approach sounds 
> preferable.
> 
>>> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi  
>>> wrote:
>>> 
>> 
>> Hi all,
>> 
>> We have done some more research on c14227. The current patch for 
>> CASSANDRA-14227 solves the TTL limit issue by switching TTL to long instead 
>> of int. This approach does not have a negative impact on memtable memory 
>> usage, as C* controls the memory used by the Memtable, but based on our 
>> testing it increases the bytes flushed by 4 to 7% and the bytes on disk by 2 
>> to 3%.
>> 
>> As a mitigation to this problem it is possible to encode localDeletionTime 
>> as a vint. It results in a 1% improvement but might cause additional 
>> computations during compaction or some other operations.
>> 
>> Benedict's proposal to keep on using ints for TTL but as a delta to 
>> nowInSecond would work for memtables but not for work in the SSTable, where 
>> nowInSecond does not exist. As a consequence we would still suffer from the 
>> impact on bytes flushed and bytes on disk.
>> 
>> Another approach that was suggested is the use of unsigned integer. Java 8 
>> has an unsigned integer API that would allow us to use unsigned int for 
>> TTLs. Based on computation unsigned ints would give us a maximum time of 136 
>> years since the Unix Epoch and therefore a maximum expiration timestamp in 
>> 2106. We would have to keep TTL at 20y instead of 68y to give us enough 
>> breathing room though, otherwise in 2035 we'd hit the same problem again.
>> 
>> Happy to hear opinions.
>> 
>> On 18/10/22 10:56, Berenguer Blasi wrote:
>>> Hi,
>>> 
>>> apologies for the late reply as I have been OOO. I have done some profiling 
>>> and results look virtually identical on trunk and 14227. I have attached 
>>> some screenshots to the ticket 
>>> https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes are 
>>> fooling me everything in the jfrs look the same.
>>> 
>>> Regards
>>> 
>>> On 30/9/22 9:44, Berenguer Blasi wrote:
 Hi Benedict,
 
 thanks for the reply! Yes some profiling is probably needed, then we can 
 see if going down the delta encoding big refactor rabbit hole is worth it?
 
 Let's see what other concerns people bring up.
 
 Thx.
 
 On 29/9/22 11:12, Benedict Elliott Smith wrote:
> My only slight concern with this approach is the additional memory 
> pressure. Since 64yrs should be plenty at any moment in time, I wonder if 
> it wouldn’t be better to represent these times as deltas from the 
> nowInSec being used to process the query. So, long math would only be 
> used to normalise the times to this nowInSec (from whatever is stored in 
> the sstable) within a method, and ints would be stored in memtables and 
> any objects used for processing.
> 
> This might admittedly be more work, but I don’t believe it should be too 
> challenging - we can introduce a method deletionTime(int nowInSec) that 
> returns a long value by adding nowInSec to the deletionTime, and make the 
> underlying value private, refactoring call sites?
> 
>> On 29 Sep 2022, at 09:37, Berenguer Blasi  
>> wrote:
>> 
>> Hi all,
>> 
>> I have taken a stab in a PR you can find attached in the ticket. Mainly:
>> 
>> - I have moved deletion times, gc and nowInSec timestamps to long. That 
>> should get us past the 2038 limit.
>> 
>> - TTL is maxed now to 68y. Think CQL API compatibility and a sort of a 
>> 'free' guardrail.
>> 
>> - A new NONE overflow policy is the default but everything is backwards 
>> compatible by keeping the previous ones in place. Think upgrade 
>> scenarios or apps relying on the previous behavior.
>> 
>> - The new limit is around year 292,471,208,677 which sounds ok given the 
>> Sun will start collapsing in 3 to 5 billion years :-)
>> 
>> - Please feel free to drop by the ticket and take a look at the PR even 
>> if it's cursory
>> 
>> Thx in advance.
>> 
>