Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Berenguer Blasi
+1. De-tangling, going more modular and clean interfaces sgtm.

On 20/7/21 21:45, Nate McCall wrote:
> Yay for pluggable memtables!! I haven't gone over this in detail yet, but
> personally I've always thought integrating something like Arrow would be
> cool for sharing data (that's as far as I've gotten, but anything that
> makes that kind of experimentation easier would also help with mocking test
> plumbing, so +1 from me).
>
> Thanks for putting this together!
>
> -Nate
>
> On Tue, Jul 20, 2021 at 10:11 PM Branimir Lambov <
> branimir.lam...@datastax.com> wrote:
>
>> Proposal for a mechanism for plugging in memtable implementations:
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>>
>> The proposal supports using custom memtable implementations to support
>> development and testing of improved alternatives, but also enables a
>> broader definition of "memtable" to better support more advanced use cases
>> like persistent memory. To this end, memtable implementations are given
>> control over flushing and storing data in the commit log, enabling
>> solutions that implement their own durability mechanisms and live much
>> longer than their classical counterparts. Taken to the extreme, this also
>> enables memtables that never flush (in other words, alternative storage
>> engines) in a minimally-invasive manner.
>>
>> I am curious to hear your thoughts on the proposal.
>>
>> Regards,
>> Branimir
>>

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Nate McCall
Yay for pluggable memtables!! I haven't gone over this in detail yet, but
personally I've always thought integrating something like Arrow would be
cool for sharing data (that's as far as I've gotten, but anything that
makes that kind of experimentation easier would also help with mocking test
plumbing, so +1 from me).

Thanks for putting this together!

-Nate

On Tue, Jul 20, 2021 at 10:11 PM Branimir Lambov <
branimir.lam...@datastax.com> wrote:

> Proposal for a mechanism for plugging in memtable implementations:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>
> The proposal supports using custom memtable implementations to support
> development and testing of improved alternatives, but also enables a
> broader definition of "memtable" to better support more advanced use cases
> like persistent memory. To this end, memtable implementations are given
> control over flushing and storing data in the commit log, enabling
> solutions that implement their own durability mechanisms and live much
> longer than their classical counterparts. Taken to the extreme, this also
> enables memtables that never flush (in other words, alternative storage
> engines) in a minimally-invasive manner.
>
> I am curious to hear your thoughts on the proposal.
>
> Regards,
> Branimir
>


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Jeremiah D Jordan
+1 from me.  I like the direction many of these proposals are going to clean 
up/add internal interfaces along with the new features proposed.

-Jeremiah

> On Jul 20, 2021, at 1:27 PM, bened...@apache.org wrote:
> 
> I think it would be a mistake to combine the Memtable with CommitLog; several 
> systems use CommitLog-like functionality, and in the medium term I think 
> these would benefit from a unified system that Memtables may opt to register 
> with.  It might make sense to give the Memtable the choice over whether a 
> Memtable write is persisted to this shared facility, but that’s different 
> from merging the two conceptually.
> 
> I may look into producing a CEP on this evolution sometime in the next few 
> months, but just a heads up about my thoughts on the topic, and to reach out 
> if you plan your own evolution of this stuff.
> 
> From: Joshua McKenzie 
> Date: Tuesday, 20 July 2021 at 18:36
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> +1 to the idea.
> 
> In general, I think we need to make up our mind as to whether we consider
> the Memtable and CommitLog one logical entity (as stated in the CEP:
> "Conceptually these two pieces of the storage engine form one component —
> the LSM buffer of Cassandra, and as such it makes a lot of sense to bundle
> them together."), or whether we want to further untangle those two
> components from an architectural perspective, a road we started down with
> the pluggable storage engine work.
> 
> The interface as drafted codifies the idea that a Memtable should have an
> opinion about how a CommitLog does its business (default boolean
> writesShouldSkipCommitLog()) which makes sense if our design goal is to
> keep those two things interdependent. I advocate for further separating
> them but suspect that's a debate better had on JIRA or slack than the CEP
> thread, just figured I'd bring it up since it's not yet clear to me whether
> that's a pre or post CEP discussion (specific details of interfaces, etc).
> 
> Lots of quality work obviously went into this from a bunch of folks -
> thanks Branimir!
> 
> ~Josh
> 
> 
> 
> 
> On Tue, Jul 20, 2021 at 6:20 AM bened...@apache.org 
> wrote:
> 
>> +1. I haven’t looked in detail at the API that’s been proposed, but I’m
>> very much in favour of the work to support this, and the introduction of
>> the newly proposed implementations.
>> 
>> In particular, really happy to see somebody finally finish up C-7282! I
>> look forward to seeing how the different approaches compare.
>> 
>> 
>> From: Branimir Lambov 
>> Date: Tuesday, 20 July 2021 at 11:11
>> To: dev@cassandra.apache.org 
>> Subject: [DISCUSS] CEP-11: Pluggable memtable implementations
>> Proposal for a mechanism for plugging in memtable implementations:
>> 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>> 
>> The proposal supports using custom memtable implementations to support
>> development and testing of improved alternatives, but also enables a
>> broader definition of "memtable" to better support more advanced use cases
>> like persistent memory. To this end, memtable implementations are given
>> control over flushing and storing data in the commit log, enabling
>> solutions that implement their own durability mechanisms and live much
>> longer than their classical counterparts. Taken to the extreme, this also
>> enables memtables that never flush (in other words, alternative storage
>> engines) in a minimally-invasive manner.
>> 
>> I am curious to hear your thoughts on the proposal.
>> 
>> Regards,
>> Branimir
>> 





Re: [DISCUSS] CEP-10: Cluster and Code Simulations

2021-07-20 Thread Jeremiah D Jordan
+1 from me for the proposal ignoring the "where it goes".  I think the 
refactors proposed in it make sense no matter what, and the simulation ability 
should provide some very much needed testability improvements.

In particular replacing File with Path is something we have been looking to do 
(and were planning to bring up as a CEP in the coming months), as it gives a 
much better ability to plug in alternate file system access code.  We had 
someone do a POC internally at one point showing you could do fun things like 
access files in Google Cloud buckets directly from sstableloader with such a 
change (https://github.com/googleapis/java-storage-nio 
).
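As a rough illustration of why Path helps here: java.nio.file.Path values are
resolved through a FileSystem provider, so code written against Path and Files
does not need to know which provider backs it. The sketch below is not
Cassandra code; PathIoSketch and readAll are hypothetical names, and it only
exercises the default local file system. A cloud-backed provider would, under
that assumption, supply the Path instead of Files.createTempFile.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: not part of Cassandra or of any proposal in this thread.
final class PathIoSketch {
    // Reads a whole file through whichever FileSystem provider backs the Path.
    // Nothing here refers to java.io.File or to the local disk specifically.
    static String readAll(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The default provider is used here purely so the example runs anywhere.
        Path tmp = Files.createTempFile("path-sketch", ".txt");
        try {
            Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(readAll(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The same readAll call would work unchanged with a Path obtained from any other
installed provider, which is the property the File-to-Path refactor relies on.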

-Jeremiah


> On Jul 15, 2021, at 8:21 AM, Benjamin Lerer  wrote:
> 
> Does anybody have some other concerns than the target date?
> If not, I believe that we can start a vote tomorrow.
> 
> On Wed, Jul 14, 2021 at 23:18, Nate McCall  wrote:
> 
>>> 
>>> 
>>> 
> >>>> Yes, we should perhaps remove target version from the template, and
> >>>> introduce guidance on describing stability impact etc.
>>> 
>>> Strong +1 to remove this from the template. I got sucked into the mistake
>>> of conflating implementation details and implications on where it lands
>>> instead of staying high level in the "do we agree we need this".
>>> 
>>> And I'm a +1 on the "I agree we need this".
>>> 
>> 
>> +1 to focusing on the _if_ (I think we need it).
>> 
>> IMO we could keep the target version in the template and allow "To Be
>> Decided (TBD)" as it could be useful for larger efforts or specific
>> features. (I don't want to bikeshed on that though and won't complain if
>> that field goes away.)
>> 
>> Appreciate the debate and refocusing, though!
>> 



Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread bened...@apache.org
I think it would be a mistake to combine the Memtable with CommitLog; several 
systems use CommitLog-like functionality, and in the medium term I think these 
would benefit from a unified system that Memtables may opt to register with.  
It might make sense to give the Memtable the choice over whether a Memtable 
write is persisted to this shared facility, but that’s different from merging 
the two conceptually.
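
One way to picture the "shared facility that Memtables may opt to register
with" is sketched below. This is an assumption-laden illustration, not a
proposed design: SharedLogSketch, Handle and OptInMemtableSketch are all
hypothetical names, and the log is just an in-memory list. It only shows the
shape of the idea: the log exists independently of any memtable, several
clients can use it, and a memtable decides for itself whether to register.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shared log used by several subsystems; not a Cassandra class.
final class SharedLogSketch {
    // Handle returned to a client that chose to register.
    interface Handle {
        void append(String record);
    }

    private final Map<String, List<String>> recordsByClient = new LinkedHashMap<>();

    // Registration is opt-in: a client that never calls this is never logged.
    Handle register(String clientName) {
        List<String> records =
            recordsByClient.computeIfAbsent(clientName, name -> new ArrayList<>());
        return records::add;
    }

    int recordCount(String clientName) {
        return recordsByClient.getOrDefault(clientName, Collections.emptyList()).size();
    }
}

// Hypothetical memtable that may or may not have registered with the log.
final class OptInMemtableSketch {
    private final Map<String, String> data = new HashMap<>();
    private final SharedLogSketch.Handle log; // null means "opted out"

    OptInMemtableSketch(SharedLogSketch.Handle log) {
        this.log = log;
    }

    void put(String key, String value) {
        if (log != null) {
            log.append(key + "=" + value); // persist first, then apply
        }
        data.put(key, value);
    }
}
```

In this shape the memtable holds only a handle, so the log's lifecycle and
format stay outside the memtable interface.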

I may look into producing a CEP on this evolution sometime in the next few 
months, but just a heads up about my thoughts on the topic, and to reach out if 
you plan your own evolution of this stuff.

From: Joshua McKenzie 
Date: Tuesday, 20 July 2021 at 18:36
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
+1 to the idea.

In general, I think we need to make up our mind as to whether we consider
the Memtable and CommitLog one logical entity (as stated in the CEP:
"Conceptually these two pieces of the storage engine form one component —
the LSM buffer of Cassandra, and as such it makes a lot of sense to bundle
them together."), or whether we want to further untangle those two
components from an architectural perspective, a road we started down with
the pluggable storage engine work.

The interface as drafted codifies the idea that a Memtable should have an
opinion about how a CommitLog does its business (default boolean
writesShouldSkipCommitLog()) which makes sense if our design goal is to
keep those two things interdependent. I advocate for further separating
them but suspect that's a debate better had on JIRA or slack than the CEP
thread, just figured I'd bring it up since it's not yet clear to me whether
that's a pre or post CEP discussion (specific details of interfaces, etc).

Lots of quality work obviously went into this from a bunch of folks -
thanks Branimir!

~Josh




On Tue, Jul 20, 2021 at 6:20 AM bened...@apache.org 
wrote:

> +1. I haven’t looked in detail at the API that’s been proposed, but I’m
> very much in favour of the work to support this, and the introduction of
> the newly proposed implementations.
>
> In particular, really happy to see somebody finally finish up C-7282! I
> look forward to seeing how the different approaches compare.
>
>
> From: Branimir Lambov 
> Date: Tuesday, 20 July 2021 at 11:11
> To: dev@cassandra.apache.org 
> Subject: [DISCUSS] CEP-11: Pluggable memtable implementations
> Proposal for a mechanism for plugging in memtable implementations:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>
> The proposal supports using custom memtable implementations to support
> development and testing of improved alternatives, but also enables a
> broader definition of "memtable" to better support more advanced use cases
> like persistent memory. To this end, memtable implementations are given
> control over flushing and storing data in the commit log, enabling
> solutions that implement their own durability mechanisms and live much
> longer than their classical counterparts. Taken to the extreme, this also
> enables memtables that never flush (in other words, alternative storage
> engines) in a minimally-invasive manner.
>
> I am curious to hear your thoughts on the proposal.
>
> Regards,
> Branimir
>


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Joshua McKenzie
+1 to the idea.

In general, I think we need to make up our mind as to whether we consider
the Memtable and CommitLog one logical entity (as stated in the CEP:
"Conceptually these two pieces of the storage engine form one component —
the LSM buffer of Cassandra, and as such it makes a lot of sense to bundle
them together."), or whether we want to further untangle those two
components from an architectural perspective, a road we started down with
the pluggable storage engine work.

The interface as drafted codifies the idea that a Memtable should have an
opinion about how a CommitLog does its business (default boolean
writesShouldSkipCommitLog()) which makes sense if our design goal is to
keep those two things interdependent. I advocate for further separating
them but suspect that's a debate better had on JIRA or slack than the CEP
thread, just figured I'd bring it up since it's not yet clear to me whether
that's a pre or post CEP discussion (specific details of interfaces, etc).
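
To make the coupling concrete, here is a minimal sketch of what "a Memtable
having an opinion about the CommitLog" can look like. Only the method name
writesShouldSkipCommitLog() and the fact that it is a default boolean method
come from the draft as quoted above; the default return value of false, the
other types (MemtableSketch, CommitLogSketch, WritePathSketch and the two
implementations) and the write path itself are assumptions made for
illustration and do not reflect the actual CEP-11 interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for the drafted interface.
interface MemtableSketch {
    void put(String key, String value);

    // Name taken from the thread; returning false by default is an assumption.
    default boolean writesShouldSkipCommitLog() {
        return false;
    }
}

// Hypothetical in-memory commit log, for illustration only.
final class CommitLogSketch {
    final List<String> entries = new ArrayList<>();

    void append(String key, String value) {
        entries.add(key + "=" + value);
    }
}

// A classic memtable keeps the default and relies on the commit log.
final class ClassicMemtableSketch implements MemtableSketch {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }
}

// A memtable with its own durability (e.g. persistent memory) opts out.
final class PersistentMemtableSketch implements MemtableSketch {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public boolean writesShouldSkipCommitLog() {
        return true;
    }
}

final class WritePathSketch {
    // The write path has to ask the memtable before touching the commit log,
    // which is the interdependence being discussed.
    static void write(MemtableSketch memtable, CommitLogSketch log, String key, String value) {
        if (!memtable.writesShouldSkipCommitLog()) {
            log.append(key, value);
        }
        memtable.put(key, value);
    }

    public static void main(String[] args) {
        CommitLogSketch log = new CommitLogSketch();
        write(new ClassicMemtableSketch(), log, "k1", "v1");
        write(new PersistentMemtableSketch(), log, "k2", "v2");
        System.out.println(log.entries); // prints [k1=v1]
    }
}
```

Separating the two components further would mean moving that decision out of
the memtable interface, which is the alternative raised in this message.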

Lots of quality work obviously went into this from a bunch of folks -
thanks Branimir!

~Josh




On Tue, Jul 20, 2021 at 6:20 AM bened...@apache.org 
wrote:

> +1. I haven’t looked in detail at the API that’s been proposed, but I’m
> very much in favour of the work to support this, and the introduction of
> the newly proposed implementations.
>
> In particular, really happy to see somebody finally finish up C-7282! I
> look forward to seeing how the different approaches compare.
>
>
> From: Branimir Lambov 
> Date: Tuesday, 20 July 2021 at 11:11
> To: dev@cassandra.apache.org 
> Subject: [DISCUSS] CEP-11: Pluggable memtable implementations
> Proposal for a mechanism for plugging in memtable implementations:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>
> The proposal supports using custom memtable implementations to support
> development and testing of improved alternatives, but also enables a
> broader definition of "memtable" to better support more advanced use cases
> like persistent memory. To this end, memtable implementations are given
> control over flushing and storing data in the commit log, enabling
> solutions that implement their own durability mechanisms and live much
> longer than their classical counterparts. Taken to the extreme, this also
> enables memtables that never flush (in other words, alternative storage
> engines) in a minimally-invasive manner.
>
> I am curious to hear your thoughts on the proposal.
>
> Regards,
> Branimir
>


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread bened...@apache.org
+1. I haven’t looked in detail at the API that’s been proposed, but I’m very 
much in favour of the work to support this, and the introduction of the newly 
proposed implementations.

In particular, really happy to see somebody finally finish up C-7282! I look 
forward to seeing how the different approaches compare.


From: Branimir Lambov 
Date: Tuesday, 20 July 2021 at 11:11
To: dev@cassandra.apache.org 
Subject: [DISCUSS] CEP-11: Pluggable memtable implementations
Proposal for a mechanism for plugging in memtable implementations:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations

The proposal supports using custom memtable implementations to support
development and testing of improved alternatives, but also enables a
broader definition of "memtable" to better support more advanced use cases
like persistent memory. To this end, memtable implementations are given
control over flushing and storing data in the commit log, enabling
solutions that implement their own durability mechanisms and live much
longer than their classical counterparts. Taken to the extreme, this also
enables memtables that never flush (in other words, alternative storage
engines) in a minimally-invasive manner.

I am curious to hear your thoughts on the proposal.

Regards,
Branimir


[DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Branimir Lambov
Proposal for a mechanism for plugging in memtable implementations:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations

The proposal supports using custom memtable implementations to support
development and testing of improved alternatives, but also enables a
broader definition of "memtable" to better support more advanced use cases
like persistent memory. To this end, memtable implementations are given
control over flushing and storing data in the commit log, enabling
solutions that implement their own durability mechanisms and live much
longer than their classical counterparts. Taken to the extreme, this also
enables memtables that never flush (in other words, alternative storage
engines) in a minimally-invasive manner.

I am curious to hear your thoughts on the proposal.

Regards,
Branimir