Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-08-17 Thread Benjamin Lerer
> > > writesShouldSkipCommitLog is a result of scope reduction (call it
> > > laziness on my part). I could not find a way to tell if commit log
> > > data may be required for point-in-time-restore or any other feature,
> > > and the existing method of turning the commit log off does not have
> > > the right granularity. I am very open to suggestions here.
> > >
> >
> > Could this be limited to a single parameter? I'm not sure if the
> > "isDurable" + "shouldSkip" is interesting instead of "shouldWrite" (etc).
> > But I also wonder in cases where point-in-time restore is required how one
> > could achieve it without a commit log (can persistent memory memtable be
> > rolled back?). That does have an effect on backups. I have to read your
> > impl how you intended to rewrite the process from Keyspace (where the
> > requirement for "isDurable" starts from).
> >
> > Although I do feel like persistent memory exceptions make stuff more
> > complex.
> >
> >
> >
> > >
> > >
> > >
> > > > Why is streaming in the memtable? [...] the wanted behavior is just
> > >   disabling automated flushing
> > >
> > > Yes, if zero-copy-streaming is not enabled. And that's exactly what
> > > this method is there for -- to make sure sstables are not copied
> > > whole, and that a flush is not done at the end.
> > >
> > > Regards,
> > > Branimir

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-23 Thread Branimir Lambov
> a commit log (can persistent memory memtable be
> rolled back?). That does have an effect on backups. I have to read your
> impl how you intended to rewrite the process from Keyspace (where the
> requirement for "isDurable" starts from).
>
> Although I do feel like persistent memory exceptions make stuff more
> complex.
>
>
>
> >
> >
> >
> > > Why is streaming in the memtable? [...] the wanted behavior is just
> >   disabling automated flushing
> >
> > Yes, if zero-copy-streaming is not enabled. And that's exactly what
> > this method is there for -- to make sure sstables are not copied
> > whole, and that a flush is not done at the end.
> >
> > Regards,
> > Branimir

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread Branimir Lambov
> Why is flushing control bad to do in CFS and better in the
  memtable?

I wonder why you would understand this as something that takes away
control instead of giving it. The CFS is not configurable. With the
CEP, memtables are configurable at the table level. It is entirely
possible to implement a memtable wrapper that provides any of the
example functionality you mention -- and that would be fully
configurable (as an example, one could very well select a
time-series-optimized-flush wrapper over a skip-list memtable).
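
To make the wrapper idea a bit more concrete, here is a minimal sketch in
Java. It assumes a hypothetical, heavily simplified SketchMemtable interface
(this is not the interface from the CEP) and shows a wrapper that adds a
time-window flush rule on top of any delegate memtable. Every name below is
an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical, heavily simplified memtable contract (NOT the CEP-11 interface).
interface SketchMemtable {
    void put(String key, String value);
    int size();
    boolean shouldFlush();
}

// Stand-in for a basic memtable that asks for a flush once it holds maxEntries rows.
class SimpleSketchMemtable implements SketchMemtable {
    private final List<String> rows = new ArrayList<>();
    private final int maxEntries;

    SimpleSketchMemtable(int maxEntries) { this.maxEntries = maxEntries; }

    @Override public void put(String key, String value) { rows.add(key + "=" + value); }
    @Override public int size() { return rows.size(); }
    @Override public boolean shouldFlush() { return rows.size() >= maxEntries; }
}

// Wrapper that also requests a flush once a time window has elapsed
// and the delegate holds at least one row.
class TimeWindowFlushWrapper implements SketchMemtable {
    private final SketchMemtable delegate;
    private final long windowMillis;
    private final LongSupplier clock;
    private final long createdAtMillis;

    TimeWindowFlushWrapper(SketchMemtable delegate, long windowMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.createdAtMillis = clock.getAsLong();
    }

    @Override public void put(String key, String value) { delegate.put(key, value); }
    @Override public int size() { return delegate.size(); }

    @Override public boolean shouldFlush() {
        boolean windowElapsed = clock.getAsLong() - createdAtMillis >= windowMillis;
        return delegate.shouldFlush() || (windowElapsed && delegate.size() > 0);
    }
}

class WrapperSketchDemo {
    public static void main(String[] args) {
        SketchMemtable m = new TimeWindowFlushWrapper(
            new SimpleSketchMemtable(1000), 60_000L, System::currentTimeMillis);
        m.put("sensor-1", "42");
        System.out.println("flush now? " + m.shouldFlush());
    }
}
```

The clock is injected only so the behaviour can be exercised deterministically;
a table-level option could then pick the wrapper and its delegate.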



> is this proposal going to take an angle towards per-range
  memtables?

This is another question that the proposal leaves to the memtable
implementation (or wrapper), but it does make sense to make sure the
interfaces provide the necessary support for sharding (e.g. by
providing suitable shard boundaries that split the owned space; note
that we already have sstable/compaction-per-range functionality with
multiple data directories and it makes sense to ensure that the
provided splits are in some agreement with the data directory
boundaries).
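
As a rough illustration of what "suitable shard boundaries" could mean,
the sketch below splits a token range into near-equal shards and maps a
token to its shard. The class and method names are hypothetical, the even
split is an assumption, and alignment with data directory boundaries is
not modelled here.

```java
import java.math.BigInteger;

// Hypothetical helper; not part of Cassandra or of the CEP.
class ShardBoundariesSketch {
    // Returns shardCount - 1 split points dividing [min, max] into near-equal parts.
    // BigInteger avoids overflow when the range spans the full long domain.
    static long[] splitPoints(long min, long max, int shardCount) {
        if (shardCount < 1 || min >= max) {
            throw new IllegalArgumentException("need shardCount >= 1 and min < max");
        }
        BigInteger lo = BigInteger.valueOf(min);
        BigInteger width = BigInteger.valueOf(max).subtract(lo);
        long[] points = new long[shardCount - 1];
        for (int i = 1; i < shardCount; i++) {
            BigInteger offset = width.multiply(BigInteger.valueOf(i))
                                     .divide(BigInteger.valueOf(shardCount));
            points[i - 1] = lo.add(offset).longValueExact();
        }
        return points;
    }

    // A token belongs to shard k when exactly k split points are <= token.
    static int shardFor(long token, long[] points) {
        int shard = 0;
        while (shard < points.length && token >= points[shard]) {
            shard++;
        }
        return shard;
    }

    public static void main(String[] args) {
        long[] points = splitPoints(Long.MIN_VALUE, Long.MAX_VALUE, 4);
        for (long p : points) {
            System.out.println("split point: " + p);
        }
        System.out.println("shard for token 0: " + shardFor(0L, points));
    }
}
```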



> Why would the write process of the table not ask the table what
  settings it has and instead asks the memtable what settings the
  table has?

The reason for this is that memtables are the primary reason the
commit log needs to preserve data. Whether or not the memtable needs
its content to be present and retained in the commit log until flush
(writesAreDurable) is a question that only the memtable can answer.

writesShouldSkipCommitLog is a result of scope reduction (call it
laziness on my part). I could not find a way to tell if commit log
data may be required for point-in-time-restore or any other feature,
and the existing method of turning the commit log off does not have
the right granularity. I am very open to suggestions here.
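
One possible reading of how a write path could consult the two flags is
sketched below in Java. The DurabilityHints interface and the
CommitLogAction enum are hypothetical, and the exact semantics (skip
entirely vs. write without retaining until flush) are an assumption made
for illustration, not something the CEP text settles.

```java
// Hypothetical view of the two flags discussed in this thread.
interface DurabilityHints {
    boolean writesAreDurable();
    boolean writesShouldSkipCommitLog();
}

// Assumed outcomes for the commit log; these names are illustrative only.
enum CommitLogAction { SKIP, WRITE_WITHOUT_RETENTION, WRITE_AND_RETAIN_UNTIL_FLUSH }

class CommitLogDecisionSketch {
    static CommitLogAction decide(DurabilityHints memtable) {
        if (memtable.writesShouldSkipCommitLog()) {
            // The memtable asks that nothing be written to the commit log.
            return CommitLogAction.SKIP;
        }
        if (memtable.writesAreDurable()) {
            // The log may still serve secondary uses, but the memtable does
            // not need its content retained there until flush.
            return CommitLogAction.WRITE_WITHOUT_RETENTION;
        }
        // Default: the log must keep the data until the memtable is flushed.
        return CommitLogAction.WRITE_AND_RETAIN_UNTIL_FLUSH;
    }

    // Small factory so the flags can be tried out without a real memtable.
    static DurabilityHints hints(final boolean durable, final boolean skip) {
        return new DurabilityHints() {
            @Override public boolean writesAreDurable() { return durable; }
            @Override public boolean writesShouldSkipCommitLog() { return skip; }
        };
    }

    public static void main(String[] args) {
        System.out.println(decide(hints(false, false)));
        System.out.println(decide(hints(true, false)));
        System.out.println(decide(hints(true, true)));
    }
}
```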



> Why is streaming in the memtable? [...] the wanted behavior is just
  disabling automated flushing

Yes, if zero-copy-streaming is not enabled. And that's exactly what
this method is there for -- to make sure sstables are not copied
whole, and that a flush is not done at the end.

Regards,
Branimir

On Wed, Jul 21, 2021 at 4:33 PM bened...@apache.org 
wrote:

> I would love to help out with this in any way that I can, FYI. Definitely
> one of the more impactful performance improvements to the codebase, given
> the benefits to compaction and memory behaviour.

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread bened...@apache.org
I would love to help out with this in any way that I can, FYI. Definitely one 
of the more impactful performance improvements to the codebase, given the 
benefits to compaction and memory behaviour.

From: bened...@apache.org 
Date: Wednesday, 21 July 2021 at 14:32
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> memtable-as-a-commitlog-index

Heh, based on 7282? Yeah, I’ve had this idea for a while now (actually there 
was a paper that did this a long time ago), and it could be very nice (if for 
no other benefit than reducing heap utilisation). I don’t think this requires 
that they be modelled as the same concept, however, only that the Memtable must 
be able to receive an address into a commit log entry and to adopt partial 
ownership over the entry’s lifecycle.


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread bened...@apache.org
> memtable-as-a-commitlog-index

Heh, based on 7282? Yeah, I’ve had this idea for a while now (actually there 
was a paper that did this a long time ago), and it could be very nice (if for 
no other benefit than reducing heap utilisation). I don’t think this requires 
that they be modelled as the same concept, however, only that the Memtable must 
be able to receive an address into a commit log entry and to adopt partial 
ownership over the entry’s lifecycle.
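
A minimal sketch of what "partial ownership over the entry's lifecycle"
might look like, using plain reference counting in Java. The class name,
the fields and the counting scheme are assumptions made for illustration;
nothing here is taken from the commit log code or from 7282.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handle to a commit log entry that several owners can share.
class SharedLogEntrySketch {
    final long segmentId; // assumed identifier of the log segment
    final int offset;     // assumed position of the entry within the segment

    // The commit log itself holds the first reference.
    private final AtomicInteger owners = new AtomicInteger(1);

    SharedLogEntrySketch(long segmentId, int offset) {
        this.segmentId = segmentId;
        this.offset = offset;
    }

    // Called by a memtable that wants to point at this entry instead of copying it.
    void retain() {
        while (true) {
            int current = owners.get();
            if (current == 0) {
                throw new IllegalStateException("entry already reclaimed");
            }
            if (owners.compareAndSet(current, current + 1)) {
                return;
            }
        }
    }

    // Returns true when the last owner lets go, i.e. the entry may be reclaimed.
    boolean release() {
        while (true) {
            int current = owners.get();
            if (current == 0) {
                throw new IllegalStateException("released more often than retained");
            }
            if (owners.compareAndSet(current, current - 1)) {
                return current == 1;
            }
        }
    }

    int ownerCount() { return owners.get(); }

    public static void main(String[] args) {
        SharedLogEntrySketch entry = new SharedLogEntrySketch(7L, 128);
        entry.retain();                            // memtable adopts the entry
        System.out.println("reclaimable after log release: " + entry.release());
        System.out.println("reclaimable after memtable flush: " + entry.release());
    }
}
```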


From: Branimir Lambov 
Date: Wednesday, 21 July 2021 at 14:28
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> In general, I think we need to make up our mind as to whether we
  consider the Memtable and CommitLog one logical entity [...], or
  whether we want to further untangle those two components from an
  architectural perspective which we started down that road on with
  the pluggable storage engine work.

This CEP is intentionally not attempting to answer this question. FWIW
I do not see them as separable (there's evidence to this fact in the
codebase), but there are valid secondary uses of the commit log that
are served well enough by the current architecture.

It is important, however, to let the memtable implementation opt out,
to permit it to provide its own solution for data persistence.

We should revisit this in the future, especially if Benedict's shared
log facility and my plans for a memtable-as-a-commitlog-index
evolve.

Regards,
Branimir

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread Branimir Lambov
> In general, I think we need to make up our mind as to whether we
  consider the Memtable and CommitLog one logical entity [...], or
  whether we want to further untangle those two components from an
  architectural perspective which we started down that road on with
  the pluggable storage engine work.

This CEP is intentionally not attempting to answer this question. FWIW
I do not see them as separable (there's evidence to this fact in the
codebase), but there are valid secondary uses of the commit log that
are served well enough by the current architecture.

It is important, however, to let the memtable implementation opt out,
to permit it to provide its own solution for data persistence.

We should revisit this in the future, especially if Benedict's shared
log facility and my plans for a memtable-as-a-commitlog-index
evolve.

Regards,
Branimir

On Wed, Jul 21, 2021 at 1:34 PM Michael Burman  wrote:

> Hi,
>
> It is nice to see these going forward (and a great use of CEP) so thanks
> for the proposal. I have my reservations regarding the linking of memtable
> to CommitLog and flushing and should not leak abstraction from one to
> another. And I don't see the reasoning why they should be, it doesn't seem
> to add anything else than tight coupling of components, reducing reuse and
> making things unnecessarily complicated. Also, the streaming notions seem
> weird to me - how are they related to memtable? Why should memtable care
> about the behavior outside memtable's responsibility?
>
> Some misc (with some thoughts split / duplicated to different parts) quotes
> and comments:
>
> > Tight coupling between CFS and memtable will be reduced: flushing
> functionality is to be extracted, controlling memtable memory and period
> expiration will be handled by the memtable.
>
> Why is flushing control bad to do in CFS and better in the memtable? Doing
> it outside memtable would allow to control the flushing regardless of how
> the actual memtable is implemented. For example, lets say someone would
> want to implement the HBase's accordion to Cassandra. It shouldn't matter
> what the implementation of memtable is as the compaction of different
> memtables could be beneficial to all implementations. Or the flushing would
> push the memtable to a proper caching instead of only to disk.
>
> Or if we had per table caching structure, we could control the flushing of
> memtables and the cache structure separately. Some data benefits from LRU
> and some from MRW (most-recently-written) caching strategies. But both
> could benefit from the same memtable implementation, it's the data and how
> its used that could control how the flushing should work. For example time
> series data behaves quite differently in terms of data accesses to
> something more "random".
>
> Or even "total memory control" which would check which tables need more
> memory to do their writes and which do not. Or that the memory doesn't grow
> over a boundary and needs to manually maintain how much is dedicated to
> caching and how much to memtables waiting to be flushed. Or delay flushing
> because the disks can't keep up etc. Not to be implemented in this CEP, but
> pushing this strategy to memtable would prevent many features.
>
> > Beyond thread-safety, the concurrency constraints of the memtable are
> intentionally left unspecified.
>
> I like this. I could see use-cases where a single-thread implementation
> could actually outperform some concurrent data structures. But it also
> provides me with a question, is this proposal going to take an angle
> towards per-range memtables? There are certainly benefits to splitting the
> memtables as it would reduce the "n" in the operations, thus providing less
> overhead in lookups and writes. Although, taking it one step backwards I
> could see the benefit of having a commitlog per range also, which would
> allow higher utilization of NVME drives with larger queue depths. And why
> not per-range-sstables for faster scale-outs and .. a bit outside the scope
> of CEP, but just to ensure that the implementation does not block such
> improvement.
>
> Interfaces:
>
> > boolean writesAreDurable()
> > boolean writesShouldSkipCommitLog()
>
> The placement inside memtable implementation for these methods just feels
> incredibly wrong to me. The writing pipeline should have these configured
> and they could differ for each table even with the same memtable
> implementation. Lets take the example of an in-memory memtable use case
> that's never written to a SSTable. We could have one table with just simply
> in-memory cached storage and another one with a Redis style persistence of
> AOF, where writes would be written to the commitlog for fast recovery, but
> the data is otherwise always only kept in the memtable instead of writing
> to the SSTable (for performance reasons). Same implementation of memtable
> still.
>
> Why would the write process of the table not ask the table what settings it
> has and instead 

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-21 Thread Michael Burman
Hi,

It is nice to see these going forward (and a great use of CEP) so thanks
for the proposal. I have my reservations regarding the linking of memtable
to CommitLog and flushing; they should not leak abstractions from one to
another. And I don't see the reasoning for why they should be linked: it
doesn't seem to add anything other than tight coupling of components,
reducing reuse and making things unnecessarily complicated. Also, the
streaming notions seem weird to me - how are they related to memtable? Why
should memtable care about the behavior outside memtable's responsibility?

Some misc (with some thoughts split / duplicated to different parts) quotes
and comments:

> Tight coupling between CFS and memtable will be reduced: flushing
functionality is to be extracted, controlling memtable memory and period
expiration will be handled by the memtable.

Why is flushing control bad to do in CFS and better in the memtable? Doing
it outside the memtable would allow controlling the flushing regardless of
how the actual memtable is implemented. For example, let's say someone
wanted to implement HBase's Accordion in Cassandra. It shouldn't matter
what the implementation of memtable is, as the compaction of different
memtables could be beneficial to all implementations. Or the flushing could
push the memtable to a proper cache instead of only to disk.

Or if we had a per-table caching structure, we could control the flushing
of memtables and the cache structure separately. Some data benefits from LRU
and some from MRW (most-recently-written) caching strategies. But both
could benefit from the same memtable implementation; it's the data and how
it's used that could control how the flushing should work. For example, time
series data behaves quite differently in terms of data accesses from
something more "random".

Or even "total memory control", which would check which tables need more
memory to do their writes and which do not. Or ensuring that memory doesn't
grow over a boundary, which requires manually maintaining how much is
dedicated to caching and how much to memtables waiting to be flushed. Or
delaying flushing because the disks can't keep up, etc. These are not to be
implemented in this CEP, but pushing this strategy to the memtable would
prevent many such features.

> Beyond thread-safety, the concurrency constraints of the memtable are
intentionally left unspecified.

I like this. I could see use-cases where a single-threaded implementation
could actually outperform some concurrent data structures. But it also
raises a question for me: is this proposal going to take an angle
towards per-range memtables? There are certainly benefits to splitting the
memtables as it would reduce the "n" in the operations, thus providing less
overhead in lookups and writes. Although, taking it one step back, I
could see the benefit of having a commitlog per range also, which would
allow higher utilization of NVMe drives with larger queue depths. And why
not per-range sstables for faster scale-outs? That is a bit outside the
scope of the CEP, but I mention it just to ensure that the implementation
does not block such improvements.

Interfaces:

> boolean writesAreDurable()
> boolean writesShouldSkipCommitLog()

The placement of these methods inside the memtable implementation just feels
incredibly wrong to me. The writing pipeline should have these configured
and they could differ for each table even with the same memtable
implementation. Let's take the example of an in-memory memtable use case
that's never written to an SSTable. We could have one table with simply
in-memory cached storage and another one with Redis-style AOF persistence,
where writes would be written to the commitlog for fast recovery, but
the data is otherwise always only kept in the memtable instead of writing
to the SSTable (for performance reasons). Same implementation of memtable
still.

Why would the write process of the table not ask the table what settings it
has, and instead ask the memtable what settings the table has? This seems
counterintuitive to me. Even the persistent memory case is a bit
questionable: why not simply disable the commitlog in the write process? Why
ask the memtable?
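
A rough sketch of the alternative described above, with the durability settings held per table and consulted by the write path. All names here (TableWriteOptions, MapMemtable, TableWritePath) are hypothetical and only illustrate the argument; they are not part of the CEP-11 interface. Two tables share one memtable implementation but differ in whether their writes reach the commitlog.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every type here is illustrative, not part of CEP-11.
final class TableWriteOptions {
    final boolean writeToCommitLog; // per-table durability choice
    final boolean flushToSSTable;   // false for tables that live only in memory

    TableWriteOptions(boolean writeToCommitLog, boolean flushToSSTable) {
        this.writeToCommitLog = writeToCommitLog;
        this.flushToSSTable = flushToSSTable;
    }
}

// The memtable only stores data; it knows nothing about durability settings.
final class MapMemtable {
    private final Map<String, String> data = new HashMap<>();

    void apply(String key, String value) { data.put(key, value); }

    String read(String key) { return data.get(key); }
}

final class TableWritePath {
    private final List<String> commitLog = new ArrayList<>();

    // The write path asks the table's options, never the memtable.
    void write(TableWriteOptions options, MapMemtable memtable, String key, String value) {
        if (options.writeToCommitLog) commitLog.add(key + "=" + value);
        memtable.apply(key, value);
    }

    // Flushing is likewise a table-level decision in this sketch.
    boolean shouldFlush(TableWriteOptions options) { return options.flushToSSTable; }

    int commitLogSize() { return commitLog.size(); }
}
```

With these types, a pure cache table would be configured as new TableWriteOptions(false, false) and an AOF-style table as new TableWriteOptions(true, false), both backed by MapMemtable.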

This feels like the memtable is going to be the write pipeline, but to me that
doesn't feel like the correct architectural decision. I'd rather see these
decisions made outside the memtable. Even a persistent memory memtable user
might want to have a commitlog enabled for data capture / shipping logs, or
layers of persistence speed. Persistent memory as a whole, without any
commercially known future, is a bit weird at the moment (even Optane has no
known manufacturing anymore, with the last factory being dismantled according
to public information).

> boolean streamToMemtable()

And that one I don't understand. Why is streaming in the memtable? This
smells like scope creep from something else. The explanation would
indicate to me that the wanted behavior is just disabling automated
flushing.

But these are just some questions that came to 

Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Berenguer Blasi
+1. De-tangling, going more modular and clean interfaces sgtm.

On 20/7/21 21:45, Nate McCall wrote:
> Yay for pluggable memtables!! I haven't gone over this in detail yet, but
> personally I've always thought integrating something like Arrow would be
> cool for sharing data (that's as far as I've gotten, but anything that
> makes that kind of experimentation easier would also help with mocking test
> plumbing, so +1 from me).
>
> Thanks for putting this together!
>
> -Nate
>
> On Tue, Jul 20, 2021 at 10:11 PM Branimir Lambov <
> branimir.lam...@datastax.com> wrote:
>
>> Proposal for a mechanism for plugging in memtable implementations:
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>>
>> The proposal supports using custom memtable implementations to support
>> development and testing of improved alternatives, but also enables a
>> broader definition of "memtable" to better support more advanced use cases
>> like persistent memory. To this end, memtable implementations are given
>> control over flushing and storing data in the commit log, enabling
>> solutions that implement their own durability mechanisms and live much
>> longer than their classical counterparts. Taken to the extreme, this also
>> enables memtables that never flush (in other words, alternative storage
>> engines) in a minimally-invasive manner.
>>
>> I am curious to hear your thoughts on the proposal.
>>
>> Regards,
>> Branimir
>>

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Nate McCall
Yay for pluggable memtables!! I haven't gone over this in detail yet, but
personally I've always thought integrating something like Arrow would be
cool for sharing data (that's as far as I've gotten, but anything that
makes that kind of experimentation easier would also help with mocking test
plumbing, so +1 from me).

Thanks for putting this together!

-Nate

On Tue, Jul 20, 2021 at 10:11 PM Branimir Lambov <
branimir.lam...@datastax.com> wrote:

> Proposal for a mechanism for plugging in memtable implementations:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>
> The proposal supports using custom memtable implementations to support
> development and testing of improved alternatives, but also enables a
> broader definition of "memtable" to better support more advanced use cases
> like persistent memory. To this end, memtable implementations are given
> control over flushing and storing data in the commit log, enabling
> solutions that implement their own durability mechanisms and live much
> longer than their classical counterparts. Taken to the extreme, this also
> enables memtables that never flush (in other words, alternative storage
> engines) in a minimally-invasive manner.
>
> I am curious to hear your thoughts on the proposal.
>
> Regards,
> Branimir
>


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Jeremiah D Jordan
+1 from me.  I like the direction many of these proposals are going to clean 
up/add internal interfaces along with the new features proposed.

-Jeremiah

> On Jul 20, 2021, at 1:27 PM, bened...@apache.org wrote:
> 
> I think it would be a mistake to combine the Memtable with CommitLog; several 
> systems use CommitLog-like functionality, and in the medium term I think 
> these would benefit from a unified system, that Memtables may opt to register 
> with.  It might make sense to give the Memtable the choice over whether a 
> Memtable write is persisted to this shared facility, but that’s different 
> from merging the two conceptually.
> 
> I may look into producing a CEP on this evolution sometime in the next few 
> months, but just a heads up about my thoughts on the topic, and to reach out 
> if you plan your own evolution of this stuff.
> 
> From: Joshua McKenzie 
> Date: Tuesday, 20 July 2021 at 18:36
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> +1 to the idea.
> 
> In general, I think we need to make up our mind as to whether we consider
> the Memtable and CommitLog one logical entity (as stated in the CEP:
> "Conceptually these two pieces of the storage engine form one component — the
> LSM buffer of Cassandra, and as such it makes a lot of sense to bundle them
> together."), or whether we want to further untangle those two components from
> an architectural perspective, a road we started down with the pluggable
> storage engine work.
> 
> The interface as drafted codifies the idea that a Memtable should have an
> opinion about how a CommitLog does its business (default boolean
> writesShouldSkipCommitLog()), which makes sense if our design goal is to
> keep those two things interdependent. I advocate for further separating
> them, but suspect that's a debate better had on JIRA or Slack than on the CEP
> thread. I just figured I'd bring it up since it's not yet clear to me whether
> that's a pre- or post-CEP discussion (specific details of interfaces, etc.).
> 
> Lots of quality work obviously went into this from a bunch of folks -
> thanks Branimir!
> 
> ~Josh
> 
> 
> 
> 
> On Tue, Jul 20, 2021 at 6:20 AM bened...@apache.org 
> wrote:
> 
>> +1. I haven’t looked in detail at the API that’s been proposed, but I’m
>> very much in favour of the work to support this, and the introduction of
>> the newly proposed implementations.
>> 
>> In particular, really happy to see somebody finally finish up C-7282! I
>> look forward to seeing how the different approaches compare.
>> 
>> 
>> From: Branimir Lambov 
>> Date: Tuesday, 20 July 2021 at 11:11
>> To: dev@cassandra.apache.org 
>> Subject: [DISCUSS] CEP-11: Pluggable memtable implementations
>> Proposal for a mechanism for plugging in memtable implementations:
>> 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>> 
>> The proposal supports using custom memtable implementations to support
>> development and testing of improved alternatives, but also enables a
>> broader definition of "memtable" to better support more advanced use cases
>> like persistent memory. To this end, memtable implementations are given
>> control over flushing and storing data in the commit log, enabling
>> solutions that implement their own durability mechanisms and live much
>> longer than their classical counterparts. Taken to the extreme, this also
>> enables memtables that never flush (in other words, alternative storage
>> engines) in a minimally-invasive manner.
>> 
>> I am curious to hear your thoughts on the proposal.
>> 
>> Regards,
>> Branimir
>> 





Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread bened...@apache.org
I think it would be a mistake to combine the Memtable with CommitLog; several 
systems use CommitLog-like functionality, and in the medium term I think these 
would benefit from a unified system, that Memtables may opt to register with.  
It might make sense to give the Memtable the choice over whether a Memtable 
write is persisted to this shared facility, but that’s different from merging 
the two conceptually.
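
One way to picture the opt-in arrangement, as a sketch under my own assumptions rather than anything proposed in this thread: a shared log that any subsystem can register with, and a memtable that is handed either a real handle or a no-op one. SharedLog, LogHandle and OptInMemtable are invented names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: SharedLog, LogHandle and OptInMemtable are invented names.
interface LogHandle {
    // Handle for components that choose not to persist to the shared log.
    LogHandle NONE = record -> { };

    void append(String record);
}

final class SharedLog {
    private final List<String> records = new ArrayList<>();
    private int nextClientId = 0;

    // Any subsystem may register; the returned handle tags its records.
    synchronized LogHandle register(String clientName) {
        int id = nextClientId++;
        return record -> appendTagged(id + ":" + clientName + ":" + record);
    }

    private synchronized void appendTagged(String tagged) { records.add(tagged); }

    synchronized List<String> records() { return new ArrayList<>(records); }
}

final class OptInMemtable {
    private final List<String> data = new ArrayList<>();
    private final LogHandle log;

    OptInMemtable(LogHandle log) { this.log = log; }

    // The memtable and the log stay separate; the handle decides what is persisted.
    void write(String mutation) {
        log.append(mutation);
        data.add(mutation);
    }

    int size() { return data.size(); }
}
```

Here the log and the memtable remain separate concepts: the memtable only chooses whether to hold a registered handle, and other clients can share the same log.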

I may look into producing a CEP on this evolution sometime in the next few 
months, but just a heads up about my thoughts on the topic, and to reach out if 
you plan your own evolution of this stuff.

From: Joshua McKenzie 
Date: Tuesday, 20 July 2021 at 18:36
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
+1 to the idea.

In general, I think we need to make up our mind as to whether we consider
the Memtable and CommitLog one logical entity (as stated in the CEP:
"Conceptually these two pieces of the storage engine form one component — the
LSM buffer of Cassandra, and as such it makes a lot of sense to bundle them
together."), or whether we want to further untangle those two components from
an architectural perspective, a road we started down with the pluggable
storage engine work.

The interface as drafted codifies the idea that a Memtable should have an
opinion about how a CommitLog does its business (default boolean
writesShouldSkipCommitLog()), which makes sense if our design goal is to
keep those two things interdependent. I advocate for further separating
them, but suspect that's a debate better had on JIRA or Slack than on the CEP
thread. I just figured I'd bring it up since it's not yet clear to me whether
that's a pre- or post-CEP discussion (specific details of interfaces, etc.).

Lots of quality work obviously went into this from a bunch of folks -
thanks Branimir!

~Josh




On Tue, Jul 20, 2021 at 6:20 AM bened...@apache.org 
wrote:

> +1. I haven’t looked in detail at the API that’s been proposed, but I’m
> very much in favour of the work to support this, and the introduction of
> the newly proposed implementations.
>
> In particular, really happy to see somebody finally finish up C-7282! I
> look forward to seeing how the different approaches compare.
>
>
> From: Branimir Lambov 
> Date: Tuesday, 20 July 2021 at 11:11
> To: dev@cassandra.apache.org 
> Subject: [DISCUSS] CEP-11: Pluggable memtable implementations
> Proposal for a mechanism for plugging in memtable implementations:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>
> The proposal supports using custom memtable implementations to support
> development and testing of improved alternatives, but also enables a
> broader definition of "memtable" to better support more advanced use cases
> like persistent memory. To this end, memtable implementations are given
> control over flushing and storing data in the commit log, enabling
> solutions that implement their own durability mechanisms and live much
> longer than their classical counterparts. Taken to the extreme, this also
> enables memtables that never flush (in other words, alternative storage
> engines) in a minimally-invasive manner.
>
> I am curious to hear your thoughts on the proposal.
>
> Regards,
> Branimir
>


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Joshua McKenzie
+1 to the idea.

In general, I think we need to make up our mind as to whether we consider
the Memtable and CommitLog one logical entity (as stated in the CEP:
"Conceptually these two pieces of the storage engine form one component — the
LSM buffer of Cassandra, and as such it makes a lot of sense to bundle them
together."), or whether we want to further untangle those two components from
an architectural perspective, a road we started down with the pluggable
storage engine work.

The interface as drafted codifies the idea that a Memtable should have an
opinion about how a CommitLog does its business (default boolean
writesShouldSkipCommitLog()), which makes sense if our design goal is to
keep those two things interdependent. I advocate for further separating
them, but suspect that's a debate better had on JIRA or Slack than on the CEP
thread. I just figured I'd bring it up since it's not yet clear to me whether
that's a pre- or post-CEP discussion (specific details of interfaces, etc.).
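
For readers without the CEP open, the coupling being discussed looks roughly like the sketch below. Only the three method names (writesAreDurable, writesShouldSkipCommitLog, streamToMemtable) come from this thread; the default return values, the put method and the class names DraftMemtable, HeapMemtable, PersistentMemtable and DraftWritePath are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the three boolean method names are from the thread;
// the defaults and everything else are assumptions.
interface DraftMemtable {
    void put(String key, String value);

    default boolean writesAreDurable() { return false; }

    default boolean writesShouldSkipCommitLog() { return false; }

    default boolean streamToMemtable() { return false; }
}

// A classical memtable keeps the defaults and relies on the commit log.
final class HeapMemtable implements DraftMemtable {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, String value) { data.put(key, value); }
}

// A memtable with its own durability overrides the defaults.
final class PersistentMemtable implements DraftMemtable {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, String value) { data.put(key, value); }

    @Override
    public boolean writesAreDurable() { return true; }

    @Override
    public boolean writesShouldSkipCommitLog() { return true; }
}

final class DraftWritePath {
    // The write path has to ask the memtable how to treat the commit log.
    static boolean shouldAppendToCommitLog(DraftMemtable memtable) {
        return !memtable.writesShouldSkipCommitLog();
    }
}
```

The call in DraftWritePath is the point of contention: the commit log decision flows through the memtable rather than through table or write-path configuration.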

Lots of quality work obviously went into this from a bunch of folks -
thanks Branimir!

~Josh




On Tue, Jul 20, 2021 at 6:20 AM bened...@apache.org 
wrote:

> +1. I haven’t looked in detail at the API that’s been proposed, but I’m
> very much in favour of the work to support this, and the introduction of
> the newly proposed implementations.
>
> In particular, really happy to see somebody finally finish up C-7282! I
> look forward to seeing how the different approaches compare.
>
>
> From: Branimir Lambov 
> Date: Tuesday, 20 July 2021 at 11:11
> To: dev@cassandra.apache.org 
> Subject: [DISCUSS] CEP-11: Pluggable memtable implementations
> Proposal for a mechanism for plugging in memtable implementations:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
>
> The proposal supports using custom memtable implementations to support
> development and testing of improved alternatives, but also enables a
> broader definition of "memtable" to better support more advanced use cases
> like persistent memory. To this end, memtable implementations are given
> control over flushing and storing data in the commit log, enabling
> solutions that implement their own durability mechanisms and live much
> longer than their classical counterparts. Taken to the extreme, this also
> enables memtables that never flush (in other words, alternative storage
> engines) in a minimally-invasive manner.
>
> I am curious to hear your thoughts on the proposal.
>
> Regards,
> Branimir
>


Re: [DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread bened...@apache.org
+1. I haven’t looked in detail at the API that’s been proposed, but I’m very 
much in favour of the work to support this, and the introduction of the newly 
proposed implementations.

In particular, really happy to see somebody finally finish up C-7282! I look 
forward to seeing how the different approaches compare.


From: Branimir Lambov 
Date: Tuesday, 20 July 2021 at 11:11
To: dev@cassandra.apache.org 
Subject: [DISCUSS] CEP-11: Pluggable memtable implementations
Proposal for a mechanism for plugging in memtable implementations:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations

The proposal supports using custom memtable implementations to support
development and testing of improved alternatives, but also enables a
broader definition of "memtable" to better support more advanced use cases
like persistent memory. To this end, memtable implementations are given
control over flushing and storing data in the commit log, enabling
solutions that implement their own durability mechanisms and live much
longer than their classical counterparts. Taken to the extreme, this also
enables memtables that never flush (in other words, alternative storage
engines) in a minimally-invasive manner.

I am curious to hear your thoughts on the proposal.

Regards,
Branimir


[DISCUSS] CEP-11: Pluggable memtable implementations

2021-07-20 Thread Branimir Lambov
Proposal for a mechanism for plugging in memtable implementations:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations

The proposal supports using custom memtable implementations to support
development and testing of improved alternatives, but also enables a
broader definition of "memtable" to better support more advanced use cases
like persistent memory. To this end, memtable implementations are given
control over flushing and storing data in the commit log, enabling
solutions that implement their own durability mechanisms and live much
longer than their classical counterparts. Taken to the extreme, this also
enables memtables that never flush (in other words, alternative storage
engines) in a minimally-invasive manner.

I am curious to hear your thoughts on the proposal.

Regards,
Branimir