Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-15 Thread Branimir Lambov
Looks like the discussion has settled down. I am moving forward with putting
this proposal to a vote.

Regards,
Branimir

On Mon, Nov 15, 2021 at 7:28 PM David Capwell wrote:

> Works for me
>
> > On Nov 15, 2021, at 4:21 AM, Jacek Lewandowski <
> [email protected]> wrote:
> >
> > I'd put it another way - the scope is to make it possible to provide a new
> > implementation of sstable format without the necessity to refactor
> > Cassandra code. It implies a contract about the responsibilities of sstable
> > format implementation so that the rest of the code can rely on that, and
> > only on that, and does not make assumptions beyond that. But it does not
> > claim that the created interfaces will not change even with a minor version
> > release. When those interfaces have been around for some time, we can start a
> > separate discussion about whether we want to put some guarantees on them.
> >
> > - - -- --- -  -
> > Jacek Lewandowski
> >
> >
> > On Wed, Nov 10, 2021 at 9:01 PM David Capwell wrote:
> >
> >> If this gets descoped to test only (can break all interfaces in a minor)
> >> then my support concerns are no longer valid; I am cool with the CEP scoped
> >> only to improving testing
> >>
> >>> On Nov 10, 2021, at 11:20 AM, Jacek Lewandowski <
> >> [email protected]> wrote:
> >>>
> >>> For the other ticket (schema update handler interface) I was also proposing
> >>> a kind of @DeveloperApi annotation as seen in other projects but similarly
> >>> to this thread there were different opinions and no conclusion. After
> >>> reading the comments I must agree that perhaps it is way too early to mark
> >>> this interface as stable. Perhaps it was too far-fetched to say it would be
> >>> for people who wished to replace the SSTable format. My focus is
> >>> primarily on cleaning up the code (modularization and clean contracts) and
> >>> making it possible to introduce a new format in the future while allowing
> >>> us to maintain the old format (no "if then else" approach)
> >>>
> >>> - - -- --- -  -
> >>> Jacek Lewandowski
> >>>
> >>>
> >>> On Wed, Nov 10, 2021 at 12:53 AM [email protected] <
> >> [email protected]>
> >>> wrote:
> >>>
> >>>>> I may be wrong here, but the CEP directly calls out making this api
> >>>>> public for people who wish to replace the SSTable format
> >>>>
> >>>> I don’t think this implies API stability. For starters, it doesn’t
> >>>> stipulate that these implementations will be supported out of tree (the
> >>>> only one I’m aware of, so far as I understand, is intended to be incubated
> >>>> in tree), nor does an API for external usage have to be stable. It’s fine
> >>>> to create an API and tell users it’s unstable, and that they should closely
> >>>> monitor patch version changes if they use it.
> >>>>
> >>>> That said, norms may be changing around what can go into patch releases
> >>>> anyhow, so this may be a lot of noise about nothing. If all new development
> >>>> goes into trunk, then it’s all moot. But I don’t think we can make hard
> >>>> assumptions about that today, as historically these sorts of intentions
> >>>> haven’t lasted.
> >>>>
> >>>> I’m fairly against the idea of introducing hard restrictions on this, and
> >>>> potentially ossifying the codebase. I’m not keen to even consider out of
> >>>> tree consumers of these APIs in any way, for compatibility, upgradeability
> >>>> or anything. There’s a lot that needs to be done over the coming years to
> >>>> improve the internal structure of the project, and unduly entrenching the
> >>>> current state of affairs would be a huge potential harm of these efforts to
> >>>> modularise the codebase.
> >>>>
> >>>> From: David Capwell 
> >>>> Date: Tuesday, 9 November 2021 at 23:38
> >>>> To: [email protected] 
> >>>> Subject: Re: [DISCUSS] CEP-17: SSTable format 

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-15 Thread David Capwell
Works for me

> On Nov 15, 2021, at 4:21 AM, Jacek Lewandowski  
> wrote:
> 
> I'd put it another way - the scope is to make it possible to provide a new
> implementation of sstable format without the necessity to refactor
> Cassandra code. It implies a contract about the responsibilities of sstable
> format implementation so that the rest of the code can rely on that, and
> only on that, and does not make assumptions beyond that. But it does not
> claim that the created interfaces will not change even with a minor version
> release. When those interfaces have been around for some time, we can start a
> separate discussion about whether we want to put some guarantees on them.
> 
> - - -- --- -  -
> Jacek Lewandowski
> 
> 
> On Wed, Nov 10, 2021 at 9:01 PM David Capwell 
> wrote:
> 
>> If this gets descoped to test only (can break all interfaces in a minor)
>> then my support concerns are no longer valid; I am cool with the CEP scoped
>> only to improving testing
>> 
>>> On Nov 10, 2021, at 11:20 AM, Jacek Lewandowski <
>> [email protected]> wrote:
>>> 
>>> For the other ticket (schema update handler interface) I was also proposing
>>> a kind of @DeveloperApi annotation as seen in other projects but similarly
>>> to this thread there were different opinions and no conclusion. After
>>> reading the comments I must agree that perhaps it is way too early to mark
>>> this interface as stable. Perhaps it was too far-fetched to say it would be
>>> for people who wished to replace the SSTable format. My focus is
>>> primarily on cleaning up the code (modularization and clean contracts) and
>>> making it possible to introduce a new format in the future while allowing
>>> us to maintain the old format (no "if then else" approach)
>>> 
>>> - - -- --- -  -
>>> Jacek Lewandowski
>>> 
>>> 
>>> On Wed, Nov 10, 2021 at 12:53 AM [email protected] <
>> [email protected]>
>>> wrote:
>>> 
>>>>> I may be wrong here, but the CEP directly calls out making this api
>>>>> public for people who wish to replace the SSTable format
>>>>
>>>> I don’t think this implies API stability. For starters, it doesn’t
>>>> stipulate that these implementations will be supported out of tree (the
>>>> only one I’m aware of, so far as I understand, is intended to be incubated
>>>> in tree), nor does an API for external usage have to be stable. It’s fine
>>>> to create an API and tell users it’s unstable, and that they should closely
>>>> monitor patch version changes if they use it.
>>>>
>>>> That said, norms may be changing around what can go into patch releases
>>>> anyhow, so this may be a lot of noise about nothing. If all new development
>>>> goes into trunk, then it’s all moot. But I don’t think we can make hard
>>>> assumptions about that today, as historically these sorts of intentions
>>>> haven’t lasted.
>>>>
>>>> I’m fairly against the idea of introducing hard restrictions on this, and
>>>> potentially ossifying the codebase. I’m not keen to even consider out of
>>>> tree consumers of these APIs in any way, for compatibility, upgradeability
>>>> or anything. There’s a lot that needs to be done over the coming years to
>>>> improve the internal structure of the project, and unduly entrenching the
>>>> current state of affairs would be a huge potential harm of these efforts to
>>>> modularise the codebase.
>>>> 
>>>> From: David Capwell 
>>>> Date: Tuesday, 9 November 2021 at 23:38
>>>> To: [email protected] 
>>>> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
>>>>> My understanding is that the only interface that is expected to be
>>>>> stable for external consumers is the secondary index API
>>>>
>>>> I may be wrong here, but the CEP directly calls out making this api public
>>>> for people who wish to replace the SSTable format ("Cassandra developers
>>>> who want to develop and publish different file format implementations."),
>>>> so if we need to support 2i API, why would we not support SSTable API as
>>>> well?
>>>> 
>>>>> All of the other mentioned APIs are 

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-15 Thread Jacek Lewandowski
I'd put it another way - the scope is to make it possible to provide a new
implementation of sstable format without the necessity to refactor
Cassandra code. It implies a contract about the responsibilities of sstable
format implementation so that the rest of the code can rely on that, and
only on that, and does not make assumptions beyond that. But it does not
claim that the created interfaces will not change even with a minor version
release. When those interfaces have been around for some time, we can start a
separate discussion about whether we want to put some guarantees on them.
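
To illustrate the kind of contract meant here, below is a minimal Java sketch. It is not the CEP-17 design: every name in it (TableFileFormat, FormatRegistry, LegacyFormat, ExperimentalFormat, describeReader) is hypothetical and exists only for this example. It shows callers depending on a single interface and looking a format up by name, so that adding a format does not require branching in the calling code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical contract: the only thing the rest of the code may rely on.
interface TableFileFormat {
    String name();
    // Stand-in for "open a reader"; returns a description to keep the sketch small.
    String describeReader(String file);
}

// Hypothetical implementation standing in for the existing format.
final class LegacyFormat implements TableFileFormat {
    public String name() { return "legacy"; }
    public String describeReader(String file) { return "legacy reader for " + file; }
}

// Hypothetical implementation standing in for a newly introduced format.
final class ExperimentalFormat implements TableFileFormat {
    public String name() { return "experimental"; }
    public String describeReader(String file) { return "experimental reader for " + file; }
}

// Formats are looked up by name, so callers never branch on the concrete type.
final class FormatRegistry {
    private final Map<String, TableFileFormat> formats = new LinkedHashMap<>();

    void register(TableFileFormat format) {
        if (formats.putIfAbsent(format.name(), format) != null)
            throw new IllegalArgumentException("format already registered: " + format.name());
    }

    TableFileFormat get(String name) {
        TableFileFormat format = formats.get(name);
        if (format == null)
            throw new IllegalArgumentException("unknown format: " + name);
        return format;
    }
}
```

With both formats registered, a caller would write registry.get("legacy").describeReader(file) and never needs an "if then else" on the format type. Nothing in the sketch promises that the interface stays unchanged between releases.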

- - -- --- -  -
Jacek Lewandowski


On Wed, Nov 10, 2021 at 9:01 PM David Capwell 
wrote:

> If this gets descoped to test only (can break all interfaces in a minor)
> then my support concerns are no longer valid; I am cool with the CEP scoped
> only to improving testing
>
> > On Nov 10, 2021, at 11:20 AM, Jacek Lewandowski <
> [email protected]> wrote:
> >
> > For the other ticket (schema update handler interface) I was also proposing
> > a kind of @DeveloperApi annotation as seen in other projects but similarly
> > to this thread there were different opinions and no conclusion. After
> > reading the comments I must agree that perhaps it is way too early to mark
> > this interface as stable. Perhaps it was too far-fetched to say it would be
> > for people who wished to replace the SSTable format. My focus is
> > primarily on cleaning up the code (modularization and clean contracts) and
> > making it possible to introduce a new format in the future while allowing
> > us to maintain the old format (no "if then else" approach)
> >
> > - - -- --- -  -
> > Jacek Lewandowski
> >
> >
> > On Wed, Nov 10, 2021 at 12:53 AM [email protected] <
> [email protected]>
> > wrote:
> >
> >>> I may be wrong here, but the CEP directly calls out making this api
> >>> public for people who wish to replace the SSTable format
> >>
> >> I don’t think this implies API stability. For starters, it doesn’t
> >> stipulate that these implementations will be supported out of tree (the
> >> only one I’m aware of, so far as I understand, is intended to be incubated
> >> in tree), nor does an API for external usage have to be stable. It’s fine
> >> to create an API and tell users it’s unstable, and that they should closely
> >> monitor patch version changes if they use it.
> >>
> >> That said, norms may be changing around what can go into patch releases
> >> anyhow, so this may be a lot of noise about nothing. If all new development
> >> goes into trunk, then it’s all moot. But I don’t think we can make hard
> >> assumptions about that today, as historically these sorts of intentions
> >> haven’t lasted.
> >>
> >> I’m fairly against the idea of introducing hard restrictions on this, and
> >> potentially ossifying the codebase. I’m not keen to even consider out of
> >> tree consumers of these APIs in any way, for compatibility, upgradeability
> >> or anything. There’s a lot that needs to be done over the coming years to
> >> improve the internal structure of the project, and unduly entrenching the
> >> current state of affairs would be a huge potential harm of these efforts to
> >> modularise the codebase.
> >>
> >> From: David Capwell 
> >> Date: Tuesday, 9 November 2021 at 23:38
> >> To: [email protected] 
> >> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> >>> My understanding is that the only interface that is expected to be
> >>> stable for external consumers is the secondary index API
> >>
> >> I may be wrong here, but the CEP directly calls out making this api public
> >> for people who wish to replace the SSTable format ("Cassandra developers
> >> who want to develop and publish different file format implementations."),
> >> so if we need to support 2i API, why would we not support SSTable API as
> >> well?
> >>
> >>> All of the other mentioned APIs are in my opinion for internal usage only
> >>
> >> This gets back to my point; it is currently tribal knowledge what needs to
> >> work and what doesn’t, and without the broader set of committers knowing
> >> this then the likelihood that any new API will break in a minor is high.
> >>
> >>> On Nov 9, 2021, at 12:13 PM, [email protected] wrote:
> >>>
>

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-10 Thread David Capwell
If this gets descoped to test only (can break all interfaces in a minor) then 
my support concerns are no longer valid; I am cool with the CEP scoped only to 
improving testing

> On Nov 10, 2021, at 11:20 AM, Jacek Lewandowski  
> wrote:
> 
> For the other ticket (schema update handler interface) I was also proposing
> a kind of @DeveloperApi annotation as seen in other projects but similarly
> to this thread there were different opinions and no conclusion. After
> reading the comments I must agree that perhaps it is way too early to mark
> this interface as stable. Perhaps it was too far-fetched to say it would be
> for people who wished to replace the SSTable format. My focus is
> primarily on cleaning up the code (modularization and clean contracts) and
> making it possible to introduce a new format in the future while allowing
> us to maintain the old format (no "if then else" approach)
> 
> - - -- --- -  -
> Jacek Lewandowski
> 
> 
> On Wed, Nov 10, 2021 at 12:53 AM [email protected] 
> wrote:
> 
>>> I may be wrong here, but the CEP directly calls out making this api
>>> public for people who wish to replace the SSTable format
>> 
>> I don’t think this implies API stability. For starters, it doesn’t
>> stipulate that these implementations will be supported out of tree (the
>> only one I’m aware of, so far as I understand, is intended to be incubated
>> in tree), nor does an API for external usage have to be stable. It’s fine
>> to create an API and tell users it’s unstable, and that they should closely
>> monitor patch version changes if they use it.
>> 
>> That said, norms may be changing around what can go into patch releases
>> anyhow, so this may be a lot of noise about nothing. If all new development
>> goes into trunk, then it’s all moot. But I don’t think we can make hard
>> assumptions about that today, as historically these sorts of intentions
>> haven’t lasted.
>> 
>> I’m fairly against the idea of introducing hard restrictions on this, and
>> potentially ossifying the codebase. I’m not keen to even consider out of
>> tree consumers of these APIs in any way, for compatibility, upgradeability
>> or anything. There’s a lot that needs to be done over the coming years to
>> improve the internal structure of the project, and unduly entrenching the
>> current state of affairs would be a huge potential harm of these efforts to
>> modularise the codebase.
>> 
>> From: David Capwell 
>> Date: Tuesday, 9 November 2021 at 23:38
>> To: [email protected] 
>> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
>>> My understanding is that the only interface that is expected to be
>>> stable for external consumers is the secondary index API
>> 
>> I may be wrong here, but the CEP directly calls out making this api public
>> for people who wish to replace the SSTable format ("Cassandra developers
>> who want to develop and publish different file format implementations."),
>> so if we need to support 2i API, why would we not support SSTable API as
>> well?
>> 
>>> All of the other mentioned APIs are in my opinion for internal usage only
>> 
>> This gets back to my point; it is currently tribal knowledge what needs to
>> work and what doesn’t, and without the broader set of committers knowing
>> this then the likelihood that any new API will break in a minor is high.
>> 
>>> On Nov 9, 2021, at 12:13 PM, [email protected] wrote:
>>> 
>>> I agree that we don’t need to block the CEP on this, and that we should
>>> have that discussion. But it’s worth noting that the CEP should not
>>> anticipate or depend on any specific outcome of that discussion.
>>>
>>> Since it is somewhat relevant for this discussion, my view is that no
>>> interface should be assumed to be stable without the prior explicit
>>> agreement of the community.
>>>
>>> My understanding is that the only interface that is expected to be
>>> stable for external consumers is the secondary index API. Perhaps also
>>> snitches? But also perhaps not, as the difficulty of upgrading these at the
>>> same time is pretty low for custom snitches. All of the other mentioned
>>> APIs are in my opinion for internal usage only, so users should not assume
>>> compile time compatibility across any release, and I am certain we have
>>> never tried to maintain this. This still facilitates forks of course, by
>>> localising the compatibility work.
>>> 
>>> 
>>> From: Jeremiah D Jordan 
>>> Date: Tuesday, 9 November 2021 at 

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-10 Thread Jacek Lewandowski
For the other ticket (schema update handler interface) I was also proposing
a kind of @DeveloperApi annotation as seen in other projects but similarly
to this thread there were different opinions and no conclusion. After
reading the comments I must agree that perhaps it is way too early to mark
this interface as stable. Perhaps it was too far-fetched to say it would be
for people who wished to replace the SSTable format. My focus is
primarily on cleaning up the code (modularization and clean contracts) and
making it possible to introduce a new format in the future while allowing
us to maintain the old format (no "if then else" approach)
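
For reference, a marker of the kind mentioned above could look like the following Java sketch. The annotation here is hypothetical and written only for this example; it is not an existing Cassandra annotation, and ExampleFormatHook and DeveloperApiCheck are made-up names used to show it applied and read back. The annotation carries no guarantee by itself; it only records the intent in a place where tooling can find it.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker for interfaces meant for developers extending the system.
// RUNTIME retention lets a test or build check discover it via reflection.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DeveloperApi {
    // Free-form statement of what users may (not) expect.
    String note() default "unstable: may change in any release, including minors";
}

// Hypothetical interface, present only to show the annotation in use.
@DeveloperApi
interface ExampleFormatHook {
    String name();
}

final class DeveloperApiCheck {
    // True when the type is explicitly marked as a developer-facing API.
    static boolean isDeveloperApi(Class<?> type) {
        return type.isAnnotationPresent(DeveloperApi.class);
    }

    // Returns the declared note, or an "internal" default for unmarked types.
    static String stabilityNote(Class<?> type) {
        DeveloperApi marker = type.getAnnotation(DeveloperApi.class);
        return marker == null ? "internal: no compatibility expectations" : marker.note();
    }
}
```

A check like DeveloperApiCheck.isDeveloperApi(ExampleFormatHook.class) could then back a test that lists every marked interface, which would make the set of exposed interfaces explicit without marking any of them as stable.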

- - -- --- -  -
Jacek Lewandowski


On Wed, Nov 10, 2021 at 12:53 AM [email protected] 
wrote:

> > I may be wrong here, but the CEP directly calls out making this api
> > public for people who wish to replace the SSTable format
>
> I don’t think this implies API stability. For starters, it doesn’t
> stipulate that these implementations will be supported out of tree (the
> only one I’m aware of, so far as I understand, is intended to be incubated
> in tree), nor does an API for external usage have to be stable. It’s fine
> to create an API and tell users it’s unstable, and that they should closely
> monitor patch version changes if they use it.
>
> That said, norms may be changing around what can go into patch releases
> anyhow, so this may be a lot of noise about nothing. If all new development
> goes into trunk, then it’s all moot. But I don’t think we can make hard
> assumptions about that today, as historically these sorts of intentions
> haven’t lasted.
>
> I’m fairly against the idea of introducing hard restrictions on this, and
> potentially ossifying the codebase. I’m not keen to even consider out of
> tree consumers of these APIs in any way, for compatibility, upgradeability
> or anything. There’s a lot that needs to be done over the coming years to
> improve the internal structure of the project, and unduly entrenching the
> current state of affairs would be a huge potential harm of these efforts to
> modularise the codebase.
>
> From: David Capwell 
> Date: Tuesday, 9 November 2021 at 23:38
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> > My understanding is that the only interface that is expected to be
> > stable for external consumers is the secondary index API
>
> I may be wrong here, but the CEP directly calls out making this api public
> for people who wish to replace the SSTable format ("Cassandra developers
> who want to develop and publish different file format implementations."),
> so if we need to support 2i API, why would we not support SSTable API as
> well?
>
> > All of the other mentioned APIs are in my opinion for internal usage only
>
> This gets back to my point; it is currently tribal knowledge what needs to
> work and what doesn’t, and without the broader set of committers knowing
> this then the likelihood that any new API will break in a minor is high.
>
> > On Nov 9, 2021, at 12:13 PM, [email protected] wrote:
> >
> > I agree that we don’t need to block the CEP on this, and that we should
> > have that discussion. But it’s worth noting that the CEP should not
> > anticipate or depend on any specific outcome of that discussion.
> >
> > Since it is somewhat relevant for this discussion, my view is that no
> > interface should be assumed to be stable without the prior explicit
> > agreement of the community.
> >
> > My understanding is that the only interface that is expected to be
> > stable for external consumers is the secondary index API. Perhaps also
> > snitches? But also perhaps not, as the difficulty of upgrading these at the
> > same time is pretty low for custom snitches. All of the other mentioned
> > APIs are in my opinion for internal usage only, so users should not assume
> > compile time compatibility across any release, and I am certain we have
> > never tried to maintain this. This still facilitates forks of course, by
> > localising the compatibility work.
> >
> >
> > From: Jeremiah D Jordan 
> > Date: Tuesday, 9 November 2021 at 19:43
> > To: Cassandra DEV 
> > Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> > I would love to have this discussion and set up annotations or similar to
> > formalize things.  I just do not think we need to hold up any CEPs to do
> > so.  That discussion should possibly be a CEP of its own proposing how we
> > want to formalize interfaces?  I would be happy to go through and try to
> > put together something for that or since you feel so strongly about it
> > maybe you want to, David?  At the very least it should get its own DISCUSS
> > t

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread [email protected]
> I may be wrong here, but the CEP directly calls out making this api public 
> for people who wish to replace the SSTable format

I don’t think this implies API stability. For starters, it doesn’t stipulate 
that these implementations will be supported out of tree (the only one I’m 
aware of, so far as I understand, is intended to be incubated in tree), nor 
does an API for external usage have to be stable. It’s fine to create an API 
and tell users it’s unstable, and that they should closely monitor patch 
version changes if they use it.

That said, norms may be changing around what can go into patch releases anyhow, 
so this may be a lot of noise about nothing. If all new development goes into 
trunk, then it’s all moot. But I don’t think we can make hard assumptions about 
that today, as historically these sorts of intentions haven’t lasted.

I’m fairly against the idea of introducing hard restrictions on this, and 
potentially ossifying the codebase. I’m not keen to even consider out of tree 
consumers of these APIs in any way, for compatibility, upgradeability or 
anything. There’s a lot that needs to be done over the coming years to improve 
the internal structure of the project, and unduly entrenching the current state 
of affairs would be a huge potential harm of these efforts to modularise the 
codebase.

From: David Capwell 
Date: Tuesday, 9 November 2021 at 23:38
To: [email protected] 
Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> My understanding is that the only interface that is expected to be stable for 
> external consumers is the secondary index API

I may be wrong here, but the CEP directly calls out making this api public for 
people who wish to replace the SSTable format ("Cassandra developers who want 
to develop and publish different file format implementations."), so if we need 
to support 2i API, why would we not support SSTable API as well?

> All of the other mentioned APIs are in my opinion for internal usage only

This gets back to my point; it is currently tribal knowledge what needs to work 
and what doesn’t, and without the broader set of committers knowing this then 
the likelihood that any new API will break in a minor is high.

> On Nov 9, 2021, at 12:13 PM, [email protected] wrote:
>
> I agree that we don’t need to block the CEP on this, and that we should have 
> that discussion. But it’s worth noting that the CEP should not anticipate or 
> depend on any specific outcome of that discussion.
>
> Since it is somewhat relevant for this discussion, my view is that no 
> interface should be assumed to be stable without the prior explicit agreement 
> of the community.
>
> My understanding is that the only interface that is expected to be stable for 
> external consumers is the secondary index API. Perhaps also snitches? But 
> also perhaps not, as the difficulty of upgrading these at the same time is 
> pretty low for custom snitches. All of the other mentioned APIs are in my 
> opinion for internal usage only, so users should not assume compile time 
> compatibility across any release, and I am certain we have never tried to 
> maintain this. This still facilitates forks of course, by localising the 
> compatibility work.
>
>
> From: Jeremiah D Jordan 
> Date: Tuesday, 9 November 2021 at 19:43
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> I would love to have this discussion and set up annotations or similar to 
> formalize things.  I just do not think we need to hold up any CEPs to do so.  
> That discussion should possibly be a CEP of its own proposing how we want to 
> formalize interfaces?  I would be happy to go through and try to put together 
> something for that or since you feel so strongly about it maybe you want
> to, David?  At the very least it should get its own DISCUSS thread and then be 
> written up in the wiki.
>
> -Jeremiah
>
>> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie  wrote:
>>
>>>
>>> trunk -> anything goes, not trunk -> try not to change these interfaces
>>
>> Have we ever clarified what "these interfaces" are? Was just talking to
>> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
>> designed_ for external usage. /sigh
>>
>> I think it'd be valuable for us to go through the codebase and annotate
>> interfaces as intended to be exposed to 3rd parties; this has bothered me
>> for years. Especially as we come up on a large number of new cleanups,
>> refactorings, and potentially genericizing some subsystems into APIs
>> (CEP-18 descendants).
>>
>>
>> On Tue, Nov 9, 2021 at 2:01 PM David Capwell 
>> wrote:
>>
>>>> We already have many interfaces similar to the

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread David Capwell
> My understanding is that the only interface that is expected to be stable for 
> external consumers is the secondary index API

I may be wrong here, but the CEP directly calls out making this api public for 
people who wish to replace the SSTable format ("Cassandra developers who want 
to develop and publish different file format implementations."), so if we need 
to support 2i API, why would we not support SSTable API as well?

> All of the other mentioned APIs are in my opinion for internal usage only

This gets back to my point; it is currently tribal knowledge what needs to work 
and what doesn’t, and without the broader set of committers knowing this then 
the likelihood that any new API will break in a minor is high.

> On Nov 9, 2021, at 12:13 PM, [email protected] wrote:
> 
> I agree that we don’t need to block the CEP on this, and that we should have 
> that discussion. But it’s worth noting that the CEP should not anticipate or 
> depend on any specific outcome of that discussion.
> 
> Since it is somewhat relevant for this discussion, my view is that no 
> interface should be assumed to be stable without the prior explicit agreement 
> of the community.
> 
> My understanding is that the only interface that is expected to be stable for 
> external consumers is the secondary index API. Perhaps also snitches? But 
> also perhaps not, as the difficulty of upgrading these at the same time is 
> pretty low for custom snitches. All of the other mentioned APIs are in my 
> opinion for internal usage only, so users should not assume compile time 
> compatibility across any release, and I am certain we have never tried to 
> maintain this. This still facilitates forks of course, by localising the 
> compatibility work.
> 
> 
> From: Jeremiah D Jordan 
> Date: Tuesday, 9 November 2021 at 19:43
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> I would love to have this discussion and set up annotations or similar to 
> formalize things.  I just do not think we need to hold up any CEPs to do so.  
> That discussion should possibly be a CEP of its own proposing how we want to 
> formalize interfaces?  I would be happy to go through and try to put together 
> something for that or since you feel so strongly about it maybe you want
> to, David?  At the very least it should get its own DISCUSS thread and then be 
> written up in the wiki.
> 
> -Jeremiah
> 
>> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie  wrote:
>> 
>>> 
>>> trunk -> anything goes, not trunk -> try not to change these interfaces
>> 
>> Have we ever clarified what "these interfaces" are? Was just talking to
>> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
>> designed_ for external usage. /sigh
>> 
>> I think it'd be valuable for us to go through the codebase and annotate
>> interfaces as intended to be exposed to 3rd parties; this has bothered me
>> for years. Especially as we come up on a large number of new cleanups,
>> refactorings, and potentially genericizing some subsystems into APIs
>> (CEP-18 descendants).
>> 
>> 
>> On Tue, Nov 9, 2021 at 2:01 PM David Capwell 
>> wrote:
>> 
>>>> We already have many interfaces similar to these for Compaction
>>> Strategy, Indexing, Query Handler.
>>> 
>>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>>> to know…
>>> 
>>>> not trunk -> try not to change these interfaces
>>> 
>>> Outside of MBeans, I honestly do not know what interfaces fall into this
>>> group; and for MBeans we have tests which block breaking changes.  The
>>> point I am making is that not everyone is aware of the rules, so having
>>> something in place to help enforce such rules should be thought about; if
>>> we want to add pluggable hooks with the intent that external parties can
>>> leverage such hooks, we should also add to the scope the maintenance of
>>> these interfaces (we should not assume “tribal knowledge” will work).
>>> 
>>> I am not trying to ask for something large or something requiring a ton of
>>> work, I am just asking that this gets thought about during the project so
>>> it doesn’t get neglected.  This could be as simple as an annotation like
>>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>>> must be maintained), or it could be something like split directories
>>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>>> an implementation, only trying to make sure we are setup to su

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread [email protected]
I agree that we don’t need to block the CEP on this, and that we should have 
that discussion. But it’s worth noting that the CEP should not anticipate or 
depend on any specific outcome of that discussion.

Since it is somewhat relevant for this discussion, my view is that no interface 
should be assumed to be stable without the prior explicit agreement of the 
community.

My understanding is that the only interface that is expected to be stable for 
external consumers is the secondary index API. Perhaps also snitches? But also 
perhaps not, as the difficulty of upgrading these at the same time is pretty 
low for custom snitches. All of the other mentioned APIs are in my opinion for 
internal usage only, so users should not assume compile time compatibility 
across any release, and I am certain we have never tried to maintain this. 
This still facilitates forks of course, by localising the compatibility work.


From: Jeremiah D Jordan 
Date: Tuesday, 9 November 2021 at 19:43
To: Cassandra DEV 
Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
I would love to have this discussion and setup annotations or similar to 
formalize things.  I just do not think we need to hold any up CEPs to do so.  
That discussion should possibly be a CEP of its own proposing how we want to 
formalize interfaces?  I would be happy to go through and try to put together 
something for that or since you feel so strongly about it maybe you want to 
David?  At the very least it should get its own DISCUSS thread and then be 
written up in the wiki.

-Jeremiah

> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie  wrote:
>
>>
>> trunk -> anything goes, not trunk -> try not to change these interfaces
>
> Have we ever clarified what "these interfaces" are? Was just talking to
> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
> designed_ for external usage. /sigh
>
> I think it'd be valuable for us to go through the codebase and annotate
> interfaces as intended to be exposed to 3rd parties; this has bothered me
> for years. Especially as we come up on a large number of new cleanups,
> refactorings, and potentially genericizing some subsystems into API's
> (CEP-18 descendents).
>
>
> On Tue, Nov 9, 2021 at 2:01 PM David Capwell 
> wrote:
>
>>> We already have many interfaces similar to these for Compaction
>> Strategy, Indexing, Query Handler.
>>
>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>> to know…
>>
>>> not trunk -> try not to change these interfaces
>>
>> Outside of MBeans, I honestly do not know what interfaces fall into this
>> group; and for MBeans we have tests which block breaking changes.  The
>> point I am making is that not everyone is aware of the rules, so having
>> something in place to help enforce such rules should be thought about; if
>> we want to add pluggable hooks with the intent that external parties can
>> leverage such hooks, we should also add to the scope the maintenance of
>> these interfaces (we should not assume “tribal knowledge” will work).
>>
>> I am not trying to ask for something large or something requiring a ton of
>> work, I am just asking that this gets thought about during the project so
>> it doesn’t get neglected.  This could be as simple as an annotation like
>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>> must be maintained), or it could be something like split directories
>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>> an implementation, only trying to make sure we are setup to support the CEP
>> after the work is done.
>>
>>
>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan 
>> wrote:
>>>
>>> We already have many interfaces similar to these for Compaction
>> Strategy, Indexing, Query Handler.  I would hope that commiters are already
>> following a policy along the lines of trunk -> anything goes, not trunk ->
>> try not to change these interfaces.  I would expect that to be the same
>> policy for any new internal interfaces that are added.  But given we
>> already have many such interfaces, I see no reason to block adding more of
>> them while change policies are discussed.
>>>
>>> -Jeremiah
>>>
>>>> On Nov 9, 2021, at 10:44 AM, David Capwell 
>> wrote:
>>>>
>>>> I still have one outstanding comment, but this is a comment for several
>> of the CEPs being worked on
>>>>
>>>>> And last comment, which I have also done in the other modularity
>> thread… backwards compatibility and maintenance. It is not clear right now

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread David Capwell
> I would be happy to go through and try to put together something for that ... 
>  At the very least it should get its own DISCUSS thread and then be written 
> up in the wiki.

+1. Thanks.

> On Nov 9, 2021, at 11:43 AM, Jeremiah D Jordan  
> wrote:
> 
> I would love to have this discussion and setup annotations or similar to 
> formalize things.  I just do not think we need to hold any up CEPs to do so.  
> That discussion should possibly be a CEP of its own proposing how we want to 
> formalize interfaces?  I would be happy to go through and try to put together 
> something for that or since you feel so strongly about it maybe you want to 
> David?  At the very least it should get its own DISCUSS thread and then be 
> written up in the wiki.
> 
> -Jeremiah
> 
>> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie  wrote:
>> 
>>> 
>>> trunk -> anything goes, not trunk -> try not to change these interfaces
>> 
>> Have we ever clarified what "these interfaces" are? Was just talking to
>> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
>> designed_ for external usage. /sigh
>> 
>> I think it'd be valuable for us to go through the codebase and annotate
>> interfaces as intended to be exposed to 3rd parties; this has bothered me
>> for years. Especially as we come up on a large number of new cleanups,
>> refactorings, and potentially genericizing some subsystems into API's
>> (CEP-18 descendents).
>> 
>> 
>> On Tue, Nov 9, 2021 at 2:01 PM David Capwell 
>> wrote:
>> 
>>>> We already have many interfaces similar to these for Compaction
>>> Strategy, Indexing, Query Handler.
>>> 
>>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>>> to know…
>>> 
>>>> not trunk -> try not to change these interfaces
>>> 
>>> Outside of MBeans, I honestly do not know what interfaces fall into this
>>> group; and for MBeans we have tests which block breaking changes.  The
>>> point I am making is that not everyone is aware of the rules, so having
>>> something in place to help enforce such rules should be thought about; if
>>> we want to add pluggable hooks with the intent that external parties can
>>> leverage such hooks, we should also add to the scope the maintenance of
>>> these interfaces (we should not assume “tribal knowledge” will work).
>>> 
>>> I am not trying to ask for something large or something requiring a ton of
>>> work, I am just asking that this gets thought about during the project so
>>> it doesn’t get neglected.  This could be as simple as an annotation like
>>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>>> must be maintained), or it could be something like split directories
>>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>>> an implementation, only trying to make sure we are setup to support the CEP
>>> after the work is done.
>>> 
>>> 
>>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan 
>>> wrote:
>>>> 
>>>> We already have many interfaces similar to these for Compaction
>>> Strategy, Indexing, Query Handler.  I would hope that commiters are already
>>> following a policy along the lines of trunk -> anything goes, not trunk ->
>>> try not to change these interfaces.  I would expect that to be the same
>>> policy for any new internal interfaces that are added.  But given we
>>> already have many such interfaces, I see no reason to block adding more of
>>> them while change policies are discussed.
>>>> 
>>>> -Jeremiah
>>>> 
>>>>> On Nov 9, 2021, at 10:44 AM, David Capwell 
>>> wrote:
>>>>> 
>>>>> I still have one outstanding comment, but this is a comment for several
>>> of the CEPs being worked on
>>>>> 
>>>>>> And last comment, which I have also done in the other modularity
>>> thread… backwards compatibility and maintenance. It is not clear right now
>>> what java interfaces may not break and how we can maintain and extend such
>>> interfaces in the future.  If the goal is to allow 3rd parties to plugin
>>> and offer new SSTable formats, are we as a project ok with having a minor
>>> release do a binary or source non-compatible change?  If not how do we
>>> detect this?  Until this problem is solved, I do not think we should add
>>> any such interfaces.
>>>>> 
>>>>> I would love some clarity on this.  Specifically, if we assume a patch
>>> author/reviewers are not familiar with the impact of changes these
>>> interfaces, what happens?  Do we have tools to block this? Do we require
>>> 3rd party authors to create massive shims to deal with every patch level
>>> version out there?  I would love more clarity on how we maintain these new
>>> pluggable interfaces.
>>>>> 
>>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov 
>>> wrote:
>>>>>> 
>>>>>> Does anyone have any further comments or questions on the proposal, or
>>> are
>>>>>> we ready to  move forward to a vote?
>>>>>> 
>>>>>> Regards,
>>>>>> Branimir
>>>>>> 
>>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
>>> 
>>>>>> wrote:
>>>>>> 
>>>

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread Jeremiah D Jordan
I would love to have this discussion and set up annotations or similar to 
formalize things.  I just do not think we need to hold up any CEPs to do so.  
That discussion should possibly be a CEP of its own proposing how we want to 
formalize interfaces.  I would be happy to go through and try to put together 
something for that, or since you feel so strongly about it, maybe you want to, 
David?  At the very least it should get its own DISCUSS thread and then be 
written up in the wiki.

-Jeremiah

> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie  wrote:
> 
>> 
>> trunk -> anything goes, not trunk -> try not to change these interfaces
> 
> Have we ever clarified what "these interfaces" are? Was just talking to
> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
> designed_ for external usage. /sigh
> 
> I think it'd be valuable for us to go through the codebase and annotate
> interfaces as intended to be exposed to 3rd parties; this has bothered me
> for years. Especially as we come up on a large number of new cleanups,
> refactorings, and potentially genericizing some subsystems into API's
> (CEP-18 descendents).
> 
> 
> On Tue, Nov 9, 2021 at 2:01 PM David Capwell 
> wrote:
> 
>>> We already have many interfaces similar to these for Compaction
>> Strategy, Indexing, Query Handler.
>> 
>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>> to know…
>> 
>>> not trunk -> try not to change these interfaces
>> 
>> Outside of MBeans, I honestly do not know what interfaces fall into this
>> group; and for MBeans we have tests which block breaking changes.  The
>> point I am making is that not everyone is aware of the rules, so having
>> something in place to help enforce such rules should be thought about; if
>> we want to add pluggable hooks with the intent that external parties can
>> leverage such hooks, we should also add to the scope the maintenance of
>> these interfaces (we should not assume “tribal knowledge” will work).
>> 
>> I am not trying to ask for something large or something requiring a ton of
>> work, I am just asking that this gets thought about during the project so
>> it doesn’t get neglected.  This could be as simple as an annotation like
>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>> must be maintained), or it could be something like split directories
>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>> an implementation, only trying to make sure we are setup to support the CEP
>> after the work is done.
>> 
>> 
>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan 
>> wrote:
>>> 
>>> We already have many interfaces similar to these for Compaction
>> Strategy, Indexing, Query Handler.  I would hope that commiters are already
>> following a policy along the lines of trunk -> anything goes, not trunk ->
>> try not to change these interfaces.  I would expect that to be the same
>> policy for any new internal interfaces that are added.  But given we
>> already have many such interfaces, I see no reason to block adding more of
>> them while change policies are discussed.
>>> 
>>> -Jeremiah
>>> 
>>>> On Nov 9, 2021, at 10:44 AM, David Capwell 
>> wrote:
>>>> 
>>>> I still have one outstanding comment, but this is a comment for several
>> of the CEPs being worked on
>>>> 
>>>>> And last comment, which I have also done in the other modularity
>> thread… backwards compatibility and maintenance. It is not clear right now
>> what java interfaces may not break and how we can maintain and extend such
>> interfaces in the future.  If the goal is to allow 3rd parties to plugin
>> and offer new SSTable formats, are we as a project ok with having a minor
>> release do a binary or source non-compatible change?  If not how do we
>> detect this?  Until this problem is solved, I do not think we should add
>> any such interfaces.
>>>> 
>>>> I would love some clarity on this.  Specifically, if we assume a patch
>> author/reviewers are not familiar with the impact of changes these
>> interfaces, what happens?  Do we have tools to block this? Do we require
>> 3rd party authors to create massive shims to deal with every patch level
>> version out there?  I would love more clarity on how we maintain these new
>> pluggable interfaces.
>>>> 
>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov 
>> wrote:
>>>>> 
>>>>> Does anyone have any further comments or questions on the proposal, or
>> are
>>>>> we ready to  move forward to a vote?
>>>>> 
>>>>> Regards,
>>>>> Branimir
>>>>> 
>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
>> 
>>>>> wrote:
>>>>> 
>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>> where
>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>> 
>>>>>> Works for me
>>>>>> 
>>>>>>> Speaking about the implementation, one idea I was thinking about was
>> that
>>>>>>> the factories for formats are registered using Java's native service
>>>>>>> loader

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread Joshua McKenzie
>
> trunk -> anything goes, not trunk -> try not to change these interfaces

Have we ever clarified what "these interfaces" are? Was just talking to
David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
designed_ for external usage. /sigh

I think it'd be valuable for us to go through the codebase and annotate
interfaces as intended to be exposed to 3rd parties; this has bothered me
for years. Especially as we come up on a large number of new cleanups,
refactorings, and potentially genericizing some subsystems into APIs
(CEP-18 descendants).


On Tue, Nov 9, 2021 at 2:01 PM David Capwell 
wrote:

> > We already have many interfaces similar to these for Compaction
> Strategy, Indexing, Query Handler.
>
> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
> to know…
>
> > not trunk -> try not to change these interfaces
>
> Outside of MBeans, I honestly do not know what interfaces fall into this
> group; and for MBeans we have tests which block breaking changes.  The
> point I am making is that not everyone is aware of the rules, so having
> something in place to help enforce such rules should be thought about; if
> we want to add pluggable hooks with the intent that external parties can
> leverage such hooks, we should also add to the scope the maintenance of
> these interfaces (we should not assume “tribal knowledge” will work).
>
> I am not trying to ask for something large or something requiring a ton of
> work, I am just asking that this gets thought about during the project so
> it doesn’t get neglected.  This could be as simple as an annotation like
> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
> must be maintained), or it could be something like split directories
> (src/java = private, src/java-exposed = public); I am trying not to dictate
> an implementation, only trying to make sure we are setup to support the CEP
> after the work is done.
>
>
> > On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan 
> wrote:
> >
> > We already have many interfaces similar to these for Compaction
> Strategy, Indexing, Query Handler.  I would hope that commiters are already
> following a policy along the lines of trunk -> anything goes, not trunk ->
> try not to change these interfaces.  I would expect that to be the same
> policy for any new internal interfaces that are added.  But given we
> already have many such interfaces, I see no reason to block adding more of
> them while change policies are discussed.
> >
> > -Jeremiah
> >
> >> On Nov 9, 2021, at 10:44 AM, David Capwell 
> wrote:
> >>
> >> I still have one outstanding comment, but this is a comment for several
> of the CEPs being worked on
> >>
> >>> And last comment, which I have also done in the other modularity
> thread… backwards compatibility and maintenance. It is not clear right now
> what java interfaces may not break and how we can maintain and extend such
> interfaces in the future.  If the goal is to allow 3rd parties to plugin
> and offer new SSTable formats, are we as a project ok with having a minor
> release do a binary or source non-compatible change?  If not how do we
> detect this?  Until this problem is solved, I do not think we should add
> any such interfaces.
> >>
> >> I would love some clarity on this.  Specifically, if we assume a patch
> author/reviewers are not familiar with the impact of changes these
> interfaces, what happens?  Do we have tools to block this? Do we require
> 3rd party authors to create massive shims to deal with every patch level
> version out there?  I would love more clarity on how we maintain these new
> pluggable interfaces.
> >>
> >>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov 
> wrote:
> >>>
> >>> Does anyone have any further comments or questions on the proposal, or
> are
> >>> we ready to  move forward to a vote?
> >>>
> >>> Regards,
> >>> Branimir
> >>>
> >>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
> 
> >>> wrote:
> >>>
> >>>>> I apologize I did not mention those things explicitly. All the places
> >>>> where
> >>>>> sstable files are accessed directly would have to be refactored.
> >>>>
> >>>> Works for me
> >>>>
> >>>>> Speaking about the implementation, one idea I was thinking about was
> that
> >>>>> the factories for formats are registered using Java's native service
> >>>>> loader.
> >>>>
> >>>> I am a fan of ServiceLoader as a means of plugging in.
> >>>>
> >>>>> I hope this explains a bit
> >>>>
> >>>> Yep; thanks!
> >>>>
> >>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
> >>>> [email protected]> wrote:
> >>>>>
> >>>>> David,
> >>>>>
> >>>>> I apologize I did not mention those things explicitly. All the places
> >>>> where
> >>>>> sstable files are accessed directly would have to be refactored.
> >>>>>
> >>>>> Regarding TableMetrics - currently it includes many metrics, some of
> them
> >>>>> are unrelated to sstables at all, but there are metrics which are
> >>>> specific
> >>>>> to the current sstable for

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread David Capwell
> We already have many interfaces similar to these for Compaction Strategy, 
> Indexing, Query Handler.

Today-I-Learned QueryHandler is not allowed to be touched in a minor… good to 
know…

> not trunk -> try not to change these interfaces

Outside of MBeans, I honestly do not know what interfaces fall into this group; 
and for MBeans we have tests which block breaking changes.  The point I am 
making is that not everyone is aware of the rules, so having something in place 
to help enforce such rules should be thought about; if we want to add pluggable 
hooks with the intent that external parties can leverage such hooks, we should 
also add to the scope the maintenance of these interfaces (we should not assume 
“tribal knowledge” will work).

I am not trying to ask for something large or something requiring a ton of 
work, I am just asking that this gets thought about during the project so it 
doesn’t get neglected.  This could be as simple as an annotation like 
@ExposedTo3rdParties (Hadoop does this to show an interface is exposed and must 
be maintained), or it could be something like split directories (src/java = 
private, src/java-exposed = public); I am trying not to dictate an 
implementation, only trying to make sure we are set up to support the CEP after 
the work is done.
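
To make the annotation option concrete, here is a minimal sketch, assuming a marker named after the `@ExposedTo3rdParties` placeholder above. Nothing like it exists in Cassandra today; `ExampleFormatHook`, `InternalHelper` and `ExposedApiCheck` are hypothetical names used only for illustration. (Hadoop's counterparts are, as far as I know, its `InterfaceAudience` and `InterfaceStability` annotations.)

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker: declares that a type is exposed to external parties
// and therefore must be maintained compatibly. Not an existing Cassandra API.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ExposedTo3rdParties
{
}

// Hypothetical pluggable hook that carries the marker.
@ExposedTo3rdParties
interface ExampleFormatHook
{
    String name();
}

// Hypothetical internal type without the marker, for contrast.
interface InternalHelper
{
    void run();
}

final class ExposedApiCheck
{
    /** Returns true when the type has been declared as exposed. */
    static boolean isExposed(Class<?> type)
    {
        // RUNTIME retention is what makes the marker visible to reflection.
        return type.isAnnotationPresent(ExposedTo3rdParties.class);
    }

    public static void main(String[] args)
    {
        System.out.println("ExampleFormatHook exposed: " + isExposed(ExampleFormatHook.class));
        System.out.println("InternalHelper exposed: " + isExposed(InternalHelper.class));
    }
}
```

Because the marker is retained at runtime, a test or build step could enumerate the marked types and apply stricter review rules to them, which is the "help enforce such rules" part.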


> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan  
> wrote:
> 
> We already have many interfaces similar to these for Compaction Strategy, 
> Indexing, Query Handler.  I would hope that commiters are already following a 
> policy along the lines of trunk -> anything goes, not trunk -> try not to 
> change these interfaces.  I would expect that to be the same policy for any 
> new internal interfaces that are added.  But given we already have many such 
> interfaces, I see no reason to block adding more of them while change 
> policies are discussed.
> 
> -Jeremiah
> 
>> On Nov 9, 2021, at 10:44 AM, David Capwell  
>> wrote:
>> 
>> I still have one outstanding comment, but this is a comment for several of 
>> the CEPs being worked on
>> 
>>> And last comment, which I have also done in the other modularity thread… 
>>> backwards compatibility and maintenance. It is not clear right now what 
>>> java interfaces may not break and how we can maintain and extend such 
>>> interfaces in the future.  If the goal is to allow 3rd parties to plugin 
>>> and offer new SSTable formats, are we as a project ok with having a minor 
>>> release do a binary or source non-compatible change?  If not how do we 
>>> detect this?  Until this problem is solved, I do not think we should add 
>>> any such interfaces.
>> 
>> I would love some clarity on this.  Specifically, if we assume a patch 
>> author/reviewers are not familiar with the impact of changes these 
>> interfaces, what happens?  Do we have tools to block this? Do we require 3rd 
>> party authors to create massive shims to deal with every patch level version 
>> out there?  I would love more clarity on how we maintain these new pluggable 
>> interfaces.
>> 
>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov  wrote:
>>> 
>>> Does anyone have any further comments or questions on the proposal, or are
>>> we ready to  move forward to a vote?
>>> 
>>> Regards,
>>> Branimir
>>> 
>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell 
>>> wrote:
>>> 
>>>>> I apologize I did not mention those things explicitly. All the places
>>>> where
>>>>> sstable files are accessed directly would have to be refactored.
>>>> 
>>>> Works for me
>>>> 
>>>>> Speaking about the implementation, one idea I was thinking about was that
>>>>> the factories for formats are registered using Java's native service
>>>>> loader.
>>>> 
>>>> I am a fan of ServiceLoader as a means of plugging in.
>>>> 
>>>>> I hope this explains a bit
>>>> 
>>>> Yep; thanks!
>>>> 
>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>>> [email protected]> wrote:
>>>>> 
>>>>> David,
>>>>> 
>>>>> I apologize I did not mention those things explicitly. All the places
>>>> where
>>>>> sstable files are accessed directly would have to be refactored.
>>>>> 
>>>>> Regarding TableMetrics - currently it includes many metrics, some of them
>>>>> are unrelated to sstables at all, but there are metrics which are
>>>> specific
>>>>> to the current sstable format, like metrics related to index summaries or
>>>>> bloom filters. The created gauges query certain methods on sstable
>>>> reader -
>>>>> I think the only common metrics for sstables we can leave in TableMetrics
>>>>> are those for which there are query methods in generic sstable interface.
>>>>> Other metrics, specific to the certain sstable format should be
>>>> registered
>>>>> by the implementation itself.
>>>>> 
>>>>> Speaking about the implementation, one idea I was thinking about was that
>>>>> the factories for formats are registered using Java's native service
>>>>> loader. This way we could get the list of all the factories on the
>>>>> classpath and call some method, like `registerMetric

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread Jeremiah D Jordan
We already have many interfaces similar to these for Compaction Strategy, 
Indexing, Query Handler.  I would hope that committers are already following a 
policy along the lines of trunk -> anything goes, not trunk -> try not to 
change these interfaces.  I would expect that to be the same policy for any new 
internal interfaces that are added.  But given we already have many such 
interfaces, I see no reason to block adding more of them while change policies 
are discussed.

-Jeremiah

> On Nov 9, 2021, at 10:44 AM, David Capwell  wrote:
> 
> I still have one outstanding comment, but this is a comment for several of 
> the CEPs being worked on
> 
>> And last comment, which I have also done in the other modularity thread… 
>> backwards compatibility and maintenance. It is not clear right now what java 
>> interfaces may not break and how we can maintain and extend such interfaces 
>> in the future.  If the goal is to allow 3rd parties to plugin and offer new 
>> SSTable formats, are we as a project ok with having a minor release do a 
>> binary or source non-compatible change?  If not how do we detect this?  
>> Until this problem is solved, I do not think we should add any such 
>> interfaces.
> 
> I would love some clarity on this.  Specifically, if we assume a patch 
> author/reviewers are not familiar with the impact of changes these 
> interfaces, what happens?  Do we have tools to block this? Do we require 3rd 
> party authors to create massive shims to deal with every patch level version 
> out there?  I would love more clarity on how we maintain these new pluggable 
> interfaces.
> 
>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov  wrote:
>> 
>> Does anyone have any further comments or questions on the proposal, or are
>> we ready to  move forward to a vote?
>> 
>> Regards,
>> Branimir
>> 
>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell 
>> wrote:
>> 
>>>> I apologize I did not mention those things explicitly. All the places
>>> where
>>>> sstable files are accessed directly would have to be refactored.
>>> 
>>> Works for me
>>> 
>>>> Speaking about the implementation, one idea I was thinking about was that
>>>> the factories for formats are registered using Java's native service
>>>> loader.
>>> 
>>> I am a fan of ServiceLoader as a means of plugging in.
>>> 
>>>> I hope this explains a bit
>>> 
>>> Yep; thanks!
>>> 
>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>> [email protected]> wrote:
>>>> 
>>>> David,
>>>> 
>>>> I apologize I did not mention those things explicitly. All the places
>>> where
>>>> sstable files are accessed directly would have to be refactored.
>>>> 
>>>> Regarding TableMetrics - currently it includes many metrics, some of them
>>>> are unrelated to sstables at all, but there are metrics which are
>>> specific
>>>> to the current sstable format, like metrics related to index summaries or
>>>> bloom filters. The created gauges query certain methods on sstable
>>> reader -
>>>> I think the only common metrics for sstables we can leave in TableMetrics
>>>> are those for which there are query methods in generic sstable interface.
>>>> Other metrics, specific to the certain sstable format should be
>>> registered
>>>> by the implementation itself.
>>>> 
>>>> Speaking about the implementation, one idea I was thinking about was that
>>>> the factories for formats are registered using Java's native service
>>>> loader. This way we could get the list of all the factories on the
>>>> classpath and call some method, like `registerMetrics` during system
>>>> initialization. That could be also implemented in static initializer in
>>> the
>>>> factory but it would make it less obvious for the implementors where such
>>>> initialization should be done.
>>>> 
>>>> I hope this explains a bit
>>>> 
>>>> Thanks,
>>>> Jacek
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>> 
>>> 

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread David Capwell
I still have one outstanding comment, but this is a comment for several of the 
CEPs being worked on

> And last comment, which I have also done in the other modularity thread… 
> backwards compatibility and maintenance. It is not clear right now what java 
> interfaces may not break and how we can maintain and extend such interfaces 
> in the future.  If the goal is to allow 3rd parties to plugin and offer new 
> SSTable formats, are we as a project ok with having a minor release do a 
> binary or source non-compatible change?  If not how do we detect this?  Until 
> this problem is solved, I do not think we should add any such interfaces.

I would love some clarity on this.  Specifically, if we assume a patch 
author/reviewers are not familiar with the impact of changes to these interfaces, 
what happens?  Do we have tools to block this? Do we require 3rd party authors 
to create massive shims to deal with every patch level version out there?  I 
would love more clarity on how we maintain these new pluggable interfaces.
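
As one possible answer to "do we have tools to block this?", here is a rough sketch of a signature-snapshot check, similar in spirit to the MBean tests that block breaking changes. It assumes a recorded baseline of method signatures and compares it with what reflection reports today. `ExamplePluggable`, `ApiSnapshot` and the baseline entries are hypothetical, and a real check would also need to cover fields, generics and default methods.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical pluggable interface standing in for a real one.
interface ExamplePluggable
{
    String name();
    long estimateSize(String table);
}

final class ApiSnapshot
{
    /** Sorted descriptions ("returnType name(paramTypes)") of the type's public methods. */
    static Set<String> describe(Class<?> type)
    {
        Set<String> signatures = new TreeSet<>();
        for (Method m : type.getMethods())
        {
            String params = Arrays.stream(m.getParameterTypes())
                                  .map(Class::getName)
                                  .collect(Collectors.joining(",", "(", ")"));
            signatures.add(m.getReturnType().getName() + ' ' + m.getName() + params);
        }
        return signatures;
    }

    /** Baseline entries that the current type no longer provides, i.e. breaking removals. */
    static Set<String> missing(Set<String> baseline, Class<?> type)
    {
        Set<String> gone = new TreeSet<>(baseline);
        gone.removeAll(describe(type));
        return gone;
    }

    public static void main(String[] args)
    {
        // Assumed baseline from an earlier release; "void close()" has since been removed.
        Set<String> baseline = new TreeSet<>(Arrays.asList(
            "java.lang.String name()",
            "long estimateSize(java.lang.String)",
            "void close()"));
        System.out.println("missing from current API: " + missing(baseline, ExamplePluggable.class));
    }
}
```

A unit test that fails whenever `missing` is non-empty would catch removals and signature changes even when the patch author and reviewers are not aware that the interface is meant to be stable.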

> On Nov 9, 2021, at 4:45 AM, Branimir Lambov  wrote:
> 
> Does anyone have any further comments or questions on the proposal, or are
> we ready to  move forward to a vote?
> 
> Regards,
> Branimir
> 
> On Tue, Nov 2, 2021 at 7:15 PM David Capwell 
> wrote:
> 
>>> I apologize I did not mention those things explicitly. All the places
>> where
>>> sstable files are accessed directly would have to be refactored.
>> 
>> Works for me
>> 
>>> Speaking about the implementation, one idea I was thinking about was that
>>> the factories for formats are registered using Java's native service
>>> loader.
>> 
>> I am a fan of ServiceLoader as a means of plugging in.
>> 
>>> I hope this explains a bit
>> 
>> Yep; thanks!
>> 
>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>> [email protected]> wrote:
>>> 
>>> David,
>>> 
>>> I apologize I did not mention those things explicitly. All the places
>> where
>>> sstable files are accessed directly would have to be refactored.
>>> 
>>> Regarding TableMetrics - currently it includes many metrics, some of them
>>> are unrelated to sstables at all, but there are metrics which are
>> specific
>>> to the current sstable format, like metrics related to index summaries or
>>> bloom filters. The created gauges query certain methods on sstable
>> reader -
>>> I think the only common metrics for sstables we can leave in TableMetrics
>>> are those for which there are query methods in generic sstable interface.
>>> Other metrics, specific to the certain sstable format should be
>> registered
>>> by the implementation itself.
>>> 
>>> Speaking about the implementation, one idea I was thinking about was that
>>> the factories for formats are registered using Java's native service
>>> loader. This way we could get the list of all the factories on the
>>> classpath and call some method, like `registerMetrics` during system
>>> initialization. That could be also implemented in static initializer in
>> the
>>> factory but it would make it less obvious for the implementors where such
>>> initialization should be done.
>>> 
>>> I hope this explains a bit
>>> 
>>> Thanks,
>>> Jacek
>> 

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-09 Thread Branimir Lambov
Does anyone have any further comments or questions on the proposal, or are
we ready to move forward to a vote?

Regards,
Branimir

On Tue, Nov 2, 2021 at 7:15 PM David Capwell 
wrote:

> > I apologize I did not mention those things explicitly. All the places
> where
> > sstable files are accessed directly would have to be refactored.
>
> Works for me
>
> > Speaking about the implementation, one idea I was thinking about was that
> > the factories for formats are registered using Java's native service
> > loader.
>
> I am a fan of ServiceLoader as a means of plugging in.
>
> > I hope this explains a bit
>
> Yep; thanks!
>
> > On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
> [email protected]> wrote:
> >
> > David,
> >
> > I apologize I did not mention those things explicitly. All the places
> where
> > sstable files are accessed directly would have to be refactored.
> >
> > Regarding TableMetrics - currently it includes many metrics, some of them
> > are unrelated to sstables at all, but there are metrics which are
> specific
> > to the current sstable format, like metrics related to index summaries or
> > bloom filters. The created gauges query certain methods on sstable
> reader -
> > I think the only common metrics for sstables we can leave in TableMetrics
> > are those for which there are query methods in generic sstable interface.
> > Other metrics, specific to the certain sstable format should be
> registered
> > by the implementation itself.
> >
> > Speaking about the implementation, one idea I was thinking about was that
> > the factories for formats are registered using Java's native service
> > loader. This way we could get the list of all the factories on the
> > classpath and call some method, like `registerMetrics` during system
> > initialization. That could be also implemented in static initializer in
> the
> > factory but it would make it less obvious for the implementors where such
> > initialization should be done.
> >
> > I hope this explains a bit
> >
> > Thanks,
> > Jacek
>
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-02 Thread David Capwell
> I apologize I did not mention those things explicitly. All the places where
> sstable files are accessed directly would have to be refactored.

Works for me

> Speaking about the implementation, one idea I was thinking about was that
> the factories for formats are registered using Java's native service
> loader.

I am a fan of ServiceLoader as a means of plugging in.

> I hope this explains a bit

Yep; thanks!

> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski  
> wrote:
> 
> David,
> 
> I apologize I did not mention those things explicitly. All the places where
> sstable files are accessed directly would have to be refactored.
> 
> Regarding TableMetrics - currently it includes many metrics, some of them
> are unrelated to sstables at all, but there are metrics which are specific
> to the current sstable format, like metrics related to index summaries or
> bloom filters. The created gauges query certain methods on sstable reader -
> I think the only common metrics for sstables we can leave in TableMetrics
> are those for which there are query methods in generic sstable interface.
> Other metrics, specific to the certain sstable format should be registered
> by the implementation itself.
> 
> Speaking about the implementation, one idea I was thinking about was that
> the factories for formats are registered using Java's native service
> loader. This way we could get the list of all the factories on the
> classpath and call some method, like `registerMetrics` during system
> initialization. That could be also implemented in static initializer in the
> factory but it would make it less obvious for the implementors where such
> initialization should be done.
> 
> I hope this explains a bit
> 
> Thanks,
> Jacek





Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-02 Thread Jacek Lewandowski
David,

I apologize I did not mention those things explicitly. All the places where
sstable files are accessed directly would have to be refactored.

Regarding TableMetrics - currently it includes many metrics. Some of them
are unrelated to sstables at all, but others are specific to the current
sstable format, like the metrics related to index summaries or bloom
filters. The created gauges query certain methods on the sstable reader. I
think the only common sstable metrics we can leave in TableMetrics are those
for which there are query methods in the generic sstable interface. Other
metrics, specific to a particular sstable format, should be registered by
the implementation itself.

Speaking about the implementation, one idea I was thinking about was that
the factories for formats are registered using Java's native service
loader (java.util.ServiceLoader). This way we could get the list of all the
factories on the classpath and call some method, like `registerMetrics`,
during system initialization. That could also be implemented in a static
initializer in the factory, but it would make it less obvious to
implementors where such initialization should be done.
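
A minimal sketch of the service-loader idea, under stated assumptions: the
interface SSTableFormatFactory, its two methods, the BigFormatFactory class,
and the metric names are hypothetical and are not taken from the CEP or from
the Cassandra code base. A plain list of strings stands in for the metrics
registry. On a real classpath each provider would be a public class listed in
a META-INF/services file named after the fully qualified interface name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical contract that every sstable format implementation would provide.
interface SSTableFormatFactory {
    String name();

    // The format registers only its own, format-specific metrics here.
    void registerMetrics(List<String> registry);
}

// Illustrative provider; the metric names are examples only.
final class BigFormatFactory implements SSTableFormatFactory {
    @Override
    public String name() { return "big"; }

    @Override
    public void registerMetrics(List<String> registry) {
        registry.add(name() + ".BloomFilterFalsePositives");
        registry.add(name() + ".IndexSummaryOffHeapMemoryUsed");
    }
}

final class FormatMetricsBootstrap {
    // One explicit place, called during system initialization, where every
    // discovered factory gets the chance to register its metrics.
    static List<String> registerAll(Iterable<SSTableFormatFactory> factories) {
        List<String> registry = new ArrayList<>();
        for (SSTableFormatFactory factory : factories) {
            factory.registerMetrics(registry);
        }
        return registry;
    }

    // Discovers factories through java.util.ServiceLoader; the result is
    // empty when no provider configuration file is on the classpath.
    static List<String> registerAllFromClasspath() {
        return registerAll(ServiceLoader.load(SSTableFormatFactory.class));
    }
}
```

Compared with a static initializer in each factory, the explicit
registerMetrics call gives implementors a single, visible hook, and the
caller decides when it runs.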

I hope this explains a bit

Thanks,
Jacek


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-01 Thread David Capwell
Inline

> On Nov 1, 2021, at 9:23 AM, Branimir Lambov  wrote:
> 
> As Jacek is not a committer, this proposal needs a shepherd. I would be
> happy to take this role.
> 
>> to me the interfaces has to be at the SSTable level, which then expose
>> readers/writers, but also has to expose the other things we do outside of
>> those paths
> 
> Could you give some detail on what these things are? Are they something
> different from what the standalone Cassandra tools (scrub/verify/upgrade)
> are currently doing? Obviously, any pluggability proposal will have to
> provide a solution to these, and it would be helpful to know what needs to
> be done beyond making sure the bundled tools work correctly (which includes
> iterating indexes; format-specific operations (e.g. index summary
> redistribution) are excluded as they are to be handled by the individual
> format).

Looking closer at compaction and repair, I had forgotten that they were changed
in CASSANDRA-15861 to go through the reader interface rather than directly
mutating the files (concurrency bug).  I was thinking of the logic which is now
in
org.apache.cassandra.io.sstable.format.SSTableReader#mutateLevelAndReload and
org.apache.cassandra.io.sstable.format.SSTableReader#mutateRepairedAndReload,
so I believe compaction/repair may be OK with reader/writer; ignore those
examples.

Checking usage of descriptor you find examples like

- org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader#read,
  which calls:
  writer.descriptor.getMetadataSerializer().mutate(writer.descriptor, description, transform);
- org.apache.cassandra.tools.Util#metadataFromSSTable, which is used by the
  sstablemetadata tool
- org.apache.cassandra.io.sstable.KeyIterator#KeyIterator, which directly loads
  the primary index from the descriptor:
  new In(new File(desc.filenameFor(Component.PRIMARY_INDEX)));

Every example I see could be rewritten to use reader/writer, so relying
on reader/writer as the main interfaces would work.
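
To illustrate what such a rewrite could look like for the KeyIterator case,
here is a minimal sketch. Everything in it is an assumption made for
illustration: the SSTableFormat interface, its openKeyReader method, and the
in-memory stand-in do not exist under these names in Cassandra. The point is
only that the caller asks the format for keys instead of opening a
format-specific component file itself.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical format contract: the caller never sees component files such
// as the primary index, only an iterator over partition keys.
interface SSTableFormat {
    Iterator<String> openKeyReader(String descriptor);
}

// Stand-in implementation backed by a map, only to make the sketch runnable.
final class InMemoryFormat implements SSTableFormat {
    private final Map<String, List<String>> keysByDescriptor;

    InMemoryFormat(Map<String, List<String>> keysByDescriptor) {
        this.keysByDescriptor = keysByDescriptor;
    }

    @Override
    public Iterator<String> openKeyReader(String descriptor) {
        return keysByDescriptor.getOrDefault(descriptor, List.of()).iterator();
    }
}

// Format-agnostic caller: works unchanged for any SSTableFormat.
final class KeyDump {
    static String dump(SSTableFormat format, String descriptor) {
        StringBuilder out = new StringBuilder();
        Iterator<String> keys = format.openKeyReader(descriptor);
        while (keys.hasNext()) {
            if (out.length() > 0) out.append(',');
            out.append(keys.next());
        }
        return out.toString();
    }
}
```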

> 
> There is another problem in the current code alluded to in the question, in
> the fact that "SSTableReader" (tied to the sstable format and ready for
> querying data (i.e. with open data files and bloom filters loaded in
> memory)) is the only concept that the code uses to work with sstables. As I
> understand it, this proposal does not aim to solve that problem, only to
> make sure that we can properly read and write sstables of a given format,
> including in streaming and standalone tools. In other words, to provide the
> machinery to convert sstable descriptors into sstable readers and writers.
> 
> I see this as an expansion of CASSANDRA-7443 and cleanup of any changes
> that came after it and broke the intended capability.
> 
> Regards,
> Branimir
> 
> On Thu, Oct 28, 2021 at 7:43 PM David Capwell 
> wrote:
> 
>> Sorry about that; used -1/+1 to show preference, not binding action
>> 
>>> On Oct 28, 2021, at 5:50 AM, [email protected] wrote:
>>> 
>>>> I am -1 here, for the reasons listed above; the problem (in my eye) is
>> not reader/writer but higher level at the actual SSTable.  If we plug out
>> read/write but still allow direct file access, then these abstractions fail
>> to provide the goals of the CEP.
>>> 
>>> Be careful dropping -1s, as your -1s here are binding. I realise this
>> isn’t a vote thread, but the effect is the same. IMO we should try to
>> express our preferences and defer to the collective opinion where possible.
>> True -1s should very rarely appear.
>>> 
>>> 
>>> From: David Capwell 
>>> Date: Wednesday, 27 October 2021 at 15:33
>>> To: [email protected] 
>>> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
>>> Reading the CEP I don’t see any mention to the systems which access
>> SSTables; such as streaming (small callout to zero-copy-streaming with
>> ZeroCopyBigTableWriter) and repair.  If you are abstracting out
>> BigTableReader then you are not dealing with the implementation assumptions
>> that users of SSTables have (such as direct mutation to auxiliary files
>> outside of -Data.db).
>>> 
>>>> Audience
>>>>  • Cassandra developers who wish to see SSTableReader and
>> SSTableWriter more modular than they are today,
>>> 
>>> This statement relates to the above comment, many parts of the code do
>> not use Reader/Writer but instead use direct format knowledge to apply
>> changes to the file format (normally outside of -Data.db); to me the
>> interfaces has to be at the SSTable level, which then expose
>> readers/writers, but also has to expose the other things we do outside of
>> those paths.
>>>

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-11-01 Thread Branimir Lambov
As Jacek is not a committer, this proposal needs a shepherd. I would be
happy to take this role.

> to me the interfaces has to be at the SSTable level, which then expose
> readers/writers, but also has to expose the other things we do outside of
> those paths

Could you give some detail on what these things are? Are they something
different from what the standalone Cassandra tools (scrub/verify/upgrade)
are currently doing? Obviously, any pluggability proposal will have to
provide a solution to these, and it would be helpful to know what needs to
be done beyond making sure the bundled tools work correctly (which includes
iterating indexes; format-specific operations (e.g. index summary
redistribution) are excluded as they are to be handled by the individual
format).

There is another problem in the current code alluded to in the question, in
the fact that "SSTableReader" (tied to the sstable format and ready for
querying data (i.e. with open data files and bloom filters loaded in
memory)) is the only concept that the code uses to work with sstables. As I
understand it, this proposal does not aim to solve that problem, only to
make sure that we can properly read and write sstables of a given format,
including in streaming and standalone tools. In other words, to provide the
machinery to convert sstable descriptors into sstable readers and writers.

I see this as an expansion of CASSANDRA-7443 and cleanup of any changes
that came after it and broke the intended capability.

Regards,
Branimir

On Thu, Oct 28, 2021 at 7:43 PM David Capwell 
wrote:

> Sorry about that; used -1/+1 to show preference, not binding action
>
> > On Oct 28, 2021, at 5:50 AM, [email protected] wrote:
> >
> >> I am -1 here, for the reasons listed above; the problem (in my eye) is
> not reader/writer but higher level at the actual SSTable.  If we plug out
> read/write but still allow direct file access, then these abstractions fail
> to provide the goals of the CEP.
> >
> > Be careful dropping -1s, as your -1s here are binding. I realise this
> isn’t a vote thread, but the effect is the same. IMO we should try to
> express our preferences and defer to the collective opinion where possible.
> True -1s should very rarely appear.
> >
> >
> > From: David Capwell 
> > Date: Wednesday, 27 October 2021 at 15:33
> > To: [email protected] 
> > Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> > Reading the CEP I don’t see any mention to the systems which access
> SSTables; such as streaming (small callout to zero-copy-streaming with
> ZeroCopyBigTableWriter) and repair.  If you are abstracting out
> BigTableReader then you are not dealing with the implementation assumptions
> that users of SSTables have (such as direct mutation to auxiliary files
> outside of -Data.db).
> >
> >> Audience
> >>   • Cassandra developers who wish to see SSTableReader and
> SSTableWriter more modular than they are today,
> >
> > This statement relates to the above comment, many parts of the code do
> not use Reader/Writer but instead use direct format knowledge to apply
> changes to the file format (normally outside of -Data.db); to me the
> interfaces has to be at the SSTable level, which then expose
> readers/writers, but also has to expose the other things we do outside of
> those paths.
> >
> >>   • move the metrics related to sstable format out from
> TableMetrics class and make them tied to certain sstable implementation
> >
> > I am curious about this comment, are you removing exposing this
> information?
> >
> >>   • have a single factory for creating both readers and writers for
> particular implementation of sstable and use it consistently - no direct
> creation of any reader / writer
> >
> > I am -1 here, for the reasons listed above; the problem (in my eye) is
> not reader/writer but higher level at the actual SSTable.  If we plug out
> read/write but still allow direct file access, then these abstractions fail
> to provide the goals of the CEP.
> >
> > I am +1 to the intent of the CEP.
> >
> > And last comment, which I have also done in the other modularity thread…
> backwards compatibility and maintenance. It is not clear right now what
> java interfaces may not break and how we can maintain and extend such
> interfaces in the future.  If the goal is to allow 3rd parties to plugin
> and offer new SSTable formats, are we as a project ok with having a minor
> release do a binary or source non-compatible change?  If not how do we
> detect this?  Until this problem is solved, I do not think we should add
> any such interfaces.
> >
> >> On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan 
> wrote:
> >>

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-10-28 Thread David Capwell
Sorry about that; used -1/+1 to show preference, not binding action

> On Oct 28, 2021, at 5:50 AM, [email protected] wrote:
> 
>> I am -1 here, for the reasons listed above; the problem (in my eye) is not 
>> reader/writer but higher level at the actual SSTable.  If we plug out 
>> read/write but still allow direct file access, then these abstractions fail 
>> to provide the goals of the CEP.
> 
> Be careful dropping -1s, as your -1s here are binding. I realise this isn’t a 
> vote thread, but the effect is the same. IMO we should try to express our 
> preferences and defer to the collective opinion where possible. True -1s 
> should very rarely appear.
> 
> 
> From: David Capwell 
> Date: Wednesday, 27 October 2021 at 15:33
> To: [email protected] 
> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> Reading the CEP I don’t see any mention to the systems which access SSTables; 
> such as streaming (small callout to zero-copy-streaming with 
> ZeroCopyBigTableWriter) and repair.  If you are abstracting out 
> BigTableReader then you are not dealing with the implementation assumptions 
> that users of SSTables have (such as direct mutation to auxiliary files 
> outside of -Data.db).
> 
>> Audience
>>   • Cassandra developers who wish to see SSTableReader and SSTableWriter 
>> more modular than they are today,
> 
> This statement relates to the above comment, many parts of the code do not 
> use Reader/Writer but instead use direct format knowledge to apply changes to 
> the file format (normally outside of -Data.db); to me the interfaces has to 
> be at the SSTable level, which then expose readers/writers, but also has to 
> expose the other things we do outside of those paths.
> 
>>   • move the metrics related to sstable format out from TableMetrics 
>> class and make them tied to certain sstable implementation
> 
> I am curious about this comment, are you removing exposing this information?
> 
>>   • have a single factory for creating both readers and writers for 
>> particular implementation of sstable and use it consistently - no direct 
>> creation of any reader / writer
> 
> I am -1 here, for the reasons listed above; the problem (in my eye) is not 
> reader/writer but higher level at the actual SSTable.  If we plug out 
> read/write but still allow direct file access, then these abstractions fail 
> to provide the goals of the CEP.
> 
> I am +1 to the intent of the CEP.
> 
> And last comment, which I have also done in the other modularity thread… 
> backwards compatibility and maintenance. It is not clear right now what java 
> interfaces may not break and how we can maintain and extend such interfaces 
> in the future.  If the goal is to allow 3rd parties to plugin and offer new 
> SSTable formats, are we as a project ok with having a minor release do a 
> binary or source non-compatible change?  If not how do we detect this?  Until 
> this problem is solved, I do not think we should add any such interfaces.
> 
>> On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan  
>> wrote:
>> 
>> Hi Stefan,
>> That idea is not related to this CEP which is about the file formats of the
>> sstables, not file system access.  But you should take a look at the work
>> recently committed in https://issues.apache.org/jira/browse/CASSANDRA-16926
>> to switch to using java.nio.file.Path for file access.  This should allow
>> the use of a file system provider to access files which could be the basis
>> for work to load the files from S3.
>> 
>> -Jeremiah
>> 
>> On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
>> [email protected]> wrote:
>> 
>>> One point I would like to add to this; I was already looking into how
>>> to extend this but what I saw in SSTableReader was that it is very
>>> much "file system oriented". There was not any possibility to actually
>>> hook something like that there. I think what importing does is that it
>>> will use SSTableReader / Writer stuff so I think that the modification
>>> of these classes to accommodate this idea would be necessary.
>>> 
>>> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
>>>  wrote:
>>>> 
>>>> Hi Jacek,
>>>> 
>>>> Thanks for taking the lead on this.
>>>> 
>>>> There was importing of SSTables introduced in 4.0 via
>>>> StorageService#importNewSSTables. The "problem" with this is that
>>>> SSTables need to be physically located at disk so Cassandra can read
>>>> them. If a backup is taken and SSTabl

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-10-28 Thread [email protected]
> I am -1 here, for the reasons listed above; the problem (in my eye) is not 
> reader/writer but higher level at the actual SSTable.  If we plug out 
> read/write but still allow direct file access, then these abstractions fail 
> to provide the goals of the CEP.

Be careful dropping -1s, as your -1s here are binding. I realise this isn’t a 
vote thread, but the effect is the same. IMO we should try to express our 
preferences and defer to the collective opinion where possible. True -1s should 
very rarely appear.


From: David Capwell 
Date: Wednesday, 27 October 2021 at 15:33
To: [email protected] 
Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
Reading the CEP I don’t see any mention to the systems which access SSTables; 
such as streaming (small callout to zero-copy-streaming with 
ZeroCopyBigTableWriter) and repair.  If you are abstracting out BigTableReader 
then you are not dealing with the implementation assumptions that users of 
SSTables have (such as direct mutation to auxiliary files outside of -Data.db).

> Audience
>• Cassandra developers who wish to see SSTableReader and SSTableWriter 
> more modular than they are today,

This statement relates to the above comment, many parts of the code do not use 
Reader/Writer but instead use direct format knowledge to apply changes to the 
file format (normally outside of -Data.db); to me the interfaces has to be at 
the SSTable level, which then expose readers/writers, but also has to expose 
the other things we do outside of those paths.

>• move the metrics related to sstable format out from TableMetrics 
> class and make them tied to certain sstable implementation

I am curious about this comment, are you removing exposing this information?

>• have a single factory for creating both readers and writers for 
> particular implementation of sstable and use it consistently - no direct 
> creation of any reader / writer

I am -1 here, for the reasons listed above; the problem (in my eye) is not 
reader/writer but higher level at the actual SSTable.  If we plug out 
read/write but still allow direct file access, then these abstractions fail to 
provide the goals of the CEP.

I am +1 to the intent of the CEP.

And last comment, which I have also done in the other modularity thread… 
backwards compatibility and maintenance. It is not clear right now what java 
interfaces may not break and how we can maintain and extend such interfaces in 
the future.  If the goal is to allow 3rd parties to plugin and offer new 
SSTable formats, are we as a project ok with having a minor release do a binary 
or source non-compatible change?  If not how do we detect this?  Until this 
problem is solved, I do not think we should add any such interfaces.

> On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan  
> wrote:
>
> Hi Stefan,
> That idea is not related to this CEP which is about the file formats of the
> sstables, not file system access.  But you should take a look at the work
> recently committed in https://issues.apache.org/jira/browse/CASSANDRA-16926
> to switch to using java.nio.file.Path for file access.  This should allow
> the use of a file system provider to access files which could be the basis
> for work to load the files from S3.
>
> -Jeremiah
>
> On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
> [email protected]> wrote:
>
>> One point I would like to add to this; I was already looking into how
>> to extend this but what I saw in SSTableReader was that it is very
>> much "file system oriented". There was not any possibility to actually
>> hook something like that there. I think what importing does is that it
>> will use SSTableReader / Writer stuff so I think that the modification
>> of these classes to accommodate this idea would be necessary.
>>
>> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
>>  wrote:
>>>
>>> Hi Jacek,
>>>
>>> Thanks for taking the lead on this.
>>>
>>> There was importing of SSTables introduced in 4.0 via
>>> StorageService#importNewSSTables. The "problem" with this is that
>>> SSTables need to be physically located at disk so Cassandra can read
>>> them. If a backup is taken and SSTables are uploaded to, for example,
>>> S3 bucket, then upon restore, all these SSTables need to be downloaded
>>> first and then imported. What about downloading them / importing them
>>> directly from S3? Or any custom source for that matter? Importing of
>>> SSTables is a very nice feature in 4.0, we do not need to copy / hard
>>> link / refresh, it is all handled internally.
>>>
>>> I am not sure if your work is related to this idea but I would
>>> appreciate it if this is

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-10-27 Thread David Capwell
Reading the CEP I don’t see any mention of the systems which access SSTables,
such as streaming (small callout to zero-copy-streaming with
ZeroCopyBigTableWriter) and repair.  If you are abstracting out BigTableReader
then you are not dealing with the implementation assumptions that users of
SSTables have (such as direct mutation to auxiliary files outside of -Data.db).

> Audience
>   • Cassandra developers who wish to see SSTableReader and SSTableWriter 
> more modular than they are today,

This statement relates to the above comment: many parts of the code do not use
Reader/Writer but instead use direct format knowledge to apply changes to the
file format (normally outside of -Data.db). To me the interfaces have to be at
the SSTable level, which then exposes readers/writers but also has to expose
the other things we do outside of those paths.

>   • move the metrics related to sstable format out from TableMetrics 
> class and make them tied to certain sstable implementation

I am curious about this comment; are you going to stop exposing this information?

>   • have a single factory for creating both readers and writers for 
> particular implementation of sstable and use it consistently - no direct 
> creation of any reader / writer

I am -1 here, for the reasons listed above; the problem (in my eye) is not 
reader/writer but higher level at the actual SSTable.  If we plug out 
read/write but still allow direct file access, then these abstractions fail to 
provide the goals of the CEP.

I am +1 to the intent of the CEP.

And a last comment, which I have also made in the other modularity thread:
backwards compatibility and maintenance. It is not clear right now which Java
interfaces may not break and how we can maintain and extend such interfaces in
the future.  If the goal is to allow 3rd parties to plug in and offer new
SSTable formats, are we as a project OK with having a minor release make a
binary or source incompatible change?  If not, how do we detect this?  Until
this problem is solved, I do not think we should add any such interfaces.

> On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan  
> wrote:
> 
> Hi Stefan,
> That idea is not related to this CEP which is about the file formats of the
> sstables, not file system access.  But you should take a look at the work
> recently committed in https://issues.apache.org/jira/browse/CASSANDRA-16926
> to switch to using java.nio.file.Path for file access.  This should allow
> the use of a file system provider to access files which could be the basis
> for work to load the files from S3.
> 
> -Jeremiah
> 
> On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
> [email protected]> wrote:
> 
>> One point I would like to add to this; I was already looking into how
>> to extend this but what I saw in SSTableReader was that it is very
>> much "file system oriented". There was not any possibility to actually
>> hook something like that there. I think what importing does is that it
>> will use SSTableReader / Writer stuff so I think that the modification
>> of these classes to accommodate this idea would be necessary.
>> 
>> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
>>  wrote:
>>> 
>>> Hi Jacek,
>>> 
>>> Thanks for taking the lead on this.
>>> 
>>> There was importing of SSTables introduced in 4.0 via
>>> StorageService#importNewSSTables. The "problem" with this is that
>>> SSTables need to be physically located at disk so Cassandra can read
>>> them. If a backup is taken and SSTables are uploaded to, for example,
>>> S3 bucket, then upon restore, all these SSTables need to be downloaded
>>> first and then imported. What about downloading them / importing them
>>> directly from S3? Or any custom source for that matter? Importing of
>>> SSTables is a very nice feature in 4.0, we do not need to copy / hard
>>> link / refresh, it is all handled internally.
>>> 
>>> I am not sure if your work is related to this idea but I would
>>> appreciate it if this is pluggable as well for the sake of simplicity
>>> and effectiveness as we would not have to download all sstables before
>>> importing them.
>>> 
>>> If it is not related, feel free to skip that completely and I guess I
>>> would have to try to push that forward myself.
>>> 
>>> Regards
>>> 
>>> 
>>> On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
>>>  wrote:
 
>>>> I'd like to start a discussion about SSTable format API proposal (CEP-17)
>>>>
>>>> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
>>>> CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
>>>>
>>>> Thanks,
>>>> Jacek
>>>>
>>>> -
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandr

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-10-22 Thread Jeremiah Jordan
Hi Stefan,
That idea is not related to this CEP, which is about the file formats of the
sstables, not file system access.  But you should take a look at the work
recently committed in https://issues.apache.org/jira/browse/CASSANDRA-16926
to switch to using java.nio.file.Path for file access.  This should allow
the use of a file system provider to access files which could be the basis
for work to load the files from S3.
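
A minimal sketch of why the switch to java.nio.file.Path matters: code
written against Path does not care which file system provider backs it. The
JDK's built-in zip provider stands in here for a remote store. An S3-backed
provider would have to come from a separate library and is not shown. The
class and method names (ProviderAgnosticRead, readComponent,
roundTripThroughZip, demo) are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

final class ProviderAgnosticRead {
    // Works for any Path, whichever FileSystemProvider created it.
    static String readComponent(Path component) throws IOException {
        return new String(Files.readAllBytes(component), StandardCharsets.UTF_8);
    }

    // Writes an entry into a new zip archive and reads it back through the
    // same Path-based helper that would be used for a local file.
    static String roundTripThroughZip(Path zipFile, String entry, String text) throws IOException {
        URI uri = URI.create("jar:" + zipFile.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            Path inside = zipFs.getPath(entry);
            Files.write(inside, text.getBytes(StandardCharsets.UTF_8));
            return readComponent(inside);
        }
    }

    // Convenience entry point: the archive is created in a fresh temp directory.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("cep17-demo");
            return roundTripThroughZip(dir.resolve("demo.zip"), "Data.db", "hello");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```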

-Jeremiah

On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
[email protected]> wrote:

> One point I would like to add to this; I was already looking into how
> to extend this but what I saw in SSTableReader was that it is very
> much "file system oriented". There was not any possibility to actually
> hook something like that there. I think what importing does is that it
> will use SSTableReader / Writer stuff so I think that the modification
> of these classes to accommodate this idea would be necessary.
>
> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
>  wrote:
> >
> > Hi Jacek,
> >
> > Thanks for taking the lead on this.
> >
> > There was importing of SSTables introduced in 4.0 via
> > StorageService#importNewSSTables. The "problem" with this is that
> > SSTables need to be physically located at disk so Cassandra can read
> > them. If a backup is taken and SSTables are uploaded to, for example,
> > S3 bucket, then upon restore, all these SSTables need to be downloaded
> > first and then imported. What about downloading them / importing them
> > directly from S3? Or any custom source for that matter? Importing of
> > SSTables is a very nice feature in 4.0, we do not need to copy / hard
> > link / refresh, it is all handled internally.
> >
> > I am not sure if your work is related to this idea but I would
> > appreciate it if this is pluggable as well for the sake of simplicity
> > and effectiveness as we would not have to download all sstables before
> > importing them.
> >
> > If it is not related, feel free to skip that completely and I guess I
> > would have to try to push that forward myself.
> >
> > Regards
> >
> >
> > On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
> >  wrote:
> > >
> > > I'd like to start a discussion about SSTable format API proposal
> (CEP-17)
> > >
> > > Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
> > > CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
> > >
> > > Thanks,
> > > Jacek
> > >
> > > -
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
>
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-10-22 Thread Stefan Miklosovic
One point I would like to add to this; I was already looking into how
to extend this but what I saw in SSTableReader was that it is very
much "file system oriented". There was not any possibility to actually
hook something like that there. I think what importing does is that it
will use SSTableReader / Writer stuff so I think that the modification
of these classes to accommodate this idea would be necessary.

On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
 wrote:
>
> Hi Jacek,
>
> Thanks for taking the lead on this.
>
> Importing of SSTables was introduced in 4.0 via
> StorageService#importNewSSTables. The "problem" with this is that
> SSTables need to be physically located on disk so Cassandra can read
> them. If a backup is taken and SSTables are uploaded to, for example,
> an S3 bucket, then upon restore all these SSTables need to be downloaded
> first and then imported. What about downloading / importing them
> directly from S3? Or from any custom source, for that matter? Importing
> of SSTables is a very nice feature in 4.0; we do not need to copy / hard
> link / refresh, it is all handled internally.
>
> I am not sure if your work is related to this idea, but I would
> appreciate it if this were pluggable as well, for the sake of simplicity
> and efficiency, as we would not have to download all SSTables before
> importing them.
>
> If it is not related, feel free to skip that completely and I guess I
> would have to try to push that forward myself.
>
> Regards
>
>
> On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
>  wrote:
> >
> > I'd like to start a discussion about SSTable format API proposal (CEP-17)
> >
> > Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
> > CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
> >
> > Thanks,
> > Jacek
> >
> >




Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-10-22 Thread Stefan Miklosovic
Hi Jacek,

Thanks for taking the lead on this.

Importing of SSTables was introduced in 4.0 via
StorageService#importNewSSTables. The "problem" with this is that
SSTables need to be physically located on disk so Cassandra can read
them. If a backup is taken and SSTables are uploaded to, for example,
an S3 bucket, then upon restore all these SSTables need to be downloaded
first and then imported. What about downloading / importing them
directly from S3? Or from any custom source, for that matter? Importing
of SSTables is a very nice feature in 4.0; we do not need to copy / hard
link / refresh, it is all handled internally.

I am not sure if your work is related to this idea, but I would
appreciate it if this were pluggable as well, for the sake of simplicity
and efficiency, as we would not have to download all SSTables before
importing them.

If it is not related, feel free to skip that completely and I guess I
would have to try to push that forward myself.

Regards


On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
 wrote:
>
> I'd like to start a discussion about SSTable format API proposal (CEP-17)
>
> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
> CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
>
> Thanks,
> Jacek
>
>
