Re: [DISCUSS] Nested YAML configs for new features

2021-12-06 Thread Ekaterina Dimitrova
 Please find my comments inline below

* Good with nested configs - the question is how they will be introduced
and maintained? I wouldn't advocate for maintaining more than one yaml file, but
probably, as you mentioned some time ago (if I remember correctly),
having one format as the default and just documenting the support for the other
one/ones. Which one is the default is a different topic.
* Where/How we group is an open question, maybe we move this to a JIRA as
follow up work to CASSANDRA-15234? - not part of CASSANDRA-15234 as per all
the discussions, already in review (thank you for your first quick round
btw, appreciate it!)

Spoke with Ekaterina about this, and not solved in 15234; let's move this
to a follow up JIRA for 15234? - For the broader audience, what I currently
address around naming in CASSANDRA-15234 is removing the unit suffix and
moving the config parameter names to the noun_verb format. After all the
discussions, and realizing the great interest and variety of opinions, I
really tried to split more tickets out of CASSANDRA-15234 and to keep
primarily the new custom types and the new framework with backward
compatibility as the main body of work, which is also good for the reviewers.
Last year I came up with the idea of reorganizing the config file a bit, which
led to discussions. So, considering my previous point about splitting into a
more incremental approach given the variety of opinions, I suggested
when submitting for review to open a new ticket for that new organization
of our config file. Probably we can add the abort/fail and any other
similar concerns/questions there post CASSANDRA-15234?


On Fri, 3 Dec 2021 at 13:34, David Capwell 
wrote:

> Thanks everyone for the feedback!  If I am reading this properly I am
> seeing the following
>
> * Good with nested configs
> * Good with YAML layer supporting flat structure (possible foo.bar.baz for
> the path foo: {bar: {baz: 42}}), how this relates with Settings table
> should be resolved, but there is a open ticket for this (enhance our YAML
> CASSANDRA-17166, and support updates to Settings vtable CASSANDRA-15254)
> * Where/How we group is an open question, maybe we move this to a JIRA as
> follow up work to CASSANDRA-15234?
>
> > We’re also mixing terminology already, with limits/thresholds and
> fail/abort.
>
> Spoke with Ekaterina about this, and not solved in 15234; lets move this
> to a follow up JIRA for 15234?
>
> > On Nov 30, 2021, at 6:08 AM, Ekaterina Dimitrova 
> wrote:
> >
> > Thank you for confirming as I misread your email at first :-)
> > I had a chat with David last week and I don’t think his plan is reworking
> > of 15234 but incremental improvements on top of it.
> > Regarding config, after spending time cleaning around and looking more
> into
> > detail my only appeal is:
> > - Centralized management and not 5 places to change things when you add
> new
> > config so we are less error-prone
> > - Documenting things for people who add new config or for our users (I
> > promised and I will do it for 15234 but it will be good to continue doing
> > it with any further changes down the road)
> > - be careful with breaking changes
> >
> > Thank you
> > Ekaterina
> >
> > On Tue, 30 Nov 2021 at 8:59, bened...@apache.org 
> > wrote:
> >
> >> I mean that it has been waiting for months, is ready to go, and I don’t
> >> want to hold you up any longer.
> >>
> >> From: Ekaterina Dimitrova 
> >> Date: Tuesday, 30 November 2021 at 13:44
> >> To: dev@cassandra.apache.org 
> >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >> “
> >> IMO 15234 has sailed – it’s been held up for a long time, and was
> brought
> >> to this list for discussion with no engagement. Ekaterina is long
> overdue
> >> being able to commit her work. “
> >>
> >>
> >> Sailed? I submitted the patch a week ago for review. Not sure how to
> >> understand this statement. Can elaborate, please?
> >>
> >> On Tue, 30 Nov 2021 at 8:09, bened...@apache.org 
> >> wrote:
> >>
> >>> The problem with scoping this to “features” is that we end up with at
> >> best
> >>> local coherence. The config file as a whole will end up just as
> >> incoherent
> >>> through its design evolution as it has historically.
> >>>
> >>> If you take a look at my proposed layout for the overall config, there
> is
> >>> a “limits” section that specifies thresholds for reporting warnings and
> >>> errors for various scenario. In this case, we probably don’t also want
> >>> per-feature limits? We’re also mixing term

Re: [DISCUSS] Nested YAML configs for new features

2021-12-03 Thread David Capwell
Thanks everyone for the feedback!  If I am reading this properly I am seeing 
the following

* Good with nested configs
* Good with the YAML layer supporting a flat structure (possibly foo.bar.baz for 
the path foo: {bar: {baz: 42}}); how this relates to the Settings table still 
needs to be resolved, but there are open tickets for this (enhance our YAML in 
CASSANDRA-17166, and support updates to the Settings vtable in CASSANDRA-15254)
* Where/How we group is an open question, maybe we move this to a JIRA as 
follow up work to CASSANDRA-15234?
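
To make the flat/nested equivalence in the second bullet concrete, here is a minimal sketch in Python of how a dotted key such as foo.bar.baz could map to and from the nested form foo: {bar: {baz: 42}}. It is only an illustration under stated assumptions: it is not the CASSANDRA-17166 patch, the helper names flatten and unflatten are hypothetical, and the "." separator is assumed (the Settings vtable currently uses "_").

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Collapse a nested mapping into single-level, separator-joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty mappings; empty ones are kept as leaves.
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Expand separator-joined keys back into a nested mapping."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{path!r} conflicts with a scalar at {part!r}")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{path!r} conflicts with an existing section")
        node[leaf] = value
    return nested


nested_form = {"foo": {"bar": {"baz": 42}}}
flat_form = flatten(nested_form)      # {"foo.bar.baz": 42}
round_trip = unflatten(flat_form)     # back to the nested form
```

The conflict checks matter once both spellings are accepted in the same file: a key given as a scalar in one place and as a section in another has to be rejected rather than silently overwritten.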

> We’re also mixing terminology already, with limits/thresholds and fail/abort.

Spoke with Ekaterina about this, and not solved in 15234; let's move this to a 
follow up JIRA for 15234?

> On Nov 30, 2021, at 6:08 AM, Ekaterina Dimitrova  
> wrote:
> 
> Thank you for confirming as I misread your email at first :-)
> I had a chat with David last week and I don’t think his plan is reworking
> of 15234 but incremental improvements on top of it.
> Regarding config, after spending time cleaning around and looking more into
> detail my only appeal is:
> - Centralized management and not 5 places to change things when you add new
> config so we are less error-prone
> - Documenting things for people who add new config or for our users (I
> promised and I will do it for 15234 but it will be good to continue doing
> it with any further changes down the road)
> - be careful with breaking changes
> 
> Thank you
> Ekaterina
> 
> On Tue, 30 Nov 2021 at 8:59, bened...@apache.org 
> wrote:
> 
>> I mean that it has been waiting for months, is ready to go, and I don’t
>> want to hold you up any longer.
>> 
>> From: Ekaterina Dimitrova 
>> Date: Tuesday, 30 November 2021 at 13:44
>> To: dev@cassandra.apache.org 
>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> “
>> IMO 15234 has sailed – it’s been held up for a long time, and was brought
>> to this list for discussion with no engagement. Ekaterina is long overdue
>> being able to commit her work. “
>> 
>> 
>> Sailed? I submitted the patch a week ago for review. Not sure how to
>> understand this statement. Can elaborate, please?
>> 
>> On Tue, 30 Nov 2021 at 8:09, bened...@apache.org 
>> wrote:
>> 
>>> The problem with scoping this to “features” is that we end up with at
>> best
>>> local coherence. The config file as a whole will end up just as
>> incoherent
>>> through its design evolution as it has historically.
>>> 
>>> If you take a look at my proposed layout for the overall config, there is
>>> a “limits” section that specifies thresholds for reporting warnings and
>>> errors for various scenario. In this case, we probably don’t also want
>>> per-feature limits? We’re also mixing terminology already, with
>>> limits/thresholds and fail/abort.
>>> 
>>> It’s a lot of work to come up with a coherent and intuitive config
>> layout.
>>> We probably want to at least create some documentation in-tree
>> stipulating
>>> terminology with respect to plurals, verbs/nouns, and specific terms
>>> (period, abort, limit, datacenter vs dc, etc), but ideally we would have
>> a
>>> common end goal for the config file.
>>> 
>>>> leave non-features to CASSANDRA-15234
>>> 
>>> IMO 15234 has sailed – it’s been held up for a long time, and was brought
>>> to this list for discussion with no engagement. Ekaterina is long overdue
>>> being able to commit her work.
>>> 
>>> 
>>> From: David Capwell 
>>> Date: Monday, 29 November 2021 at 23:44
>>> To: dev@cassandra.apache.org 
>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>> but I would hate to repeat the mistakes of our past by evolving the
>>> config in a new direction without any coherent overarching design.
>>> 
>>> At the start I asked to keep the thread local to new features, but to
>> more
>>> flesh out an “overarching design” maybe we should increase the “desired”
>>> scope to be “feature” (and leave non-features to CASSANDRA-15234 -
>>> Standardise config and JVM parameters)?  Aka, do we think the following
>> is
>>> more ideal (configs scoped to a feature)
>>> 
>>> hinted_handoff:
>>>   enabled: true
>>>   disabled_datacenters:
>>>     - DC1
>>>     - DC2
>>>   max_window: 3h
>>>   flush_period: 10s
>>>   max_file_size: 128mb
>>>   compression:
>>>     class_name: LZ4Compressor
>>>     parameters:
>>>       a: b
>>> 
>>> trac

Re: [DISCUSS] Nested YAML configs for new features

2021-11-30 Thread Ekaterina Dimitrova
Thank you for confirming as I misread your email at first :-)
I had a chat with David last week and I don’t think his plan is a rework
of 15234, but rather incremental improvements on top of it.
Regarding config, after spending time cleaning up and looking into more
detail, my only appeal is:
- Centralized management, and not 5 places to change things when you add new
config, so we are less error-prone
- Documenting things for people who add new config or for our users (I
promised and I will do it for 15234, but it will be good to continue doing
it with any further changes down the road)
- Be careful with breaking changes

Thank you
Ekaterina

On Tue, 30 Nov 2021 at 8:59, bened...@apache.org 
wrote:

> I mean that it has been waiting for months, is ready to go, and I don’t
> want to hold you up any longer.
>
> From: Ekaterina Dimitrova 
> Date: Tuesday, 30 November 2021 at 13:44
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> “
> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> to this list for discussion with no engagement. Ekaterina is long overdue
> being able to commit her work. “
>
>
>  Sailed? I submitted the patch a week ago for review. Not sure how to
> understand this statement. Can elaborate, please?
>
> On Tue, 30 Nov 2021 at 8:09, bened...@apache.org 
> wrote:
>
> > The problem with scoping this to “features” is that we end up with at
> best
> > local coherence. The config file as a whole will end up just as
> incoherent
> > through its design evolution as it has historically.
> >
> > If you take a look at my proposed layout for the overall config, there is
> > a “limits” section that specifies thresholds for reporting warnings and
> > errors for various scenario. In this case, we probably don’t also want
> > per-feature limits? We’re also mixing terminology already, with
> > limits/thresholds and fail/abort.
> >
> > It’s a lot of work to come up with a coherent and intuitive config
> layout.
> > We probably want to at least create some documentation in-tree
> stipulating
> > terminology with respect to plurals, verbs/nouns, and specific terms
> > (period, abort, limit, datacenter vs dc, etc), but ideally we would have
> a
> > common end goal for the config file.
> >
> > > leave non-features to CASSANDRA-15234
> >
> > IMO 15234 has sailed – it’s been held up for a long time, and was brought
> > to this list for discussion with no engagement. Ekaterina is long overdue
> > being able to commit her work.
> >
> >
> > From: David Capwell 
> > Date: Monday, 29 November 2021 at 23:44
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >  but I would hate to repeat the mistakes of our past by evolving the
> > config in a new direction without any coherent overarching design.
> >
> > At the start I asked to keep the thread local to new features, but to
> more
> > flesh out an “overarching design” maybe we should increase the “desired”
> > scope to be “feature” (and leave non-features to CASSANDRA-15234 -
> > Standardise config and JVM parameters)?  Aka, do we think the following
> is
> > more ideal (configs scoped to a feature)
> >
> > hinted_handoff:
> >   enabled: true
> >   disabled_datacenters:
> >     - DC1
> >     - DC2
> >   max_window: 3h
> >   flush_period: 10s
> >   max_file_size: 128mb
> >   compression:
> >     class_name: LZ4Compressor
> >     parameters:
> >       a: b
> >
> > track_warnings:
> >   enabled: true
> >   local_read_size:
> >     warn_threshold: 1mb
> >     abort_threshold: 10mb
> >   coordinator_read_size:
> >     warn_threshold: 5mb
> >     abort_threshold: 20mb
> >
> >
> > OR
> >
> > # I had to rename hint configs as there was 0 consistent naming
> > hinted_handoff_enabled: true
> > hinted_handoff_disabled_datacenters:
> >   - 'DC1'
> >   - 'DC2'
> > hinted_handoff_max_window: 3h
> > hinted_handoff_max_file_size: 128mb
> > hinted_handoff_flush_period: 10s
> > hinted_handoff_compression:
> >   class_name: LZ4Compressor
> >   parameters:
> >     a: b
> >
> > track_warnings_enabled: true
> > track_warnings_local_read_size_warn_threshold: 1mb
> > track_warnings_local_read_size_abort_threshold: 10mb
> > track_warnings_coordinator_read_size_warn_threshold: 5mb
> > track_warnings_coordinator_read_size_abort_threshold: 20mb
> >
> >
> > The main issue I have with flat structure is that we 

Re: [DISCUSS] Nested YAML configs for new features

2021-11-30 Thread bened...@apache.org
I mean that it has been waiting for months, is ready to go, and I don’t want to 
hold you up any longer.

From: Ekaterina Dimitrova 
Date: Tuesday, 30 November 2021 at 13:44
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Nested YAML configs for new features
“
IMO 15234 has sailed – it’s been held up for a long time, and was brought
to this list for discussion with no engagement. Ekaterina is long overdue
being able to commit her work. “


 Sailed? I submitted the patch a week ago for review. Not sure how to
understand this statement. Can elaborate, please?

On Tue, 30 Nov 2021 at 8:09, bened...@apache.org 
wrote:

> The problem with scoping this to “features” is that we end up with at best
> local coherence. The config file as a whole will end up just as incoherent
> through its design evolution as it has historically.
>
> If you take a look at my proposed layout for the overall config, there is
> a “limits” section that specifies thresholds for reporting warnings and
> errors for various scenario. In this case, we probably don’t also want
> per-feature limits? We’re also mixing terminology already, with
> limits/thresholds and fail/abort.
>
> It’s a lot of work to come up with a coherent and intuitive config layout.
> We probably want to at least create some documentation in-tree stipulating
> terminology with respect to plurals, verbs/nouns, and specific terms
> (period, abort, limit, datacenter vs dc, etc), but ideally we would have a
> common end goal for the config file.
>
> > leave non-features to CASSANDRA-15234
>
> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> to this list for discussion with no engagement. Ekaterina is long overdue
> being able to commit her work.
>
>
> From: David Capwell 
> Date: Monday, 29 November 2021 at 23:44
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >  but I would hate to repeat the mistakes of our past by evolving the
> config in a new direction without any coherent overarching design.
>
> At the start I asked to keep the thread local to new features, but to more
> flesh out an “overarching design” maybe we should increase the “desired”
> scope to be “feature” (and leave non-features to CASSANDRA-15234 -
> Standardise config and JVM parameters)?  Aka, do we think the following is
> more ideal (configs scoped to a feature)
>
> hinted_handoff:
>   enabled: true
>   disabled_datacenters:
>     - DC1
>     - DC2
>   max_window: 3h
>   flush_period: 10s
>   max_file_size: 128mb
>   compression:
>     class_name: LZ4Compressor
>     parameters:
>       a: b
>
> track_warnings:
>   enabled: true
>   local_read_size:
>     warn_threshold: 1mb
>     abort_threshold: 10mb
>   coordinator_read_size:
>     warn_threshold: 5mb
>     abort_threshold: 20mb
>
>
> OR
>
> # I had to rename hint configs as there was 0 consistent naming
> hinted_handoff_enabled: true
> hinted_handoff_disabled_datacenters:
>   - 'DC1'
>   - 'DC2'
> hinted_handoff_max_window: 3h
> hinted_handoff_max_file_size: 128mb
> hinted_handoff_flush_period: 10s
> hinted_handoff_compression:
>   class_name: LZ4Compressor
>   parameters:
>     a: b
>
> track_warnings_enabled: true
> track_warnings_local_read_size_warn_threshold: 1mb
> track_warnings_local_read_size_abort_threshold: 10mb
> track_warnings_coordinator_read_size_warn_threshold: 5mb
> track_warnings_coordinator_read_size_abort_threshold: 20mb
>
>
> The main issue I have with flat structure is that we have no way to
> enforce standard naming; if you look at the hint example there were at
> least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we
> actually maintain that?).  And one of the core reasons track_warnings went
> nested was that warn/abort some times became warn/fail and threshold some
> times was thresholds…. By embracing nested structure we can actually
> enforce consistency, with flat we have no way to maintain consistency.
>
> Additionally by embracing the nested structure we can accept a flat one as
> well (PR in CASSANDRA-17166 shows this working) if users desire it; so we
> get the consistency of nested, and the “grep” benefits of flat.
>
>
> > On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
> >
> > If we’re thinking of moving towards nested configuration, then before
> employing the approach further we would ideally consider what a fully
> nested config looks like for the project. Ekaterina has done a lot to clean
> up inconsistent naming, but I would hate to repeat the mistakes of our past
> by evolving the config in a new direction without any coherent overarching
> design.
> >
> > In cas

Re: [DISCUSS] Nested YAML configs for new features

2021-11-30 Thread Ekaterina Dimitrova
“
IMO 15234 has sailed – it’s been held up for a long time, and was brought
to this list for discussion with no engagement. Ekaterina is long overdue
being able to commit her work. “


Sailed? I submitted the patch a week ago for review. Not sure how to
understand this statement. Can you elaborate, please?

On Tue, 30 Nov 2021 at 8:09, bened...@apache.org 
wrote:

> The problem with scoping this to “features” is that we end up with at best
> local coherence. The config file as a whole will end up just as incoherent
> through its design evolution as it has historically.
>
> If you take a look at my proposed layout for the overall config, there is
> a “limits” section that specifies thresholds for reporting warnings and
> errors for various scenario. In this case, we probably don’t also want
> per-feature limits? We’re also mixing terminology already, with
> limits/thresholds and fail/abort.
>
> It’s a lot of work to come up with a coherent and intuitive config layout.
> We probably want to at least create some documentation in-tree stipulating
> terminology with respect to plurals, verbs/nouns, and specific terms
> (period, abort, limit, datacenter vs dc, etc), but ideally we would have a
> common end goal for the config file.
>
> > leave non-features to CASSANDRA-15234
>
> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> to this list for discussion with no engagement. Ekaterina is long overdue
> being able to commit her work.
>
>
> From: David Capwell 
> Date: Monday, 29 November 2021 at 23:44
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >  but I would hate to repeat the mistakes of our past by evolving the
> config in a new direction without any coherent overarching design.
>
> At the start I asked to keep the thread local to new features, but to more
> flesh out an “overarching design” maybe we should increase the “desired”
> scope to be “feature” (and leave non-features to CASSANDRA-15234 -
> Standardise config and JVM parameters)?  Aka, do we think the following is
> more ideal (configs scoped to a feature)
>
> hinted_handoff:
>   enabled: true
>   disabled_datacenters:
>     - DC1
>     - DC2
>   max_window: 3h
>   flush_period: 10s
>   max_file_size: 128mb
>   compression:
>     class_name: LZ4Compressor
>     parameters:
>       a: b
>
> track_warnings:
>   enabled: true
>   local_read_size:
>     warn_threshold: 1mb
>     abort_threshold: 10mb
>   coordinator_read_size:
>     warn_threshold: 5mb
>     abort_threshold: 20mb
>
>
> OR
>
> # I had to rename hint configs as there was 0 consistent naming
> hinted_handoff_enabled: true
> hinted_handoff_disabled_datacenters:
>   - 'DC1'
>   - 'DC2'
> hinted_handoff_max_window: 3h
> hinted_handoff_max_file_size: 128mb
> hinted_handoff_flush_period: 10s
> hinted_handoff_compression:
>   class_name: LZ4Compressor
>   parameters:
>     a: b
>
> track_warnings_enabled: true
> track_warnings_local_read_size_warn_threshold: 1mb
> track_warnings_local_read_size_abort_threshold: 10mb
> track_warnings_coordinator_read_size_warn_threshold: 5mb
> track_warnings_coordinator_read_size_abort_threshold: 20mb
>
>
> The main issue I have with flat structure is that we have no way to
> enforce standard naming; if you look at the hint example there were at
> least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we
> actually maintain that?).  And one of the core reasons track_warnings went
> nested was that warn/abort some times became warn/fail and threshold some
> times was thresholds…. By embracing nested structure we can actually
> enforce consistency, with flat we have no way to maintain consistency.
>
> Additionally by embracing the nested structure we can accept a flat one as
> well (PR in CASSANDRA-17166 shows this working) if users desire it; so we
> get the consistency of nested, and the “grep” benefits of flat.
>
>
> > On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
> >
> > If we’re thinking of moving towards nested configuration, then before
> employing the approach further we would ideally consider what a fully
> nested config looks like for the project. Ekaterina has done a lot to clean
> up inconsistent naming, but I would hate to repeat the mistakes of our past
> by evolving the config in a new direction without any coherent overarching
> design.
> >
> > In case anyone missed it in the earlier discussion, this was my attempt
> to prototype a nested config:
> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> >
> > I don’t have any specific a

Re: [DISCUSS] Nested YAML configs for new features

2021-11-30 Thread bened...@apache.org
The problem with scoping this to “features” is that we end up with at best 
local coherence. The config file as a whole will end up just as incoherent 
through its design evolution as it has historically.

If you take a look at my proposed layout for the overall config, there is a 
“limits” section that specifies thresholds for reporting warnings and errors 
for various scenarios. In this case, we probably don’t also want per-feature 
limits? We’re also mixing terminology already, with limits/thresholds and 
fail/abort.

It’s a lot of work to come up with a coherent and intuitive config layout. We 
probably want to at least create some documentation in-tree stipulating 
terminology with respect to plurals, verbs/nouns, and specific terms (period, 
abort, limit, datacenter vs dc, etc), but ideally we would have a common end 
goal for the config file.

> leave non-features to CASSANDRA-15234

IMO 15234 has sailed – it’s been held up for a long time, and was brought to 
this list for discussion with no engagement. Ekaterina is long overdue being 
able to commit her work.


From: David Capwell 
Date: Monday, 29 November 2021 at 23:44
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Nested YAML configs for new features
>  but I would hate to repeat the mistakes of our past by evolving the config 
> in a new direction without any coherent overarching design.

At the start I asked to keep the thread local to new features, but to more 
flesh out an “overarching design” maybe we should increase the “desired” scope 
to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise config 
and JVM parameters)?  Aka, do we think the following is more ideal (configs 
scoped to a feature)

hinted_handoff:
  enabled: true
  disabled_datacenters:
    - DC1
    - DC2
  max_window: 3h
  flush_period: 10s
  max_file_size: 128mb
  compression:
    class_name: LZ4Compressor
    parameters:
      a: b

track_warnings:
  enabled: true
  local_read_size:
    warn_threshold: 1mb
    abort_threshold: 10mb
  coordinator_read_size:
    warn_threshold: 5mb
    abort_threshold: 20mb


OR

# I had to rename hint configs as there was 0 consistent naming
hinted_handoff_enabled: true
hinted_handoff_disabled_datacenters:
  - 'DC1'
  - 'DC2'
hinted_handoff_max_window: 3h
hinted_handoff_max_file_size: 128mb
hinted_handoff_flush_period: 10s
hinted_handoff_compression:
  class_name: LZ4Compressor
  parameters:
    a: b

track_warnings_enabled: true
track_warnings_local_read_size_warn_threshold: 1mb
track_warnings_local_read_size_abort_threshold: 10mb
track_warnings_coordinator_read_size_warn_threshold: 5mb
track_warnings_coordinator_read_size_abort_threshold: 20mb


The main issue I have with flat structure is that we have no way to enforce 
standard naming; if you look at the hint example there were at least 3 naming 
conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain 
that?).  And one of the core reasons track_warnings went nested was that 
warn/abort some times became warn/fail and threshold some times was 
thresholds…. By embracing nested structure we can actually enforce consistency, 
with flat we have no way to maintain consistency.

Additionally by embracing the nested structure we can accept a flat one as well 
(PR in CASSANDRA-17166 shows this working) if users desire it; so we get the 
consistency of nested, and the “grep” benefits of flat.


> On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
>
> If we’re thinking of moving towards nested configuration, then before 
> employing the approach further we would ideally consider what a fully nested 
> config looks like for the project. Ekaterina has done a lot to clean up 
> inconsistent naming, but I would hate to repeat the mistakes of our past by 
> evolving the config in a new direction without any coherent overarching 
> design.
>
> In case anyone missed it in the earlier discussion, this was my attempt to 
> prototype a nested config: 
> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
>
> I don’t have any specific attachment to it, but settling on some approximate 
> scheme would be helpful IMO.
>
> From: David Capwell 
> Date: Monday, 29 November 2021 at 20:38
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> What should our default example cassandra.yaml file use (flat or nested)?  
>> Currently default shows nested
>
> Was told this statement was confusing, so trying to clarify.  At the moment 
> we do not allow a nested config to be expressed in any way outside of nesting 
> it (excluding YAML’s ability to inline objects), so if we did allow flat 
> config representation of nested configs, then this would be a brand new 
> feature; we currently show the nested structure in cassandra.yaml
>
>> On Nov 29, 2021, a

Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread David Capwell
>  but I would hate to repeat the mistakes of our past by evolving the config 
> in a new direction without any coherent overarching design.

At the start I asked to keep the thread local to new features, but to better 
flesh out an “overarching design” maybe we should increase the “desired” scope 
to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise config 
and JVM parameters)?  I.e., do we think the following is more ideal (configs 
scoped to a feature)

hinted_handoff:
  enabled: true
  disabled_datacenters:
    - DC1
    - DC2
  max_window: 3h
  flush_period: 10s
  max_file_size: 128mb
  compression:
    class_name: LZ4Compressor
    parameters:
      a: b

track_warnings:
  enabled: true
  local_read_size:
    warn_threshold: 1mb
    abort_threshold: 10mb
  coordinator_read_size:
    warn_threshold: 5mb
    abort_threshold: 20mb


OR

# I had to rename hint configs as there was 0 consistent naming
hinted_handoff_enabled: true
hinted_handoff_disabled_datacenters:
  - 'DC1'
  - 'DC2'
hinted_handoff_max_window: 3h
hinted_handoff_max_file_size: 128mb
hinted_handoff_flush_period: 10s
hinted_handoff_compression:
  class_name: LZ4Compressor
  parameters:
    a: b

track_warnings_enabled: true
track_warnings_local_read_size_warn_threshold: 1mb
track_warnings_local_read_size_abort_threshold: 10mb
track_warnings_coordinator_read_size_warn_threshold: 5mb
track_warnings_coordinator_read_size_abort_threshold: 20mb


The main issue I have with the flat structure is that we have no way to enforce 
standard naming; if you look at the hint example there were at least 3 naming 
conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain 
that?).  And one of the core reasons track_warnings went nested was that 
warn/abort sometimes became warn/fail and threshold sometimes became 
thresholds. By embracing the nested structure we can actually enforce 
consistency; with flat we have no way to maintain it.

Additionally by embracing the nested structure we can accept a flat one as well 
(PR in CASSANDRA-17166 shows this working) if users desire it; so we get the 
consistency of nested, and the “grep” benefits of flat.
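
As a rough illustration of the consistency argument, the sketch below (Python, not Cassandra code) shows one way a nested layout can enforce naming: every threshold section is parsed by the same type, so a stray fail_threshold or warn_thresholds is rejected at load time. The class name Thresholds and its from_mapping helper are hypothetical; only the key names warn_threshold and abort_threshold come from the track_warnings example above.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class Thresholds:
    """One shared type, so every section must spell its keys the same way."""
    warn_threshold: str
    abort_threshold: str

    @classmethod
    def from_mapping(cls, name: str, raw: Dict[str, Any]) -> "Thresholds":
        allowed = {f.name for f in fields(cls)}
        unknown = set(raw) - allowed
        missing = allowed - set(raw)
        if unknown or missing:
            raise ValueError(
                f"{name}: unknown keys {sorted(unknown)}, missing keys {sorted(missing)}"
            )
        return cls(**raw)


# The nested track_warnings section from the example, as a parsed mapping.
track_warnings = {
    "local_read_size": {"warn_threshold": "1mb", "abort_threshold": "10mb"},
    "coordinator_read_size": {"warn_threshold": "5mb", "abort_threshold": "20mb"},
}

parsed = {
    name: Thresholds.from_mapping(name, raw)
    for name, raw in track_warnings.items()
}

# A section using the drifted spellings is rejected instead of silently accepted.
try:
    Thresholds.from_mapping(
        "row_index_size", {"warn_thresholds": "1mb", "fail_threshold": "10mb"}
    )
    drift_rejected = False
except ValueError:
    drift_rejected = True
```

With flat keys each name is an independent string, so nothing ties track_warnings_local_read_size_warn_threshold to its coordinator counterpart; here the shared type does that work.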



Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread bened...@apache.org
If we’re thinking of moving towards nested configuration, then before employing 
the approach further we would ideally consider what a fully nested config looks 
like for the project. Ekaterina has done a lot to clean up inconsistent naming, 
but I would hate to repeat the mistakes of our past by evolving the config in a 
new direction without any coherent overarching design.

In case anyone missed it in the earlier discussion, this was my attempt to 
prototype a nested config: 
https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml

I don’t have any specific attachment to it, but settling on some approximate 
scheme would be helpful IMO.


Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread David Capwell
> What should our default example cassandra.yaml file use (flat or nested)?  
> Currently default shows nested

Was told this statement was confusing, so trying to clarify.  At the moment we 
do not allow a nested config to be expressed in any way outside of nesting it 
(excluding YAML’s ability to inline objects), so if we did allow flat config 
representation of nested configs, then this would be a brand new feature; we 
currently show the nested structure in cassandra.yaml


Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread David Capwell
Thanks everyone for the comments, I hope below is a good summary of all the 
talking points?

* We already use nested configs (networking, seed provider, commit log/hint 
  compression, back pressure, etc.)
* Flat configs are easier to grep, but nested configs can also be searched 
  with grep -A/-B and/or yq
* It would be possible to support flat versions of our configs in 
  cassandra.yaml (in addition to the nested versions)
* "Settings" vtable currently uses the "_" separator (example of 
  encryption/audit log).  Switching to "." would be a change in behavior which 
  may impact some users
* "." as a separator for nested configs is common in other systems (yq, 
  Elasticsearch, etc.)
* "Structured / nested config is easier for human eyes to read"... "Flat 
  config is harder for human eyes but easy for simple scripts"
* For learning what configs are enabled, cassandra.yaml isn't the best 
  interface as it may not reflect the actual configs; we can better expose 
  this in CQL and/or Sidecar
* What should our default example cassandra.yaml file use (flat or nested)?  
  Currently default shows nested
* When projecting the Config into CQL, we may want to consider UDTs to 
  represent the complex types
* Current limitations in CQL make nested structures hard to work with; it may 
  be worthwhile to expand CQL support for nested structures.

I also took a quick stab at enhancing our cassandra.yaml logic to: 1) be 
reusable outside of yaml parsing, 2) support setters (we currently do, but 
setters must be snake case; I fixed that), 3) support both nested and 
flat, 4) support ignoring fields in a consistent way (Settings vtable 
will include things SnakeYAML won’t and vice versa).

https://github.com/apache/cassandra/pull/1335

This PR is NOT final or ready to merge; it is a POC to show how we can solve 
a lot of the core problems in a consistent and reusable manner.

The following cassandra.yaml was used to show both worlds would work fine in 
the config (and complement each other)

track_warnings:
  enabled: true
  # nested relative to the local level (TrackWarnings)
  coordinator_read_size.warn_threshold_kb: 1024
  local_read_size.abort_threshold_kb: 1024
  row_index_size:
    warn_threshold_kb: 1024
    abort_threshold_kb: 1024
# nested relative to the top level
track_warnings.coordinator_read_size.abort_threshold_kb: 42
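
To make the mixed flat/nested example above concrete, here is a minimal Python sketch of how dotted keys could be folded into the nested form after the YAML has been parsed into plain dicts. It is not the logic from the PR; the function names (expand_dotted_keys, _merge) and the choice to reject duplicate settings are assumptions made for illustration.

```python
def _merge(target, source):
    """Merge `source` into `target` in place, rejecting conflicting settings."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        elif key in target:
            # Assumption: the same setting given twice is treated as an error.
            raise ValueError(f"duplicate setting: {key}")
        else:
            target[key] = value


def expand_dotted_keys(config):
    """Return a nested dict where keys such as 'a.b.c' become {'a': {'b': {'c': ...}}}."""
    expanded = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Dotted keys may also appear relative to a nested level.
            value = expand_dotted_keys(value)
        # Wrap the value in one dict per path segment, innermost first.
        for part in reversed(key.split(".")):
            value = {part: value}
        _merge(expanded, value)
    return expanded


# The parsed form of the example config, written out as a Python dict.
parsed = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size.warn_threshold_kb": 1024,
        "local_read_size.abort_threshold_kb": 1024,
        "row_index_size": {"warn_threshold_kb": 1024, "abort_threshold_kb": 1024},
    },
    "track_warnings.coordinator_read_size.abort_threshold_kb": 42,
}
nested = expand_dotted_keys(parsed)
```

With this sketch, both the locally relative key and the top-level dotted key end up under track_warnings.coordinator_read_size in the same nested structure.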

For the “Settings” vtable, a new Loader interface was added to get all the 
properties, and Properties.flatten would turn every property into a “flattened” 
version (isScalar (isPrimitive or not hasSubProperties) or isCollection).  This 
doesn’t solve 100% of the issues that the vtable has (types such as Duration 
are Scalar but would need an additional translation from String -> Duration), 
and doesn’t solve the fact that the table currently uses “_”.
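
The flattening rule described above can be sketched in a few lines of Python. This is only an illustration over plain dicts, not the Properties.flatten code from the PR: the function name flatten and its separator parameter are assumptions, and a value is treated as a leaf when it is a scalar, a collection, or a mapping with no sub-properties.

```python
def flatten(config, prefix="", separator="."):
    """Flatten a nested dict into {path: leaf} pairs joined by `separator`."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # The property has sub-properties, so keep descending.
            flat.update(flatten(value, path, separator))
        else:
            # Scalars, collections and empty mappings are emitted as leaves.
            flat[path] = value
    return flat


example = {
    "track_warnings": {
        "enabled": True,
        "row_index_size": {"warn_threshold_kb": 1024},
    },
    "data_file_directories": ["/var/lib/cassandra/data"],
}
dotted = flatten(example)
underscored = flatten(example, separator="_")
```

Passing separator="_" mimics the naming the table uses today, which also shows why "_" is ambiguous: the path boundary can no longer be told apart from underscores inside a single name.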


Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread bened...@apache.org
I meant to imply we should improve our UDT usability to support this kind of 
querying, essentially – but that if we support a simple text->property setup we 
might want to offer LIKE support so we can search them (via simple filtering, 
not any index) – which is actually pretty easy to provide.
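
As a rough illustration of LIKE via simple filtering over flattened setting names, here is a Python sketch. It does not reflect how CQL implements LIKE; it assumes only the "%" wildcard is supported and that "_" is matched literally (since setting names contain underscores), and the names like_to_regex and filter_settings are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a regex; only '%' is treated as a wildcard."""
    parts = [".*" if ch == "%" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def filter_settings(settings, pattern):
    """Scan every (name, value) pair and keep the names matching the pattern."""
    regex = like_to_regex(pattern)
    return {name: value for name, value in settings.items() if regex.fullmatch(name)}


settings = {
    "track_warnings.enabled": "true",
    "track_warnings.row_index_size.warn_threshold_kb": "1024",
    "audit_logging_options.enabled": "false",
}
under_track_warnings = filter_settings(settings, "track_warnings.%")
enabled_flags = filter_settings(settings, "%.enabled")
```

The filter is a full scan with no index, which is acceptable here because the number of settings is small.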

I think we should aim to provide users all the facilities they need to interact 
with config via vtables. If the user requires external tooling, it suggests a 
weakness in CQL that we should address, and maybe help the user in other 
scenarios too…




Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread Joseph Lynch
On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org
 wrote:
>
> Maybe we can make our query language more expressive 
>
> We might anyway want to introduce e.g. a LIKE filtering option to 
> find/discover flattened config parameters?

This sounds more complicated than just having the settings virtual
table return text (dot encoded) -> text (json) and probably not even
that much more useful. A full table scan on the settings table could
return all top level keys (strings before the first dot) and if we
just return a valid json string then users can bring their own
querying capabilities via jq [1], or one line of code in almost any
programming language (especially python, perl, etc ...).

Alternatively, if we want to modify the grammar, it seems supporting
structured data querying on text fields would be preferable
to LIKE since you could get what you want without a grammar change and
if we could generalize to any text column it would be amazingly useful
elsewhere to users. For example, we could emulate jq's query syntax in
the select which is, imo, best-in-class for quickly querying into
nested structures. Assuming a key (text) -> value (json) schema:

'a' -> "{'b': [{'c': {'d': 4}}]}",

SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';

To have exactly jq syntax (but harder to parse) it would be:

SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
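
For readers who want to see what the dotted path in the first SELECT would resolve to, here is a small Python sketch of the same lookup done client-side. The function name select_path is hypothetical, and the value is written with double quotes because the standard json module only accepts valid JSON.

```python
import json


def select_path(json_text, path):
    """Walk a dotted path such as 'b.0.c.d' through a parsed JSON document."""
    node = json.loads(json_text)
    for segment in path.split("."):
        if isinstance(node, list):
            # Numeric segments index into arrays.
            node = node[int(segment)]
        else:
            node = node[segment]
    return node


value = '{"b": [{"c": {"d": 4}}]}'
result = select_path(value, "b.0.c.d")
```

This is the "one line of code in almost any programming language" case: the whole text field is parsed first and the path is applied afterwards.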

Since we're not indexing the structured data in any way, filtering
before selection probably doesn't give us much performance improvement
as we'd still have to parse the whole text field in most cases.

-Joey

[1] https://stedolan.github.io/jq/




Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread Benjamin Lerer
>
> We might anyway want to introduce e.g. a LIKE filtering option to
> find/discover flattened config parameters?


+100


Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread bened...@apache.org
Maybe we can make our query language more expressive 

We might anyway want to introduce e.g. a LIKE filtering option to find/discover 
flattened config parameters?


Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread Benjamin Lerer
>
> I don’t think it’s necessarily a requirement that we use the flattened
> version in vtables. At the very least we can make use of sets, lists, etc.
> But we can probably also use UDTs if this improves clarity.


In my opinion part of the issue is on the query side. How do we select a
nested set or a specific set easily? UDTs are not great for this type of
query. For collections we can use CONTAINS and element or range selection,
but insertion might be the problem.

On Mon, 29 Nov 2021 at 17:23, Bowen Song wrote:

> In ElasticSearch, the default is a flattened format with almost all
> lines commented out. See
>
> https://github.com/elastic/elasticsearch/blob/master/distribution/src/config/elasticsearch.yml
>
> I guess they chose to do that because users can uncomment individual
> lines to make changes. In a structured config file, the user will have
> to uncomment all lines containing the parent keys to get it to work. For
> example, if someone wants to set the config keyABB to a non-default
> value, they will have to correctly uncomment 3 lines: keyA, keyAB and
> keyABB, which can be annoying and makes it easy to make a mistake. If
> either of the first two keys is not uncommented, the YAML file will still
> be valid but the config, now read as keyX.keyAB.keyABB, might just get
> silently ignored by the database.
>
> keyX:
>   keyY:
>     keyZ: value
> # keyA:
> #   keyAA:
> #     keyAAA: value
> #   keyAB:
> #     keyABA: value
> #     keyABB: value
>
> On 29/11/2021 15:54, Benjamin Lerer wrote:
> > I do not think that supporting both options is an issue. The settings
> > virtual table would have to use the flattened version.
> > If we support both formats, the question would be: what should be the one
> > used by default in the configuration file?
> >
> > On Fri, 26 Nov 2021 at 15:40, bened...@apache.org wrote:
> >
> >> This is the approach I favour for config files also. We had a much less
> >> engaged discussion on this topic only a few months ago, so glad to see
> more
> >> people getting involved now.
> >>
> >> I would however personally prefer to see the configuration file slowly
> >> deprecated (if perhaps never retired), in favour of virtual tables, so
> that
> >> operators may easily set configurations for the entire cluster. Ideally
> it
> >> would be possible to specify configuration per cluster, per DC and per
> >> node, with the most specific configuration applying. I would like to see
> a
> >> similar hierarchy for Keyspace, Table and Per-Query options. Ideally
> only
> >> the barest minimum number of options would be necessary to supply in a
> >> config file, and only on first launch – seed nodes, for instance.
> >>
> >> So whatever design we employ here, we should IMO be aiming for it to be
> >> compatible with a CQL representation also.
> >>
> >>
> >> From: Bowen Song
> >> Date: Wednesday, 24 November 2021 at 18:15
> >> To:dev@cassandra.apache.org  
> >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> >> config file syntax. It allows the user to completely flatten out the
> >> entire config file. To give people who aren't familiar with ElasticSearch
> >> an idea, here is a config file we use:
> >>
> >>  cluster.name: foobar
> >>
> >>  node.remote_cluster_client: false
> >>  node.name: "foo.example.com"
> >>  node.master: true
> >>  node.data: true
> >>  node.ingest: true
> >>  node.ml: false
> >>
> >>  xpack.ml.enabled: false
> >>  xpack.security.enabled: false
> >>  xpack.security.audit.enabled: false
> >>  xpack.watcher.enabled: false
> >>
> >>  action.auto_create_index: "+.,-*"
> >>
> >>  network.host: _global_
> >>
> >>  discovery.zen.hosts_provider: file
> >>  discovery.zen.minimum_master_nodes: 2
> >>
> >>  http.publish_host: "foo.example.com"
> >>  http.publish_port: 443
> >>  http.bind_host: 127.0.0.1
> >>
> >>  transport.publish_host: "bar.example.com"
> >>  transport.bind_host: 0.0.0.0
> >>
> >>  indices.fielddata.cache.size: 1GB
> >>  indices.breaker.total.use_real_memory: false
> >>
> >>  path.logs: /var/log/elasticsearch
> >>

Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread Bowen Song
In ElasticSearch, the default is a flattened format with almost all 
lines commented out. See 
https://github.com/elastic/elasticsearch/blob/master/distribution/src/config/elasticsearch.yml


I guess they chose to do that because the user can uncomment individual
lines to make changes. In a structured config file, the user has to
uncomment all the lines containing the parent keys to get it to work. For
example, if someone wants to set the config keyABB to a non-default
value, they have to correctly uncomment 3 lines: keyA, keyAB and
keyABB, which is annoying and makes it easy to make a mistake. If either
of the first two keys is not uncommented, the YAML file will still be
valid, but the setting, now parsed as something like keyX.keyAB.keyABB,
might just get silently ignored by the database.


   keyX:
     keyY:
       keyZ: value
   # keyA:
   #   keyAA:
   #     keyAAA: value
   #   keyAB:
   #     keyABA: value
   #     keyABB: value
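
The silent-ignore problem described in this message can be caught mechanically. Below is a minimal Python sketch, assuming the YAML parser has already produced the mis-nested mapping (keyAB attached under keyX because keyA stayed commented out) and that the database knows the full set of valid dotted keys. `KNOWN_KEYS`, `dotted_keys` and `unknown_keys` are hypothetical names for illustration, not Cassandra or Elasticsearch APIs.

```python
from typing import Any, Dict, List, Set

# Assumption: the complete list of valid settings, written as dotted paths.
KNOWN_KEYS: Set[str] = {
    "keyX.keyY.keyZ",
    "keyA.keyAA.keyAAA",
    "keyA.keyAB.keyABA",
    "keyA.keyAB.keyABB",
}


def dotted_keys(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the dotted path of every leaf value in a nested mapping."""
    keys: List[str] = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            keys.extend(dotted_keys(value, path))
        else:
            keys.append(path)
    return keys


def unknown_keys(config: Dict[str, Any], known: Set[str]) -> List[str]:
    """List settings that would otherwise be silently ignored."""
    return sorted(k for k in dotted_keys(config) if k not in known)


# What a parser would hand over if keyAB and keyABB were uncommented
# but the parent line keyA was forgotten.
parsed = {"keyX": {"keyY": {"keyZ": "value"}, "keyAB": {"keyABB": "value"}}}

print(unknown_keys(parsed, KNOWN_KEYS))
```

Rejecting the file when this list is non-empty turns the silent failure into a startup error, whichever syntax the operator used.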

On 29/11/2021 15:54, Benjamin Lerer wrote:

I do not think that supporting both options is an issue. The settings
virtual table would have to use the flattened version.
If we support both formats, the question would be: what should be the one
used by default in the configuration file?

On Fri, 26 Nov 2021 at 15:40, bened...@apache.org wrote:


This is the approach I favour for config files also. We had a much less
engaged discussion on this topic only a few months ago, so glad to see more
people getting involved now.

I would however personally prefer to see the configuration file slowly
deprecated (if perhaps never retired), in favour of virtual tables, so that
operators may easily set configurations for the entire cluster. Ideally it
would be possible to specify configuration per cluster, per DC and per
node, with the most specific configuration applying. I would like to see a
similar hierarchy for Keyspace, Table and Per-Query options. Ideally only
the barest minimum number of options would be necessary to supply in a
config file, and only on first launch – seed nodes, for instance.

So whatever design we employ here, we should IMO be aiming for it to be
compatible with a CQL representation also.


From: Bowen Song
Date: Wednesday, 24 November 2021 at 18:15
To:dev@cassandra.apache.org  
Subject: Re: [DISCUSS] Nested YAML configs for new features
Since you mentioned ElasticSearch, I'm actually pretty happy with their
config file syntax. It allows the user to completely flatten out the
entire config file. To give people who aren't familiar with ElasticSearch
an idea, here is a config file we use:

 cluster.name: foobar

 node.remote_cluster_client: false
 node.name: "foo.example.com"
 node.master: true
 node.data: true
 node.ingest: true
 node.ml: false

 xpack.ml.enabled: false
 xpack.security.enabled: false
 xpack.security.audit.enabled: false
 xpack.watcher.enabled: false

 action.auto_create_index: "+.,-*"

 network.host: _global_

 discovery.zen.hosts_provider: file
 discovery.zen.minimum_master_nodes: 2

 http.publish_host: "foo.example.com"
 http.publish_port: 443
 http.bind_host: 127.0.0.1

 transport.publish_host: "bar.example.com"
 transport.bind_host: 0.0.0.0

 indices.fielddata.cache.size: 1GB
 indices.breaker.total.use_real_memory: false

 path.logs: /var/log/elasticsearch
 path.data: /var/lib/elasticsearch/data

As you can see we can use the flat (grep-able) syntax for everything.
This is also human readable because we can group options together by
inserting empty lines between them.

The equivalent of the above in a structured syntax will be:

 cluster:
  name: foobar

 node:
  remote_cluster_client: false
  name: "foo.example.com"
  master: true
  data: true
  ingest: true
  ml: false

 xpack:
  ml:
   enabled: false
  security:
   enabled: false
   audit:
    enabled: false
  watcher:
   enabled: false

 action:
  auto_create_index: "+.,-*"

 network:
  host: _global_

 discovery:
  zen:
   hosts_provider: file
   minimum_master_nodes: 2

 http:
  publish_host: "foo.example.com"
  publish_port: 443
  bind_host: 127.0.0.1

 transport:
  publish_host: "bar.example.com"
  bind_host: 0.0.0.0

 indices:
  fielddata:
   cache:
    size: 1GB
  breaker:
   total:
    use_real_memory: false

 path:
  logs: /var/log/elasticsearch
  data: /var/lib/elasticsearch/data

This may be easier to read for some people, but it is a total nightmare
for "grep" - so many keys have identical names, such as "enabled".

Also, for the virtual tables

Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread bened...@apache.org
I don’t think it’s necessarily a requirement that we use the flattened version 
in vtables. At the very least we can make use of sets, lists, etc. But we can 
probably also use UDTs if this improves clarity.

From: Benjamin Lerer 
Date: Monday, 29 November 2021 at 15:54
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] Nested YAML configs for new features
I do not think that supporting both options is an issue. The settings
virtual table would have to use the flattened version.
If we support both formats, the question would be: what should be the one
used by default in the configuration file?

On Fri, 26 Nov 2021 at 15:40, bened...@apache.org wrote:

> This is the approach I favour for config files also. We had a much less
> engaged discussion on this topic only a few months ago, so glad to see more
> people getting involved now.
>
> I would however personally prefer to see the configuration file slowly
> deprecated (if perhaps never retired), in favour of virtual tables, so that
> operators may easily set configurations for the entire cluster. Ideally it
> would be possible to specify configuration per cluster, per DC and per
> node, with the most specific configuration applying. I would like to see a
> similar hierarchy for Keyspace, Table and Per-Query options. Ideally only
> the barest minimum number of options would be necessary to supply in a
> config file, and only on first launch – seed nodes, for instance.
>
> So whatever design we employ here, we should IMO be aiming for it to be
> compatible with a CQL representation also.
>
>
> From: Bowen Song 
> Date: Wednesday, 24 November 2021 at 18:15
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> config file syntax. It allows the user to completely flatten out the
> entire config file. To give people who aren't familiar with ElasticSearch
> an idea, here is a config file we use:
>
> cluster.name: foobar
>
> node.remote_cluster_client: false
> node.name: "foo.example.com"
> node.master: true
> node.data: true
> node.ingest: true
> node.ml: false
>
> xpack.ml.enabled: false
> xpack.security.enabled: false
> xpack.security.audit.enabled: false
> xpack.watcher.enabled: false
>
> action.auto_create_index: "+.,-*"
>
> network.host: _global_
>
> discovery.zen.hosts_provider: file
> discovery.zen.minimum_master_nodes: 2
>
> http.publish_host: "foo.example.com"
> http.publish_port: 443
> http.bind_host: 127.0.0.1
>
> transport.publish_host: "bar.example.com"
> transport.bind_host: 0.0.0.0
>
> indices.fielddata.cache.size: 1GB
> indices.breaker.total.use_real_memory: false
>
> path.logs: /var/log/elasticsearch
> path.data: /var/lib/elasticsearch/data
>
> As you can see we can use the flat (grep-able) syntax for everything.
> This is also human readable because we can group options together by
> inserting empty lines between them.
>
> The equivalent of the above in a structured syntax will be:
>
> cluster:
>  name: foobar
>
> node:
>  remote_cluster_client: false
>  name: "foo.example.com"
>  master: true
>  data: true
>  ingest: true
>  ml: false
>
> xpack:
>  ml:
>   enabled: false
>  security:
>   enabled: false
>   audit:
>    enabled: false
>  watcher:
>   enabled: false
>
> action:
>  auto_create_index: "+.,-*"
>
> network:
>  host: _global_
>
> discovery:
>  zen:
>   hosts_provider: file
>   minimum_master_nodes: 2
>
> http:
>  publish_host: "foo.example.com"
>  publish_port: 443
>  bind_host: 127.0.0.1
>
> transport:
>  publish_host: "bar.example.com"
>  bind_host: 0.0.0.0
>
> indices:
>  fielddata:
>   cache:
>    size: 1GB
>  breaker:
>   total:
>    use_real_memory: false
>
> path:
>  logs: /var/log/elasticsearch
>  data: /var/lib/elasticsearch/data
>
> This may be easier to read for some people, but it is a total nightmare
> for "grep" - so many keys have identical names, such as "enabled".
>
> Also, for the virtual tables, it would be a lot easier to represent
> individual values in a virtual table when the confi

Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread Benjamin Lerer
I do not think that supporting both options is an issue. The settings
virtual table would have to use the flattened version.
If we support both formats, the question would be: what should be the one
used by default in the configuration file?

On Fri, 26 Nov 2021 at 15:40, bened...@apache.org wrote:

> This is the approach I favour for config files also. We had a much less
> engaged discussion on this topic only a few months ago, so glad to see more
> people getting involved now.
>
> I would however personally prefer to see the configuration file slowly
> deprecated (if perhaps never retired), in favour of virtual tables, so that
> operators may easily set configurations for the entire cluster. Ideally it
> would be possible to specify configuration per cluster, per DC and per
> node, with the most specific configuration applying. I would like to see a
> similar hierarchy for Keyspace, Table and Per-Query options. Ideally only
> the barest minimum number of options would be necessary to supply in a
> config file, and only on first launch – seed nodes, for instance.
>
> So whatever design we employ here, we should IMO be aiming for it to be
> compatible with a CQL representation also.
>
>
> From: Bowen Song 
> Date: Wednesday, 24 November 2021 at 18:15
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> config file syntax. It allows the user to completely flatten out the
> entire config file. To give people who aren't familiar with ElasticSearch
> an idea, here is a config file we use:
>
> cluster.name: foobar
>
> node.remote_cluster_client: false
> node.name: "foo.example.com"
> node.master: true
> node.data: true
> node.ingest: true
> node.ml: false
>
> xpack.ml.enabled: false
> xpack.security.enabled: false
> xpack.security.audit.enabled: false
> xpack.watcher.enabled: false
>
> action.auto_create_index: "+.,-*"
>
> network.host: _global_
>
> discovery.zen.hosts_provider: file
> discovery.zen.minimum_master_nodes: 2
>
> http.publish_host: "foo.example.com"
> http.publish_port: 443
> http.bind_host: 127.0.0.1
>
> transport.publish_host: "bar.example.com"
> transport.bind_host: 0.0.0.0
>
> indices.fielddata.cache.size: 1GB
> indices.breaker.total.use_real_memory: false
>
> path.logs: /var/log/elasticsearch
> path.data: /var/lib/elasticsearch/data
>
> As you can see we can use the flat (grep-able) syntax for everything.
> This is also human readable because we can group options together by
> inserting empty lines between them.
>
> The equivalent of the above in a structured syntax will be:
>
> cluster:
>  name: foobar
>
> node:
>  remote_cluster_client: false
>  name: "foo.example.com"
>  master: true
>  data: true
>  ingest: true
>  ml: false
>
> xpack:
>  ml:
>   enabled: false
>  security:
>   enabled: false
>   audit:
>    enabled: false
>  watcher:
>   enabled: false
>
> action:
>  auto_create_index: "+.,-*"
>
> network:
>  host: _global_
>
> discovery:
>  zen:
>   hosts_provider: file
>   minimum_master_nodes: 2
>
> http:
>  publish_host: "foo.example.com"
>  publish_port: 443
>  bind_host: 127.0.0.1
>
> transport:
>  publish_host: "bar.example.com"
>  bind_host: 0.0.0.0
>
> indices:
>  fielddata:
>   cache:
>    size: 1GB
>  breaker:
>   total:
>    use_real_memory: false
>
> path:
>  logs: /var/log/elasticsearch
>  data: /var/lib/elasticsearch/data
>
> This may be easier to read for some people, but it is a total nightmare
> for "grep" - so many keys have identical names, such as "enabled".
>
> Also, for the virtual tables, it would be a lot easier to represent
> individual values in a virtual table when the config is flat and keys
> are unique. The virtual tables would need to either support the encoding
> and decoding of the structured config into a flat structure, or use JSON
> encoded string value. The use of JSON would make querying individual
> value much harder.
>
> On 22/11/2021 16:16, Joseph Lynch wrote:
> > Isn't one of the prima

Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Bowen Song
Since you mentioned ElasticSearch, I'm actually pretty happy with their 
config file syntax. It allows the user to completely flatten out the 
entire config file. To give people who aren't familiar with ElasticSearch
an idea, here is a config file we use:


   cluster.name: foobar

   node.remote_cluster_client: false
   node.name: "foo.example.com"
   node.master: true
   node.data: true
   node.ingest: true
   node.ml: false

   xpack.ml.enabled: false
   xpack.security.enabled: false
   xpack.security.audit.enabled: false
   xpack.watcher.enabled: false

   action.auto_create_index: "+.,-*"

   network.host: _global_

   discovery.zen.hosts_provider: file
   discovery.zen.minimum_master_nodes: 2

   http.publish_host: "foo.example.com"
   http.publish_port: 443
   http.bind_host: 127.0.0.1

   transport.publish_host: "bar.example.com"
   transport.bind_host: 0.0.0.0

   indices.fielddata.cache.size: 1GB
   indices.breaker.total.use_real_memory: false

   path.logs: /var/log/elasticsearch
   path.data: /var/lib/elasticsearch/data

As you can see we can use the flat (grep-able) syntax for everything. 
This is also human readable because we can group options together by 
inserting empty lines between them.


The equivalent of the above in a structured syntax will be:

   cluster:
    name: foobar

   node:
    remote_cluster_client: false
    name: "foo.example.com"
    master: true
    data: true
    ingest: true
    ml: false

   xpack:
    ml:
     enabled: false
    security:
     enabled: false
     audit:
      enabled: false
    watcher:
     enabled: false

   action:
    auto_create_index: "+.,-*"

   network:
    host: _global_

   discovery:
    zen:
     hosts_provider: file
     minimum_master_nodes: 2

   http:
    publish_host: "foo.example.com"
    publish_port: 443
    bind_host: 127.0.0.1

   transport:
    publish_host: "bar.example.com"
    bind_host: 0.0.0.0

   indices:
    fielddata:
     cache:
      size: 1GB
    breaker:
     total:
      use_real_memory: false

   path:
    logs: /var/log/elasticsearch
    data: /var/lib/elasticsearch/data

This may be easier to read for some people, but it is a total nightmare 
for "grep" - so many keys have identical names, such as "enabled".


Also, for the virtual tables, it would be a lot easier to represent 
individual values in a virtual table when the config is flat and keys 
are unique. The virtual tables would need to either support the encoding 
and decoding of the structured config into a flat structure, or use
JSON-encoded string values. The use of JSON would make querying individual
values much harder.
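
To make the "encoding of the structured config into a flat structure" concrete, here is a minimal Python sketch. It assumes the configuration has already been parsed into nested dictionaries; `flatten` is a hypothetical helper, not part of Cassandra or Elasticsearch, and the sample values are taken from the example above.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested mappings into a flat {dotted.key: value} mapping."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the child mapping, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


nested = {
    "xpack": {
        "ml": {"enabled": False},
        "security": {"enabled": False, "audit": {"enabled": False}},
    },
    "path": {"logs": "/var/log/elasticsearch"},
}

rows = flatten(nested)
for name in sorted(rows):
    print(f"{name}: {rows[name]}")
```

Each printed line has a unique key, which is what makes one row per setting (and grep) workable.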


On 22/11/2021 16:16, Joseph Lynch wrote:

Isn't one of the primary reasons to have a YAML configuration instead
of a properties file to allow typed and structured (implies nested)
configuration? I think it makes a lot of sense to group related
configuration options (e.g. a feature) into a typed class when we're
talking about more than one or two related options.

It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
period encoded key->value pairs when required (usually when providing
a property or override layer), Spring and Elasticsearch yamls both
come to mind. It seems pretty reasonable to support dot encoding and
decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.

Regarding quickly telling what configuration a node is running I think
we should lean on virtual tables for "what is the current
configuration" now that we have them, as others have said the written
cassandra.yaml is not necessarily the current configuration ... and
also grep -C or -A exist for this reason.

-Joey

On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer  wrote:

I do not have a strong opinion for one or the other but wanted to raise the
issue I see with the "Settings" virtual table.

Currently the "Settings" virtual table converts nested options into flat
options using a "_" separator. For those options it allows a user to query
the whole set of options through some hack.
If we decide to move to more nesting (more than one level), it seems to me
that we need to change the way this table is behaving and how we can query
its data.

We would need to start using "." as a nesting separator to ensure that
things are consistent between the configuration and the table and add
support for LIKE restrictions for filtering queries to allow operators to
be able to select the precise set of settings that the operator is looking
for.

Doing so is not really complicated in itself but might impact some users.
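
As a rough illustration of the filtering this would enable, the Python sketch below emulates a prefix restriction (the kind a `LIKE 'track_warnings.%'` clause would express) over rows keyed with a "." separator. The row contents and the `select_prefix` helper are assumptions made for the example; this is not how the Settings virtual table is implemented.

```python
from typing import Dict


def select_prefix(settings: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Keep only the settings whose dotted name starts with the prefix."""
    return {
        name: value
        for name, value in settings.items()
        if name.startswith(prefix)
    }


# Hypothetical rows: setting name -> value, using "." for nesting.
settings = {
    "track_warnings.enabled": "true",
    "track_warnings.coordinator_read_size.warn_threshold": "10kb",
    "track_warnings.coordinator_read_size.abort_threshold": "1mb",
    "track_warnings.row_index_size.abort_threshold": "1gb",
    "native_transport_port": "9042",
}

selected = select_prefix(settings, "track_warnings.coordinator_read_size.")
for name in sorted(selected):
    print(name, "=", selected[name])
```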

On Fri, 19 Nov 2021 at 22:39, David Capwell wrote:


it is really handy to grep
cassandra.yaml on some config key and you know the value instantly.

You can still do that

$ grep -A2 coordinator_read_size conf/cassandra.yaml
# coordinator_read_size:
#

Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Jacek Lewandowski
We still have yq, mentioned a couple of posts earlier, which does even more
than grep, so I suppose it could satisfy both camps :)

- - -- --- -  -
Jacek Lewandowski


On Wed, Nov 24, 2021 at 6:13 PM Joseph Lynch  wrote:

> On Wed, Nov 24, 2021 at 9:00 AM Bowen Song  wrote:
> > Structured / nested config is easier for human eyes to read but very
> > hard for simple scripts to handle. Flat config is harder for human eyes
> > but easy for simple scripts. I can see user may prefer one over another
> > depending on their own use case. If the structured / nested config must
> > be introduced, I would like to see both syntaxes supported to allow the
> > user to make their own choice.
>
> To be clear, structured configuration was already adopted by Cassandra
> a long time ago and is already used successfully in the status quo
> (for example server/client encryption options, all of the pluggable
> class configurations). I believe the question was "when we are adding
> a number of related options should we structure them?". I think the
> answer is clearly yes because it makes the configuration code in the
> database a lot cleaner and allows us to leverage strongly typed
> configuration. Related configuration should continue to be grouped as
> if you were using a prefix of a dot encoded property (so {"a": {"b":
> 4}} is equivalent to "a.b: 4").
>
> There is the separate question of "how can an operator tell what
> configuration a node is running with" and for obvious reasons grepping
> cassandra.yaml is not a good public interface, we can do better via
> either virtual tables (JSON over CQL) or the sidecar (JSON over REST)
> that preserves the structured configuration rather than trying to
> flatten it.
>
> -Joey
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Joseph Lynch
On Wed, Nov 24, 2021 at 9:00 AM Bowen Song  wrote:
> Structured / nested config is easier for human eyes to read but very
> hard for simple scripts to handle. Flat config is harder for human eyes
> but easy for simple scripts. I can see user may prefer one over another
> depending on their own use case. If the structured / nested config must
> be introduced, I would like to see both syntaxes supported to allow the
> user to make their own choice.

To be clear, structured configuration was already adopted by Cassandra
a long time ago and is already used successfully in the status quo
(for example server/client encryption options, all of the pluggable
class configurations). I believe the question was "when we are adding
a number of related options should we structure them?". I think the
answer is clearly yes because it makes the configuration code in the
database a lot cleaner and allows us to leverage strongly typed
configuration. Related configuration should continue to be grouped as
if you were using a prefix of a dot encoded property (so {"a": {"b":
4}} is equivalent to "a.b: 4").
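
A minimal Python sketch of what "strongly typed configuration" with shared types could look like, using the track_warnings option names from earlier in the thread. Cassandra's real configuration classes are Java; the `Thresholds`, `TrackWarnings` and `load_track_warnings` names here are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Thresholds:
    """Shared type: every limit uses the same two option names."""
    warn_threshold: Optional[str] = None
    abort_threshold: Optional[str] = None


@dataclass(frozen=True)
class TrackWarnings:
    enabled: bool = False
    coordinator_read_size: Thresholds = field(default_factory=Thresholds)
    local_read_size: Thresholds = field(default_factory=Thresholds)
    row_index_size: Thresholds = field(default_factory=Thresholds)


def load_track_warnings(raw: Dict[str, Any]) -> TrackWarnings:
    """Build the typed config; a misspelled option name raises TypeError."""
    options = dict(raw)
    enabled = bool(options.pop("enabled", False))
    sections = {name: Thresholds(**values) for name, values in options.items()}
    return TrackWarnings(enabled=enabled, **sections)


config = load_track_warnings({
    "enabled": True,
    "row_index_size": {"warn_threshold": "100mb", "abort_threshold": "1gb"},
})
print(config.row_index_size.abort_threshold)
```

Because every section reuses `Thresholds`, a stray `fail_threshold` or `warns_threshold` is rejected at load time instead of producing an inconsistent option name.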

There is the separate question of "how can an operator tell what
configuration a node is running with" and for obvious reasons grepping
cassandra.yaml is not a good public interface, we can do better via
either virtual tables (JSON over CQL) or the sidecar (JSON over REST)
that preserves the structured configuration rather than trying to
flatten it.

-Joey




Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Bowen Song
It only works if the output is for a human to read. If you have a large
number of servers, very often you want to do "grep -q ... &&
other_command" (or || other_command), or chain the grep results from
parallel-ssh into another command (grep or sort). The -A/-B/-C switches
will not work in this case. If the nested configurations have multiple
keys with the same name (e.g.: a dictionary where the values are very
similar dictionaries), even chaining 3 grep commands in the form of
"grep -A ... | grep -B ... | grep -q ... " is unlikely to work.


Structured / nested config is easier for human eyes to read but very
hard for simple scripts to handle. Flat config is harder for human eyes
but easy for simple scripts. I can see that users may prefer one over the
other depending on their own use case. If the structured / nested config
must be introduced, I would like to see both syntaxes supported to allow
the user to make their own choice.



On 24/11/2021 16:21, Henrik Ingo wrote:

Grepping is an important use case, and having worked with another database
that does nest its configs, I can offer some tips how I survived:

With good old grep, it can help to use the before and after options:

grep -A 5 track_warnings | grep -B 5 warn_threshold

Would find you this:

track_warnings:
  enabled: true
  coordinator_read_size:
    warn_threshold: 10kb

It would require magic expert knowledge to guess right numbers for -A and
-B but in many cases you could just use a large number like  and it
will work in most cases.

For more frequent use, you will want to just install `yq` (aka yaml query):
https://github.com/kislyuk/yq

henrik


On Fri, Nov 19, 2021 at 9:07 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:


Hi David,

while I do not oppose nested structure, it is really handy to grep
cassandra.yaml on some config key and you know the value instantly.
This is not possible when it is nested (easily & fastly) as it is on
two lines. Or maybe my grepping is just not advanced enough to cover
this case? If it is flat, I can just grep "track_warnings" and I have
them all.

Can you elaborate on your last bullet point? Parsing layer ... What do
you mean specifically?

Thanks

On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:

This has been brought up in a few tickets, so pushing to the dev list.

CASSANDRA-15234 - Standardise config and JVM parameters
CASSANDRA-16896 - hard/soft limits for queries
CASSANDRA-17147 - Guardrails prototype

In short, do we as a project wish to move "new features" into nested
YAML when the feature has "enough" to justify the nesting?  I would
really like to focus this discussion on new features rather than
retroactively grouping (leaving that to CASSANDRA-15234), as there is
already a place to talk about that.

To get things started, let's start with the track-warning feature
(hard/soft limits for queries), currently the configs look as follows
(assuming 15234)

track_warnings:
  enabled: true
  coordinator_read_size:
    warn_threshold: 10kb
    abort_threshold: 1mb
  local_read_size:
    warn_threshold: 10kb
    abort_threshold: 1mb
  row_index_size:
    warn_threshold: 100mb
    abort_threshold: 1gb

or should this be "flat"

track_warnings_enabled: true
track_warnings_coordinator_read_size_warn_threshold: 10kb
track_warnings_coordinator_read_size_abort_threshold: 1mb
track_warnings_local_read_size_warn_threshold: 10kb
track_warnings_local_read_size_abort_threshold: 1mb
track_warnings_row_index_size_warn_threshold: 100mb
track_warnings_row_index_size_abort_threshold: 1gb

For me I prefer nested for a few reasons
* easier to enforce consistency as the configs can use shared types;
in the track warnings patch I had mismatches cross configs (warn vs
warns, fail vs abort, etc.) before going nested, now everything reuses
the same types
* even though it is longer, things can be more clear how they are related
* parsing layer can add support for mixed or purely flat depending on
user preference (example:
track_warnings.row_index_size.abort_threshold, using the '.' notation
to represent nested structures)
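
The parsing layer mentioned in the last bullet could work roughly as in the Python sketch below: it accepts nested, flat ('.' notation) or mixed input and normalizes everything to one nested structure, rejecting a key that is set twice. This is an illustration under stated assumptions, not the actual Cassandra loader; `leaves` and `normalize` are hypothetical names.

```python
from typing import Any, Dict, Iterator, Tuple


def leaves(config: Dict[str, Any],
           prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (path, value) for every leaf, splitting dotted keys into parts."""
    for key, value in config.items():
        path = prefix + tuple(str(key).split("."))
        if isinstance(value, dict):
            yield from leaves(value, path)
        else:
            yield path, value


def normalize(config: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild a purely nested mapping from nested, flat or mixed input."""
    nested: Dict[str, Any] = {}
    for path, value in leaves(config):
        name = ".".join(path)
        node = nested
        for part in path[:-1]:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{name} conflicts with an existing scalar")
        if path[-1] in node:
            raise ValueError(f"{name} is set more than once")
        node[path[-1]] = value
    return nested


mixed = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size": {"warn_threshold": "10kb"},
    },
    "track_warnings.coordinator_read_size.abort_threshold": "1mb",
    "track_warnings.row_index_size.abort_threshold": "1gb",
}

print(normalize(mixed))
```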

Thoughts?




Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Henrik Ingo
Grepping is an important use case, and having worked with another database
that does nest its configs, I can offer some tips how I survived:

With good old grep, it can help to use the before and after options:

grep -A 5 track_warnings | grep -B 5 warn_threshold

Would find you this:

track_warnings:
  enabled: true
  coordinator_read_size:
    warn_threshold: 10kb

It would require magic expert knowledge to guess right numbers for -A and
-B but in many cases you could just use a large number like  and it
will work in most cases.

For more frequent use, you will want to just install `yq` (aka yaml query):
https://github.com/kislyuk/yq

henrik


On Fri, Nov 19, 2021 at 9:07 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Hi David,
>
> while I do not oppose nested structure, it is really handy to grep
> cassandra.yaml on some config key and you know the value instantly.
> This is not possible when it is nested (easily & fastly) as it is on
> two lines. Or maybe my grepping is just not advanced enough to cover
> this case? If it is flat, I can just grep "track_warnings" and I have
> them all.
>
> Can you elaborate on your last bullet point? Parsing layer ... What do
> you mean specifically?
>
> Thanks
>
> On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:
> >
> > This has been brought up in a few tickets, so pushing to the dev list.
> >
> > CASSANDRA-15234 - Standardise config and JVM parameters
> > CASSANDRA-16896 - hard/soft limits for queries
> > CASSANDRA-17147 - Guardrails prototype
> >
> > In short, do we as a project wish to move "new features" into nested
> > YAML when the feature has "enough" to justify the nesting?  I would
> > really like to focus this discussion on new features rather than
> > retroactively grouping (leaving that to CASSANDRA-15234), as there is
> > already a place to talk about that.
> >
> > To get things started, let's start with the track-warning feature
> > (hard/soft limits for queries), currently the configs look as follows
> > (assuming 15234)
> >
> > track_warnings:
> >   enabled: true
> >   coordinator_read_size:
> >     warn_threshold: 10kb
> >     abort_threshold: 1mb
> >   local_read_size:
> >     warn_threshold: 10kb
> >     abort_threshold: 1mb
> >   row_index_size:
> >     warn_threshold: 100mb
> >     abort_threshold: 1gb
> >
> > or should this be "flat"
> >
> > track_warnings_enabled: true
> > track_warnings_coordinator_read_size_warn_threshold: 10kb
> > track_warnings_coordinator_read_size_abort_threshold: 1mb
> > track_warnings_local_read_size_warn_threshold: 10kb
> > track_warnings_local_read_size_abort_threshold: 1mb
> > track_warnings_row_index_size_warn_threshold: 100mb
> > track_warnings_row_index_size_abort_threshold: 1gb
> >
> > For me I prefer nested for a few reasons
> > * easier to enforce consistency as the configs can use shared types;
> > in the track warnings patch I had mismatches cross configs (warn vs
> > warns, fail vs abort, etc.) before going nested, now everything reuses
> > the same types
> > * even though it is longer, things can be more clear how they are related
> > * parsing layer can add support for mixed or purely flat depending on
> > user preference (example:
> > track_warnings.row_index_size.abort_threshold, using the '.' notation
> > to represent nested structures)
> >
> > Thoughts?

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>



Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Joseph Lynch
On Wed, Nov 24, 2021 at 5:55 AM Jacek Lewandowski
 wrote:
>
> I am just wondering how to represent in properties things like lists of
> non-scalar values?
>

In my experience properties are not sufficient for complex
configuration sorta for this reason, that's why using structured YAML
(or any structured configuration language) is so much more powerful
than a properties file. I think if we leaned into structured
configuration we'd have mostly maps of maps pointing to scalars which
are well addressed by dot encoding.

Dot encoding only works down to the first leaf node that is neither a
scalar nor an object, and then the value needs to be structured. So a list
of maps, for example, would be in the value: in {"a": {"b": 4, "c":
[{"d": 3}, {"d": 2}]}} you'd be able to query for 'a.b' -> 4 or
'a.c' -> [{"d": 3}, {"d": 2}]. Single scalar values are valid JSON,
so if we have to have a text -> text encoding I'd go for the key being
the dot encoded key and the value being the JSON encoded value; that's
maybe the easiest way to generically represent complex structured
configuration in a flat key->value mapping.
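
A minimal Python sketch of that text -> text encoding, assuming the configuration is already parsed into dictionaries and lists. The `encode` helper is hypothetical; note that the list of maps ends up under the key a.c.

```python
import json
from typing import Any, Dict


def encode(config: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Dot-encode the keys and JSON-encode each non-mapping value."""
    encoded: Dict[str, str] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            encoded.update(encode(value, path))
        else:
            # Scalars and lists (including lists of maps) stay in the value.
            encoded[path] = json.dumps(value)
    return encoded


config = {"a": {"b": 4, "c": [{"d": 3}, {"d": 2}]}}
encoded = encode(config)
for key in sorted(encoded):
    print(key, "->", encoded[key])
```

Decoding a row back is a `json.loads` call on the value, so scalars and lists round-trip without a separate type column.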

I think Elasticsearch's live reconfiguration API [1], which accepts dot
encoded JSON and merges with on disk YAML, and Puppet's Hiera
configuration language [2], which allows you to index into YAMLs using
dot encoding, are some great interfaces for us to study. The latter
even allows the user to query into lists by using a number as the key
(similar to jq[3] except without the square brackets) so you could ask
for 'a.c.0' and get back {"d": 3}.
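
A small Python sketch of that kind of lookup, where a numeric path segment indexes into a list. It is only an illustration of the idea: `lookup` is a hypothetical helper and does not reproduce Hiera's or jq's actual behaviour.

```python
from typing import Any


def lookup(config: Any, path: str) -> Any:
    """Walk a dotted path; numeric segments index into lists."""
    node = config
    for part in path.split("."):
        if isinstance(node, list):
            node = node[int(part)]
        else:
            node = node[part]
    return node


config = {"a": {"b": 4, "c": [{"d": 3}, {"d": 2}]}}

print(lookup(config, "a.b"))
print(lookup(config, "a.c.0"))
print(lookup(config, "a.c.1.d"))
```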

-Joey

[1] 
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-update-settings.html
[2] https://puppet.com/docs/puppet/6/function.html#get
[3] https://stedolan.github.io/jq/manual/




Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Jacek Lewandowski
I am just wondering how to represent in properties things like lists of
non-scalar values?


- - -- --- -  -
Jacek Lewandowski


On Mon, Nov 22, 2021 at 5:16 PM Joseph Lynch  wrote:

> Isn't one of the primary reasons to have a YAML configuration instead
> of a properties file to allow typed and structured (implies nested)
> configuration? I think it makes a lot of sense to group related
> configuration options (e.g. a feature) into a typed class when we're
> talking about more than one or two related options.
>
> It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> period encoded key->value pairs when required (usually when providing
> a property or override layer), Spring and Elasticsearch yamls both
> come to mind. It seems pretty reasonable to support dot encoding and
> decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
>
> Regarding quickly telling what configuration a node is running I think
> we should lean on virtual tables for "what is the current
> configuration" now that we have them, as others have said the written
> cassandra.yaml is not necessarily the current configuration ... and
> also grep -C or -A exist for this reason.
>
> -Joey
>
> On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer  wrote:
> >
> > I do not have a strong opinion for one or the other but wanted to raise
> > the issue I see with the "Settings" virtual table.
> >
> > Currently the "Settings" virtual table converts nested options into flat
> > options using a "_" separator. For those options it allows a user to
> > query the whole set of options through some hack.
> > If we decide to move to more nesting (more than one level), it seems to
> > me that we need to change the way this table is behaving and how we can
> > query its data.
> >
> > We would need to start using "." as a nesting separator to ensure that
> > things are consistent between the configuration and the table and add
> > support for LIKE restrictions for filtering queries to allow operators to
> > select the precise set of settings they are looking for.
> >
> > Doing so is not really complicated in itself but might impact some users.
> >
> > On Fri, Nov 19, 2021 at 22:39, David Capwell wrote:
> >
> > > > it is really handy to grep
> > > > cassandra.yaml on some config key and you know the value instantly.
> > >
> > > You can still do that
> > >
> > > $ grep -A2 coordinator_read_size conf/cassandra.yaml
> > > # coordinator_read_size:
> > > # warn_threshold_kb: 0
> > > # abort_threshold_kb: 0
> > >
> > > I was also arguing we should support nested and flat, so if your infra
> > > works better with flat then you could use
> > >
> > > track_warnings.coordinator_read_size.warn_threshold_kb: 0
> > > track_warnings.coordinator_read_size.abort_threshold_kb: 0
> > >
> > > > On Nov 19, 2021, at 1:34 PM, David Capwell 
> wrote:
> > > >
> > > >> With the flat structure it turns into properties file - would it be
> > > >> possible to support both formats - nested yaml and flat properties?
> > > >
> > > >
> > > > For the majority of our configs yes, but there is a subset where flat
> > > > properties are annoying:
> > > >
> > > > hinted_handoff_disabled_datacenters - set type, so you could do
> > > > hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal
> > > > with separators as the format doesn't support sets
> > > > seed_provider.parameters - this is a map type… so would need to do
> > > > something like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe we
> > > > special case maps as dynamic fields?  Then seed_provider.parameters.a=a?
> > > > We have ParameterizedClass all over the code
> > > >
> > > > So, as long as we define how to deal with Java collections, we could in
> > > > theory support properties files (not arguing for that in this thread)
> > > > as well as system properties.
> > > >
> > > >
> > > >> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <
> > > lewandowski.ja...@gmail.com> wrote:
> > > >>
> > > >> With the flat structure it turns into properties file - would it be
> > > >> possible to support both formats - nested yaml and flat properties?
> > > >>
> > > >>
> > > >> - - -- --- -  -
> > > >> Jacek Lewandowski
> > > >>
> > > >>
> > > >> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <
> > > calebrackli...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> If it's nested, "track_warnings" would still work if you're
> grepping
> > > around
> > > >>> vim or less.
> > > >>>
> > > >>> I'd have to concede the point about grep output, although there are
> > > tools
> > > >>> like https://github.com/kislyuk/yq that could probably be bent to
> do
> > > what
> > > >>> you want.
> > > >>>
> > > >>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> > > >>> stefan.mikloso...@instaclustr.com> wrote:
> > > >>>
> > >  Hi David,
> > > 
> > >  while I do not oppose nested structure, it is really handy to grep
> > >  

Re: [DISCUSS] Nested YAML configs for new features

2021-11-22 Thread Joseph Lynch
Isn't one of the primary reasons to have a YAML configuration instead
of a properties file to allow typed and structured (implies nested)
configuration? I think it makes a lot of sense to group related
configuration options (e.g. a feature) into a typed class when we're
talking about more than one or two related options.

It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
period encoded key->value pairs when required (usually when providing
a property or override layer), Spring and Elasticsearch yamls both
come to mind. It seems pretty reasonable to support dot encoding and
decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
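
The decoding direction could look roughly like the following Python sketch. The name dot_decode is an assumption for illustration, not an existing Cassandra, Spring or Elasticsearch API; it turns period encoded keys back into nested maps.

```python
def dot_decode(flat):
    """Expand {"a.b": 12} style keys into nested dicts: {"a": {"b": 12}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                # e.g. both "a" and "a.b" were given: the path is ambiguous
                raise ValueError(f"{dotted_key!r} conflicts with a scalar value")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{dotted_key!r} conflicts with a nested value")
        node[leaf] = value
    return nested


print(dot_decode({"a.b": 12}))
```

The conflict checks are a design choice of this sketch: rejecting a key that is both a scalar and a parent seemed safer than letting the last writer win silently.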

Regarding quickly telling what configuration a node is running I think
we should lean on virtual tables for "what is the current
configuration" now that we have them, as others have said the written
cassandra.yaml is not necessarily the current configuration ... and
also grep -C or -A exist for this reason.

-Joey

On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer  wrote:
>
> I do not have a strong opinion for one or the other but wanted to raise the
> issue I see with the "Settings" virtual table.
>
> Currently the "Settings" virtual table converts nested options into flat
> options using a "_" separator. For those options it allows a user to query
> the whole set of options through some hack.
> If we decide to move to more nesting (more than one level), it seems to me
> that we need to change the way this table is behaving and how we can query
> its data.
>
> We would need to start using "." as a nesting separator to ensure that
> things are consistent between the configuration and the table and add
> support for LIKE restrictions for filtering queries to allow operators to
> be able to select the precise set of settings that the operator is looking
> for.
>
> Doing so is not really complicated in itself but might impact some users.
>
> On Fri, Nov 19, 2021 at 22:39, David Capwell wrote:
>
> > > it is really handy to grep
> > > cassandra.yaml on some config key and you know the value instantly.
> >
> > You can still do that
> >
> > $ grep -A2 coordinator_read_size conf/cassandra.yaml
> > # coordinator_read_size:
> > # warn_threshold_kb: 0
> > # abort_threshold_kb: 0
> >
> > I was also arguing we should support nested and flat, so if your infra
> > works better with flat then you could use
> >
> > track_warnings.coordinator_read_size.warn_threshold_kb: 0
> > track_warnings.coordinator_read_size.abort_threshold_kb: 0
> >
> > > On Nov 19, 2021, at 1:34 PM, David Capwell  wrote:
> > >
> > >> With the flat structure it turns into properties file - would it be
> > >> possible to support both formats - nested yaml and flat properties?
> > >
> > >
> > > For the majority of our configs yes, but there is a subset where flat
> > > properties are annoying:
> > >
> > > hinted_handoff_disabled_datacenters - set type, so you could do
> > > hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal
> > > with separators as the format doesn't support sets
> > > seed_provider.parameters - this is a map type… so would need to do
> > > something like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe we
> > > special case maps as dynamic fields?  Then seed_provider.parameters.a=a?
> > > We have ParameterizedClass all over the code
> > >
> > > So, as long as we define how to deal with Java collections, we could in
> > > theory support properties files (not arguing for that in this thread) as
> > > well as system properties.
> > >
> > >
> > >> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <
> > lewandowski.ja...@gmail.com> wrote:
> > >>
> > >> With the flat structure it turns into properties file - would it be
> > >> possible to support both formats - nested yaml and flat properties?
> > >>
> > >>
> > >> - - -- --- -  -
> > >> Jacek Lewandowski
> > >>
> > >>
> > >> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <
> > calebrackli...@gmail.com>
> > >> wrote:
> > >>
> > >>> If it's nested, "track_warnings" would still work if you're grepping
> > around
> > >>> vim or less.
> > >>>
> > >>> I'd have to concede the point about grep output, although there are
> > tools
> > >>> like https://github.com/kislyuk/yq that could probably be bent to do
> > what
> > >>> you want.
> > >>>
> > >>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> > >>> stefan.mikloso...@instaclustr.com> wrote:
> > >>>
> >  Hi David,
> > 
> >  while I do not oppose nested structure, it is really handy to grep
> >  cassandra.yaml on some config key and you know the value instantly.
> >  This is not possible when it is nested (easily and quickly) as it is on
> >  two lines. Or maybe my grepping is just not advanced enough to cover
> >  this case? If it is flat, I can just grep "track_warnings" and I have
> >  them all.
> > 
> >  Can you elaborate on your last bullet point? Parsing layer ... What do
> >  you mean specifically?
> > 
> 

Re: [DISCUSS] Nested YAML configs for new features

2021-11-22 Thread Benjamin Lerer
I do not have a strong opinion for one or the other but wanted to raise the
issue I see with the "Settings" virtual table.

Currently the "Settings" virtual table converts nested options into flat
options using a "_" separator. For those options it allows a user to query
the whole set of options through some hack.
If we decide to move to more nesting (more than one level), it seems to me
that we need to change the way this table behaves and how we can query
its data.

We would need to start using "." as a nesting separator to ensure that
things are consistent between the configuration and the table, and add
support for LIKE restrictions in filtering queries so that operators can
select the precise set of settings they are looking for.

Doing so is not really complicated in itself but might impact some users.
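
As a rough illustration of what a "."-separated table with prefix filtering would give operators, here is a small Python sketch. It only models the selection logic over flat setting names; select_settings and the sample rows are hypothetical, and this is not how the virtual table is implemented.

```python
def select_settings(settings, prefix):
    """Return settings named `prefix` or nested under it (like LIKE 'prefix.%')."""
    return {
        name: value
        for name, value in sorted(settings.items())
        if name == prefix or name.startswith(prefix + ".")
    }


# Sample rows, using option names from this thread as illustrative data.
settings = {
    "track_warnings.enabled": "true",
    "track_warnings.coordinator_read_size.warn_threshold": "10kb",
    "track_warnings.coordinator_read_size.abort_threshold": "1mb",
    "track_warnings.local_read_size.warn_threshold": "10kb",
    "hinted_handoff_disabled_datacenters": "[]",
}

for name, value in select_settings(settings, "track_warnings.coordinator_read_size").items():
    print(name, "=", value)
```

Matching on prefix + "." rather than on the bare prefix avoids accidentally selecting a sibling option whose name merely starts with the same characters.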

On Fri, Nov 19, 2021 at 22:39, David Capwell wrote:

> > it is really handy to grep
> > cassandra.yaml on some config key and you know the value instantly.
>
> You can still do that
>
> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> # coordinator_read_size:
> # warn_threshold_kb: 0
> # abort_threshold_kb: 0
>
> I was also arguing we should support nested and flat, so if your infra
> works better with flat then you could use
>
> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> track_warnings.coordinator_read_size.abort_threshold_kb: 0
>
> > On Nov 19, 2021, at 1:34 PM, David Capwell  wrote:
> >
> >> With the flat structure it turns into properties file - would it be
> >> possible to support both formats - nested yaml and flat properties?
> >
> >
> > For the majority of our configs yes, but there is a subset where flat
> > properties are annoying:
> >
> > hinted_handoff_disabled_datacenters - set type, so you could do
> > hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal
> > with separators as the format doesn't support sets
> > seed_provider.parameters - this is a map type… so would need to do
> > something like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe we special
> > case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We have
> > ParameterizedClass all over the code
> >
> > So, as long as we define how to deal with Java collections, we could in
> > theory support properties files (not arguing for that in this thread) as
> > well as system properties.
> >
> >
> >> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
> >>
> >> With the flat structure it turns into properties file - would it be
> >> possible to support both formats - nested yaml and flat properties?
> >>
> >>
> >> - - -- --- -  -
> >> Jacek Lewandowski
> >>
> >>
> >> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <
> calebrackli...@gmail.com>
> >> wrote:
> >>
> >>> If it's nested, "track_warnings" would still work if you're grepping
> around
> >>> vim or less.
> >>>
> >>> I'd have to concede the point about grep output, although there are
> tools
> >>> like https://github.com/kislyuk/yq that could probably be bent to do
> what
> >>> you want.
> >>>
> >>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> >>> stefan.mikloso...@instaclustr.com> wrote:
> >>>
>  Hi David,
> 
>  while I do not oppose nested structure, it is really handy to grep
>  cassandra.yaml on some config key and you know the value instantly.
>  This is not possible when it is nested (easily and quickly) as it is on
>  two lines. Or maybe my grepping is just not advanced enough to cover
>  this case? If it is flat, I can just grep "track_warnings" and I have
>  them all.
> 
>  Can you elaborate on your last bullet point? Parsing layer ... What do
>  you mean specifically?
> 
>  Thanks
> 
>  On Fri, 19 Nov 2021 at 19:36, David Capwell 
> wrote:
> >
> > This has been brought up in a few tickets, so pushing to the dev
> list.
> >
> > CASSANDRA-15234 - Standardise config and JVM parameters
> > CASSANDRA-16896 - hard/soft limits for queries
> > CASSANDRA-17147 - Guardrails prototype
> >
> > In short, do we as a project wish to move "new features" into nested
> > YAML when the feature has "enough" to justify the nesting?  I would
> > really like to focus this discussion on new features rather than
> > retroactively grouping (leaving that to CASSANDRA-15234), as there is
> > already a place to talk about that.
> >
> > To get things started, let's start with the track-warning feature
> > (hard/soft limits for queries), currently the configs look as follows
> > (assuming 15234)
> >
> > track_warnings:
> >   enabled: true
> >   coordinator_read_size:
> >     warn_threshold: 10kb
> >     abort_threshold: 1mb
> >   local_read_size:
> >     warn_threshold: 10kb
> >     abort_threshold: 1mb
> >   row_index_size:
> >     warn_threshold: 100mb
> >   

Re: [DISCUSS] Nested YAML configs for new features

2021-11-19 Thread David Capwell
> it is really handy to grep
> cassandra.yaml on some config key and you know the value instantly.

You can still do that

$ grep -A2 coordinator_read_size conf/cassandra.yaml
# coordinator_read_size:
# warn_threshold_kb: 0
# abort_threshold_kb: 0

I was also arguing we should support nested and flat, so if your infra works 
better with flat then you could use

track_warnings.coordinator_read_size.warn_threshold_kb: 0
track_warnings.coordinator_read_size.abort_threshold_kb: 0
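
For what it's worth, a parsing layer that accepts nested, flat, or a mix of both could be sketched like this in Python. The helper names deep_merge and normalize_config are made up for this example, and the input is assumed to be an already-parsed YAML mapping; it is a sketch of the idea, not the actual loader.

```python
def deep_merge(target, source):
    """Merge `source` into `target` in place, recursing where both sides are dicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = value
    return target


def normalize_config(config):
    """Expand dotted keys at any level so flat and nested forms end up identical."""
    result = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = normalize_config(value)
        # Wrap the value in one mapping per dotted segment, innermost first.
        for part in reversed(key.split(".")):
            value = {part: value}
        deep_merge(result, value)
    return result


mixed = {
    "track_warnings": {"coordinator_read_size": {"warn_threshold_kb": 0}},
    "track_warnings.coordinator_read_size.abort_threshold_kb": 0,
}
print(normalize_config(mixed))
```

With this shape, the rest of the code only ever sees the nested form, so supporting the flat spelling stays a concern of the loader alone.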

> On Nov 19, 2021, at 1:34 PM, David Capwell  wrote:
> 
>> With the flat structure it turns into properties file - would it be
>> possible to support both formats - nested yaml and flat properties?
> 
> 
> For the majority of our configs yes, but there is a subset where flat
> properties are annoying:
> 
> hinted_handoff_disabled_datacenters - set type, so you could do 
> hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal with 
> separators as the format doesn't support sets
> seed_provider.parameters - this is a map type… so would need to do something 
> like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe we special case maps 
> as dynamic fields?  Then seed_provider.parameters.a=a?  We have 
> ParameterizedClass all over the code
> 
> So, as long as we define how to deal with Java collections, we could in 
> theory support properties files (not arguing for that in this thread) as well 
> as system properties.
> 
> 
>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski  
>> wrote:
>> 
>> With the flat structure it turns into properties file - would it be
>> possible to support both formats - nested yaml and flat properties?
>> 
>> 
>> - - -- --- -  -
>> Jacek Lewandowski
>> 
>> 
>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe 
>> wrote:
>> 
>>> If it's nested, "track_warnings" would still work if you're grepping around
>>> vim or less.
>>> 
>>> I'd have to concede the point about grep output, although there are tools
>>> like https://github.com/kislyuk/yq that could probably be bent to do what
>>> you want.
>>> 
>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
>>> stefan.mikloso...@instaclustr.com> wrote:
>>> 
 Hi David,
 
 while I do not oppose nested structure, it is really handy to grep
 cassandra.yaml on some config key and you know the value instantly.
 This is not possible when it is nested (easily and quickly) as it is on
 two lines. Or maybe my grepping is just not advanced enough to cover
 this case? If it is flat, I can just grep "track_warnings" and I have
 them all.
 
 Can you elaborate on your last bullet point? Parsing layer ... What do
 you mean specifically?
 
 Thanks
 
 On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:
> 
> This has been brought up in a few tickets, so pushing to the dev list.
> 
> CASSANDRA-15234 - Standardise config and JVM parameters
> CASSANDRA-16896 - hard/soft limits for queries
> CASSANDRA-17147 - Guardrails prototype
> 
> In short, do we as a project wish to move "new features" into nested
> YAML when the feature has "enough" to justify the nesting?  I would
> really like to focus this discussion on new features rather than
> retroactively grouping (leaving that to CASSANDRA-15234), as there is
> already a place to talk about that.
> 
> To get things started, let's start with the track-warning feature
> (hard/soft limits for queries), currently the configs look as follows
> (assuming 15234)
> 
> track_warnings:
>   enabled: true
>   coordinator_read_size:
>     warn_threshold: 10kb
>     abort_threshold: 1mb
>   local_read_size:
>     warn_threshold: 10kb
>     abort_threshold: 1mb
>   row_index_size:
>     warn_threshold: 100mb
>     abort_threshold: 1gb
> 
> or should this be "flat"
> 
> track_warnings_enabled: true
> track_warnings_coordinator_read_size_warn_threshold: 10kb
> track_warnings_coordinator_read_size_abort_threshold: 1mb
> track_warnings_local_read_size_warn_threshold: 10kb
> track_warnings_local_read_size_abort_threshold: 1mb
> track_warnings_row_index_size_warn_threshold: 100mb
> track_warnings_row_index_size_abort_threshold: 1gb
> 
> For me I prefer nested for a few reasons
> * easier to enforce consistency as the configs can use shared types;
> in the track warnings patch I had mismatches cross configs (warn vs
> warns, fail vs abort, etc.) before going nested, now everything reuses
> the same types
> * even though it is longer, things can be more clear how they are related
> * parsing layer can add support for mixed or purely flat depending on
> user preference (example:
> track_warnings.row_index_size.abort_threshold, using the '.' notation
> to represent nested structures)
> 
> Thoughts?
> 
> 

Re: [DISCUSS] Nested YAML configs for new features

2021-11-19 Thread David Capwell
> With the flat structure it turns into properties file - would it be
> possible to support both formats - nested yaml and flat properties?


For the majority of our configs yes, but there is a subset where flat
properties are annoying:

hinted_handoff_disabled_datacenters - set type, so you could do 
hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal with 
separators as the format doesn't support sets
seed_provider.parameters - this is a map type… so would need to do something 
like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe we special case maps as 
dynamic fields?  Then seed_provider.parameters.a=a?  We have ParameterizedClass 
all over the code

So, as long as we define how to deal with Java collections, we could in theory 
support properties files (not arguing for that in this thread) as well as 
system properties.
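
To show what "define how to deal with collections" might mean in practice, here is a small Python sketch. Everything in it is an assumption for illustration: parse_flat is a hypothetical helper, the comma is an arbitrary separator choice for sets, and map entries are treated as dynamic fields under their parent key.

```python
def parse_flat(props, set_keys=(), map_keys=()):
    """Interpret flat string properties, special-casing set- and map-typed options.

    set_keys: option names whose value is a comma-separated set (assumed separator).
    map_keys: option names whose entries arrive as `<name>.<entry>=<value>`.
    """
    config = {}
    for key, raw in props.items():
        map_key = next((m for m in map_keys if key.startswith(m + ".")), None)
        if key in set_keys:
            # Elements containing a comma cannot be expressed with this rule.
            config[key] = {item.strip() for item in raw.split(",") if item.strip()}
        elif map_key is not None:
            config.setdefault(map_key, {})[key[len(map_key) + 1:]] = raw
        else:
            config[key] = raw
    return config


props = {
    "hinted_handoff_disabled_datacenters": "a,b,c,d",
    "seed_provider.parameters.a": "a",
}
parsed = parse_flat(
    props,
    set_keys={"hinted_handoff_disabled_datacenters"},
    map_keys={"seed_provider.parameters"},
)
print(parsed)
```

The sketch needs to be told which options are sets or maps; in a real loader that knowledge would presumably come from the declared field types rather than from extra arguments.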


> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski  
> wrote:
> 
> With the flat structure it turns into properties file - would it be
> possible to support both formats - nested yaml and flat properties?
> 
> 
> - - -- --- -  -
> Jacek Lewandowski
> 
> 
> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe 
> wrote:
> 
>> If it's nested, "track_warnings" would still work if you're grepping around
>> vim or less.
>> 
>> I'd have to concede the point about grep output, although there are tools
>> like https://github.com/kislyuk/yq that could probably be bent to do what
>> you want.
>> 
>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
>> stefan.mikloso...@instaclustr.com> wrote:
>> 
>>> Hi David,
>>> 
>>> while I do not oppose nested structure, it is really handy to grep
>>> cassandra.yaml on some config key and you know the value instantly.
>>> This is not possible when it is nested (easily and quickly) as it is on
>>> two lines. Or maybe my grepping is just not advanced enough to cover
>>> this case? If it is flat, I can just grep "track_warnings" and I have
>>> them all.
>>> 
>>> Can you elaborate on your last bullet point? Parsing layer ... What do
>>> you mean specifically?
>>> 
>>> Thanks
>>> 
>>> On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:
 
 This has been brought up in a few tickets, so pushing to the dev list.
 
 CASSANDRA-15234 - Standardise config and JVM parameters
 CASSANDRA-16896 - hard/soft limits for queries
 CASSANDRA-17147 - Guardrails prototype
 
 In short, do we as a project wish to move "new features" into nested
 YAML when the feature has "enough" to justify the nesting?  I would
 really like to focus this discussion on new features rather than
 retroactively grouping (leaving that to CASSANDRA-15234), as there is
 already a place to talk about that.
 
 To get things started, let's start with the track-warning feature
 (hard/soft limits for queries), currently the configs look as follows
 (assuming 15234)
 
 track_warnings:
   enabled: true
   coordinator_read_size:
     warn_threshold: 10kb
     abort_threshold: 1mb
   local_read_size:
     warn_threshold: 10kb
     abort_threshold: 1mb
   row_index_size:
     warn_threshold: 100mb
     abort_threshold: 1gb
 
 or should this be "flat"
 
 track_warnings_enabled: true
 track_warnings_coordinator_read_size_warn_threshold: 10kb
 track_warnings_coordinator_read_size_abort_threshold: 1mb
 track_warnings_local_read_size_warn_threshold: 10kb
 track_warnings_local_read_size_abort_threshold: 1mb
 track_warnings_row_index_size_warn_threshold: 100mb
 track_warnings_row_index_size_abort_threshold: 1gb
 
 For me I prefer nested for a few reasons
 * easier to enforce consistency as the configs can use shared types;
 in the track warnings patch I had mismatches cross configs (warn vs
 warns, fail vs abort, etc.) before going nested, now everything reuses
 the same types
 * even though it is longer, things can be more clear how they are related
 * parsing layer can add support for mixed or purely flat depending on
 user preference (example:
 track_warnings.row_index_size.abort_threshold, using the '.' notation
 to represent nested structures)
 
 Thoughts?
 
 
>>> 
>>> 
>>> 
>> 





Re: [DISCUSS] Nested YAML configs for new features

2021-11-19 Thread Jacek Lewandowski
With the flat structure it turns into properties file - would it be
possible to support both formats - nested yaml and flat properties?


- - -- --- -  -
Jacek Lewandowski


On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe 
wrote:

> If it's nested, "track_warnings" would still work if you're grepping around
> vim or less.
>
> I'd have to concede the point about grep output, although there are tools
> like https://github.com/kislyuk/yq that could probably be bent to do what
> you want.
>
> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
>
> > Hi David,
> >
> > while I do not oppose nested structure, it is really handy to grep
> > cassandra.yaml on some config key and you know the value instantly.
> > This is not possible when it is nested (easily and quickly) as it is on
> > two lines. Or maybe my grepping is just not advanced enough to cover
> > this case? If it is flat, I can just grep "track_warnings" and I have
> > them all.
> >
> > Can you elaborate on your last bullet point? Parsing layer ... What do
> > you mean specifically?
> >
> > Thanks
> >
> > On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:
> > >
> > > This has been brought up in a few tickets, so pushing to the dev list.
> > >
> > > CASSANDRA-15234 - Standardise config and JVM parameters
> > > CASSANDRA-16896 - hard/soft limits for queries
> > > CASSANDRA-17147 - Guardrails prototype
> > >
> > > In short, do we as a project wish to move "new features" into nested
> > > YAML when the feature has "enough" to justify the nesting?  I would
> > > really like to focus this discussion on new features rather than
> > > retroactively grouping (leaving that to CASSANDRA-15234), as there is
> > > already a place to talk about that.
> > >
> > > To get things started, let's start with the track-warning feature
> > > (hard/soft limits for queries), currently the configs look as follows
> > > (assuming 15234)
> > >
> > > track_warnings:
> > >   enabled: true
> > >   coordinator_read_size:
> > >     warn_threshold: 10kb
> > >     abort_threshold: 1mb
> > >   local_read_size:
> > >     warn_threshold: 10kb
> > >     abort_threshold: 1mb
> > >   row_index_size:
> > >     warn_threshold: 100mb
> > >     abort_threshold: 1gb
> > >
> > > or should this be "flat"
> > >
> > > track_warnings_enabled: true
> > > track_warnings_coordinator_read_size_warn_threshold: 10kb
> > > track_warnings_coordinator_read_size_abort_threshold: 1mb
> > > track_warnings_local_read_size_warn_threshold: 10kb
> > > track_warnings_local_read_size_abort_threshold: 1mb
> > > track_warnings_row_index_size_warn_threshold: 100mb
> > > track_warnings_row_index_size_abort_threshold: 1gb
> > >
> > > For me I prefer nested for a few reasons
> > > * easier to enforce consistency as the configs can use shared types;
> > > in the track warnings patch I had mismatches cross configs (warn vs
> > > warns, fail vs abort, etc.) before going nested, now everything reuses
> > > the same types
> > > * even though it is longer, things can be more clear how they are related
> > > * parsing layer can add support for mixed or purely flat depending on
> > > user preference (example:
> > > track_warnings.row_index_size.abort_threshold, using the '.' notation
> > > to represent nested structures)
> > >
> > > Thoughts?
> > >
> > >
> >
> >
> >
>


Re: [DISCUSS] Nested YAML configs for new features

2021-11-19 Thread Caleb Rackliffe
I'm on record as early as the comments in CASSANDRA-15234 in support of
nesting, and I think the biggest reason is that the structure it forces on
our config makes it more cohesive and intelligible to those trying to
understand how major features and subsystems work together. It's very easy
to look at our current flat configuration and miss an option that modifies
or in some way governs another.

On the subject of mass-grepping via ssh, I would be careful. We have a
large and growing set of hot-properties, and looking at the YAML files
might not actually reflect how those nodes are currently configured.

On Fri, Nov 19, 2021 at 3:08 PM Caleb Rackliffe 
wrote:

> If it's nested, "track_warnings" would still work if you're grepping
> around vim or less.
>
> I'd have to concede the point about grep output, although there are tools
> like https://github.com/kislyuk/yq that could probably be bent to do what
> you want.
>
> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
>
>> Hi David,
>>
>> while I do not oppose nested structure, it is really handy to grep
>> cassandra.yaml on some config key and you know the value instantly.
>> This is not possible when it is nested (easily and quickly) as it is on
>> two lines. Or maybe my grepping is just not advanced enough to cover
>> this case? If it is flat, I can just grep "track_warnings" and I have
>> them all.
>>
>> Can you elaborate on your last bullet point? Parsing layer ... What do
>> you mean specifically?
>>
>> Thanks
>>
>> On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:
>> >
>> > This has been brought up in a few tickets, so pushing to the dev list.
>> >
>> > CASSANDRA-15234 - Standardise config and JVM parameters
>> > CASSANDRA-16896 - hard/soft limits for queries
>> > CASSANDRA-17147 - Guardrails prototype
>> >
>> > In short, do we as a project wish to move "new features" into nested
>> > YAML when the feature has "enough" to justify the nesting?  I would
>> > really like to focus this discussion on new features rather than
>> > retroactively grouping (leaving that to CASSANDRA-15234), as there is
>> > already a place to talk about that.
>> >
>> > To get things started, let's start with the track-warning feature
>> > (hard/soft limits for queries), currently the configs look as follows
>> > (assuming 15234)
>> >
>> > track_warnings:
>> >   enabled: true
>> >   coordinator_read_size:
>> >     warn_threshold: 10kb
>> >     abort_threshold: 1mb
>> >   local_read_size:
>> >     warn_threshold: 10kb
>> >     abort_threshold: 1mb
>> >   row_index_size:
>> >     warn_threshold: 100mb
>> >     abort_threshold: 1gb
>> >
>> > or should this be "flat"
>> >
>> > track_warnings_enabled: true
>> > track_warnings_coordinator_read_size_warn_threshold: 10kb
>> > track_warnings_coordinator_read_size_abort_threshold: 1mb
>> > track_warnings_local_read_size_warn_threshold: 10kb
>> > track_warnings_local_read_size_abort_threshold: 1mb
>> > track_warnings_row_index_size_warn_threshold: 100mb
>> > track_warnings_row_index_size_abort_threshold: 1gb
>> >
>> > For me I prefer nested for a few reasons
>> > * easier to enforce consistency as the configs can use shared types;
>> > in the track warnings patch I had mismatches cross configs (warn vs
>> > warns, fail vs abort, etc.) before going nested, now everything reuses
>> > the same types
>> > * even though it is longer, things can be more clear how they are related
>> > * parsing layer can add support for mixed or purely flat depending on
>> > user preference (example:
>> > track_warnings.row_index_size.abort_threshold, using the '.' notation
>> > to represent nested structures)
>> >
>> > Thoughts?
>> >
>> >
>>
>>
>>


Re: [DISCUSS] Nested YAML configs for new features

2021-11-19 Thread Caleb Rackliffe
If it's nested, searching for "track_warnings" would still work if you're
looking through the file in vim or less.

I'd have to concede the point about grep output, although there are tools
like https://github.com/kislyuk/yq that could probably be bent to do what
you want.

On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Hi David,
>
> while I do not oppose nested structure, it is really handy to grep
> cassandra.yaml on some config key and you know the value instantly.
> This is not possible when it is nested (easily and quickly) as it is on
> two lines. Or maybe my grepping is just not advanced enough to cover
> this case? If it is flat, I can just grep "track_warnings" and I have
> them all.
>
> Can you elaborate on your last bullet point? Parsing layer ... What do
> you mean specifically?
>
> Thanks
>
> On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:
> >
> > This has been brought up in a few tickets, so pushing to the dev list.
> >
> > CASSANDRA-15234 - Standardise config and JVM parameters
> > CASSANDRA-16896 - hard/soft limits for queries
> > CASSANDRA-17147 - Guardrails prototype
> >
> > In short, do we as a project wish to move "new features" into nested
> > YAML when the feature has "enough" to justify the nesting?  I would
> > really like to focus this discussion on new features rather than
> > retroactively grouping (leaving that to CASSANDRA-15234), as there is
> > already a place to talk about that.
> >
> > To get things started, let's start with the track-warning feature
> > (hard/soft limits for queries), currently the configs look as follows
> > (assuming 15234)
> >
> > track_warnings:
> >     enabled: true
> >     coordinator_read_size:
> >         warn_threshold: 10kb
> >         abort_threshold: 1mb
> >     local_read_size:
> >         warn_threshold: 10kb
> >         abort_threshold: 1mb
> >     row_index_size:
> >         warn_threshold: 100mb
> >         abort_threshold: 1gb
> >
> > or should this be "flat"
> >
> > track_warnings_enabled: true
> > track_warnings_coordinator_read_size_warn_threshold: 10kb
> > track_warnings_coordinator_read_size_abort_threshold: 1mb
> > track_warnings_local_read_size_warn_threshold: 10kb
> > track_warnings_local_read_size_abort_threshold: 1mb
> > track_warnings_row_index_size_warn_threshold: 100mb
> > track_warnings_row_index_size_abort_threshold: 1gb
> >
> > For me I prefer nested for a few reasons
> > * easier to enforce consistency as the configs can use shared types;
> > in the track warnings patch I had mismatches cross configs (warn vs
> > warns, fail vs abort, etc.) before going nested, now everything reuses
> > the same types
> > * even though it is longer, things can be more clear how they are related
> > * parsing layer can add support for mixed or purely flat depending on
> > user preference (example:
> > track_warnings.row_index_size.abort_threshold, using the '.' notation
> > to represent nested structures)
> >
> > Thoughts?
> >


Re: [DISCUSS] Nested YAML configs for new features

2021-11-19 Thread Bowen Song
I'm with Stefan. I prefer the flat YAML file, which I can easily grep
to check and confirm the settings on a large number of servers with
parallel-ssh. This will be very hard to do with a nested config in a YAML file.


In addition to that, I also use grep in the Cassandra source code to 
locate the relevant files based on the config name. The flat config name 
is long and unique, and this helps me efficiently navigate within the 
source code. I can imagine this is not going to work very well (if it 
works at all) with the nested config name.


P.S. I'm not a Java developer, so it will take me much longer to find the
relevant code if grep doesn't work in the source code. It will also be
harder for me to understand if the nested config is turned into
a Java object/class.


On 19/11/2021 19:07, Stefan Miklosovic wrote:

Hi David,

while I do not oppose nested structure, it is really handy to grep
cassandra.yaml on some config key and you know the value instantly.
This is not possible when it is nested (easily & quickly) as it is on
two lines. Or maybe my grepping is just not advanced enough to cover
this case? If it is flat, I can just grep "track_warnings" and I have
them all.

Can you elaborate on your last bullet point? Parsing layer ... What do
you mean specifically?

Thanks

On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:

This has been brought up in a few tickets, so pushing to the dev list.

CASSANDRA-15234 - Standardise config and JVM parameters
CASSANDRA-16896 - hard/soft limits for queries
CASSANDRA-17147 - Guardrails prototype

In short, do we as a project wish to move "new features" into nested
YAML when the feature has "enough" to justify the nesting?  I would
really like to focus this discussion on new features rather than
retroactively grouping (leaving that to CASSANDRA-15234), as there is
already a place to talk about that.

To get things started, let's start with the track-warning feature
(hard/soft limits for queries), currently the configs look as follows
(assuming 15234)

track_warnings:
    enabled: true
    coordinator_read_size:
        warn_threshold: 10kb
        abort_threshold: 1mb
    local_read_size:
        warn_threshold: 10kb
        abort_threshold: 1mb
    row_index_size:
        warn_threshold: 100mb
        abort_threshold: 1gb

or should this be "flat"

track_warnings_enabled: true
track_warnings_coordinator_read_size_warn_threshold: 10kb
track_warnings_coordinator_read_size_abort_threshold: 1mb
track_warnings_local_read_size_warn_threshold: 10kb
track_warnings_local_read_size_abort_threshold: 1mb
track_warnings_row_index_size_warn_threshold: 100mb
track_warnings_row_index_size_abort_threshold: 1gb

For me I prefer nested for a few reasons
* easier to enforce consistency as the configs can use shared types;
in the track warnings patch I had mismatches cross configs (warn vs
warns, fail vs abort, etc.) before going nested, now everything reuses
the same types
* even though it is longer, things can be more clear how they are related
* parsing layer can add support for mixed or purely flat depending on
user preference (example:
track_warnings.row_index_size.abort_threshold, using the '.' notation
to represent nested structures)
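
The shared-types point in the first bullet can be sketched roughly as follows. This is an illustration in Python, not the Java from the track warnings patch; the class names `Thresholds` and `TrackWarnings` are hypothetical, and the defaults are simply the values from the example above.

```python
from dataclasses import dataclass, field

@dataclass
class Thresholds:
    # A single shared type fixes the field names once, so one group cannot
    # drift to "warns" or "fail" while another uses "warn" and "abort".
    warn_threshold: str
    abort_threshold: str

@dataclass
class TrackWarnings:
    enabled: bool = True
    coordinator_read_size: Thresholds = field(
        default_factory=lambda: Thresholds("10kb", "1mb"))
    local_read_size: Thresholds = field(
        default_factory=lambda: Thresholds("10kb", "1mb"))
    row_index_size: Thresholds = field(
        default_factory=lambda: Thresholds("100mb", "1gb"))

track_warnings = TrackWarnings()
```

With a flat layout each of the six threshold names is spelled out by hand, which is where mismatches of the kind described above can creep in.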

Thoughts?



Re: [DISCUSS] Nested YAML configs for new features

2021-11-19 Thread David Capwell
In org.apache.cassandra.config.YamlConfigurationLoader (and anything else that
translates configs to flat structures), we can detect the '.' pattern and
recursively resolve the field (similar to walking directories); the main change
would be in
org.apache.cassandra.config.YamlConfigurationLoader.PropertiesChecker#getProperty.
The Property class acts like a Lens
(https://hackage.haskell.org/package/lens), so we can logically andThen them to
build up the property. For example,

set(config, track_warnings.row_index_size.abort_threshold, 1gb) 

gets converted to

set(get(get(config, track_warnings), row_index_size), abort_threshold, 1gb)

This is an implementation detail, so that anything working with configs (yaml,
vtable, jmx, etc.) has a consistent way of dealing with nested and flat
configs.
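
A minimal sketch of that get/set composition, written in Python rather than Java purely to illustrate the idea. `get_path` and `set_path` are hypothetical names, the config is a plain dict stand-in, and none of this is the actual YamlConfigurationLoader code.

```python
from functools import reduce

def get_path(config, dotted):
    """Walk nested dicts one segment at a time, like walking directories."""
    return reduce(lambda node, key: node[key], dotted.split("."), config)

def set_path(config, dotted, value):
    """Resolve the parent of the last segment with gets, then set the leaf."""
    *parents, leaf = dotted.split(".")
    target = reduce(lambda node, key: node[key], parents, config)
    if leaf not in target:
        # Reject unknown names instead of silently creating a new property.
        raise KeyError(f"unknown config property: {dotted}")
    target[leaf] = value

# Stand-in for the loaded config object (assumption).
config = {
    "track_warnings": {
        "row_index_size": {"warn_threshold": "100mb", "abort_threshold": "1gb"},
    }
}

# Equivalent to set(get(get(config, track_warnings), row_index_size), ...).
set_path(config, "track_warnings.row_index_size.abort_threshold", "2gb")
```

Because the dotted name is only split at lookup time, the same helpers would serve a YAML loader, a virtual table, or JMX without each needing its own notion of nesting.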


> On Nov 19, 2021, at 11:07 AM, Stefan Miklosovic 
>  wrote:
> 
> Hi David,
> 
> while I do not oppose nested structure, it is really handy to grep
> cassandra.yaml on some config key and you know the value instantly.
> This is not possible when it is nested (easily & quickly) as it is on
> two lines. Or maybe my grepping is just not advanced enough to cover
> this case? If it is flat, I can just grep "track_warnings" and I have
> them all.
> 
> Can you elaborate on your last bullet point? Parsing layer ... What do
> you mean specifically?
> 
> Thanks
> 
> On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:
>> 
>> This has been brought up in a few tickets, so pushing to the dev list.
>> 
>> CASSANDRA-15234 - Standardise config and JVM parameters
>> CASSANDRA-16896 - hard/soft limits for queries
>> CASSANDRA-17147 - Guardrails prototype
>> 
>> In short, do we as a project wish to move "new features" into nested
>> YAML when the feature has "enough" to justify the nesting?  I would
>> really like to focus this discussion on new features rather than
>> retroactively grouping (leaving that to CASSANDRA-15234), as there is
>> already a place to talk about that.
>> 
>> To get things started, let's start with the track-warning feature
>> (hard/soft limits for queries), currently the configs look as follows
>> (assuming 15234)
>> 
>> track_warnings:
>>     enabled: true
>>     coordinator_read_size:
>>         warn_threshold: 10kb
>>         abort_threshold: 1mb
>>     local_read_size:
>>         warn_threshold: 10kb
>>         abort_threshold: 1mb
>>     row_index_size:
>>         warn_threshold: 100mb
>>         abort_threshold: 1gb
>> 
>> or should this be "flat"
>> 
>> track_warnings_enabled: true
>> track_warnings_coordinator_read_size_warn_threshold: 10kb
>> track_warnings_coordinator_read_size_abort_threshold: 1mb
>> track_warnings_local_read_size_warn_threshold: 10kb
>> track_warnings_local_read_size_abort_threshold: 1mb
>> track_warnings_row_index_size_warn_threshold: 100mb
>> track_warnings_row_index_size_abort_threshold: 1gb
>> 
>> For me I prefer nested for a few reasons
>> * easier to enforce consistency as the configs can use shared types;
>> in the track warnings patch I had mismatches cross configs (warn vs
>> warns, fail vs abort, etc.) before going nested, now everything reuses
>> the same types
>> * even though it is longer, things can be more clear how they are related
>> * parsing layer can add support for mixed or purely flat depending on
>> user preference (example:
>> track_warnings.row_index_size.abort_threshold, using the '.' notation
>> to represent nested structures)
>> 
>> Thoughts?
>> 