Re: [DISCUSS] CASSANDRA-16767, CASSANDRA-16768, and CASSANDRA-16769 for 3.11.x

2021-08-10 Thread bened...@apache.org
Hi Scott,

I wonder if it’s possible that too few people who saw your email consider 
themselves sufficiently involved in this part of the codebase.  People tend to 
keep quiet about stuff they don’t participate in deeply, which is why I haven’t 
responded – and I wonder if this might explain the tumbleweed. I wonder how we 
might generally track if areas of the codebase are adequately covered by active 
contributors.

To answer your question, I don’t personally believe it is problematic to add 
additional features to command line tools in a patch version – they’re not 
scary systems where new features introduce much risk of high impact bugs. 
Others have stricter interpretations of the rules, but if they haven’t spoken 
up yet I’d say you’re clear to post some patches – but you might want to first 
make sure there’s somebody able and willing to review them.


From: Scott Carey 
Date: Monday, 9 August 2021 at 20:12
To: dev@cassandra.apache.org 
Subject: Re: [DISCUSS] CASSANDRA-16767, CASSANDRA-16768, and CASSANDRA-16769 for 3.11.x
Thank you, Brandon, for answering my questions on Slack, for providing early
feedback on these ideas more than a month before I created the patches, and
for replying here.

Does anyone else have any comments or opinions?  Can a decision be reached
one way or another?  It is my understanding that we'll need more than one
+1 to move forward here.

I understand that the 4.0 release was a busy time, and that many probably
saw this, thought about replying, but got too busy and never did.
However, in light of the recent discussions around attracting new
contributors, I would like to highlight that being left in limbo with no
resolution is worse than being told "no", especially for new contributors.




On Fri, Jul 2, 2021 at 1:23 PM Brandon Williams  wrote:

> On Tue, Jun 29, 2021 at 5:49 PM Scott Carey  wrote:
> >
> > I'd like to discuss the inclusion of the above tickets for a 3.11.x
> > release.  These are not pure bug fixes, so I'll need a waiver to get
> > them into 3.11.x (and implicitly, 4.0.x).
> >
> > The first two are straightforward oversights:  neither *nodetool
> > garbagecollect* nor *nodetool scrub* currently accepts a *--user-defined*
> > parameter list of SSTables in the same way that *nodetool compact* does.
> >
> > This is an operational problem for large tables.
> >
> > I often need to scrub just one file that is corrupted for some reason,
> > and not scrub an entire 1TB+ of data for a table on a node.  This renders
> > 'nodetool scrub' operationally useless for large tables.
>
> I think that given not having user defined options for these
> compaction types is clearly an oversight, and that the alternative of
> deleting the large 1TB+ sstable and then repairing is a cure worse
> than the disease, this should be added to 3.11.x and 4.0.x. I am +1
> here.
>
> > For *garbagecollect* it is often operationally easy to identify which
> > tables are likely to be full of bloat -- and operationally useful to do
> > this task in small increments.  The existing order in which garbagecollect
> > processes SSTables prevents it from being useful in any incremental
> > fashion -- if you stop it and later restart, it will first process the
> > SSTables you just garbage collected.
> >
> > The third ticket adds an option for *nodetool garbagecollect*,
> > *--oldest-fraction*, that can select a fraction of the oldest table data
> > in bytes, and garbagecollect only the SSTables that 'cover' that
> > percentage of data.  Operationally, this lends itself to easy automation
> > -- for example, running this once a week on 10% of a table's data would
> > imply that there is no data on disk that has been overwritten within the
> > last 10 weeks.  This caps data bloat in ways neither LCS nor STCS can
> > currently achieve without regular major compactions or full-pass
> > garbagecollect.
>
> This is a less obvious thing to be added, and I personally lack the
> operational experience to comment on how much relief this would
> provide firsthand, so I'll leave that to others.  But it does make
> sense to me and since it isn't heavily modifying anything my
> inclination is that this could be an acceptable addition as well.
>
> Kind Regards,
> Brandon
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>
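
The *--oldest-fraction* selection described in the quoted message above can be illustrated with a small sketch. This is not the patch attached to the ticket; it is only one plausible reading of "select a fraction of the oldest table data in bytes". The SSTable record, the use of a per-file maximum timestamp as the definition of "oldest", and the function name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an SSTable's metadata (not a Cassandra class)."""
    name: str
    size_bytes: int
    max_timestamp: int  # assumed age key: lower value means older data


def select_oldest_fraction(sstables: List[SSTable], fraction: float) -> List[SSTable]:
    """Pick the oldest SSTables whose combined size covers `fraction` of all bytes."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total_bytes = sum(s.size_bytes for s in sstables)
    target_bytes = total_bytes * fraction
    selected: List[SSTable] = []
    covered = 0
    # Walk from oldest to newest, stopping once the target is covered.
    for sstable in sorted(sstables, key=lambda s: s.max_timestamp):
        if covered >= target_bytes:
            break
        selected.append(sstable)
        covered += sstable.size_bytes
    return selected


if __name__ == "__main__":
    tables = [
        SSTable("c", 600, max_timestamp=3),
        SSTable("a", 100, max_timestamp=1),
        SSTable("b", 300, max_timestamp=2),
    ]
    print([s.name for s in select_oldest_fraction(tables, 0.25)])
```

Because whole SSTables are selected, the bytes actually processed can exceed the requested fraction; in this sketch the last file chosen is the one that crosses the threshold.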


Re: [DISCUSS] CASSANDRA-16840 - Close native transport port before hint transfer during decommission

2021-08-10 Thread Jeff Jirsa
The hint behavior aside, stopping native protocol once you begin a decom
seems like something most people would benefit from, even if they don't
realize that's what they want to happen.



On Tue, Aug 10, 2021 at 12:53 PM Matt Fleming wrote:

> Hi,
>
> With the way that hint transfer currently works during decommission,
> it's possible to leave hints on the disk of the decommissioning node if
> those hints are generated after the decommission begins.
>
> I'd like to discuss automatically closing the native transport port
> before hints are transferred to peers during the decommission process to
> prevent this from happening.
>
> While it's possible to manually prevent this problem today if an
> operator runs "nodetool disablebinary" before starting the decom, I
> think the default behaviour is surprising enough that doing it
> automatically makes more sense.
>
> Thanks,
> Matt
>


[DISCUSS] CASSANDRA-16840 - Close native transport port before hint transfer during decommission

2021-08-10 Thread Matt Fleming
Hi,

With the way that hint transfer currently works during decommission,
it's possible to leave hints on the disk of the decommissioning node if
those hints are generated after the decommission begins.

I'd like to discuss automatically closing the native transport port
before hints are transferred to peers during the decommission process to
prevent this from happening.

While it's possible to manually prevent this problem today if an
operator runs "nodetool disablebinary" before starting the decom, I
think the default behaviour is surprising enough that doing it
automatically makes more sense.

Thanks,
Matt
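
The manual workaround mentioned above (running "nodetool disablebinary" before starting the decom) could be scripted roughly as follows. This is only a sketch of the operator-side ordering, not the change proposed in the ticket; the helper names and the injectable runner are assumptions made so the ordering can be checked without a live node.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def decommission_commands() -> List[List[str]]:
    """Commands in the order an operator would run them for the workaround."""
    return [
        ["nodetool", "disablebinary"],  # stop accepting native-protocol clients first
        ["nodetool", "decommission"],   # then start the decommission
    ]


def run_decommission(run: Optional[Callable[[Sequence[str]], None]] = None) -> None:
    """Run each command in order; a failure of the default runner stops the sequence."""
    if run is None:
        # Default runner: raises CalledProcessError on a non-zero exit code.
        def run(cmd: Sequence[str]) -> None:
            subprocess.run(list(cmd), check=True)
    for cmd in decommission_commands():
        run(cmd)
```

Passing a custom `run` callable allows a dry run, for example `run_decommission(run=print)` to list the commands without executing them.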
