Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Bowen Song
I'm worried that by the time a consensus is reached, the people who originally proposed the CEP may have long lost their passion for it and may no longer be willing to contribute. On 15/10/2021 16:55, Benjamin Lerer wrote: Reaching consensus is hard but we will get there :-) Le ven. 15 oct.

[DISCUSS] Disabling MIME-part filtering on this mailing list

2021-12-04 Thread Bowen Song
Hello, Currently this mailing list has MIME-part filtering turned on, which results in "From:" address munging (appending ".INVALID" to the sender's email address) for domains enforcing strict DMARC rules, such as apple.com, zoho.com and all Yahoo.** domains. This behaviour may cause

Re: [DISCUSS] Disabling MIME-part filtering on this mailing list

2021-12-04 Thread Bowen Song
Hmm.. It's too late to change that. I opened it as "DISCUSS" because I was not sure if the information in it is enough for people to vote on it. There's clearly a lot more that can be asked or discussed. For example the change will also stop the "To unsubscribe, ..." footer being appended to some

Re: [DISCUSS] Disabling MIME-part filtering on this mailing list

2021-12-21 Thread Bowen Song
I have just received a confirmation from Infra informing me that this change has been made. I'm sending this email as an update but also a test. Hopefully it arrives in your inbox without trouble, and my email address no longer has the ".INVALID" appended to it. On 04/12/2021 17:15,

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-19 Thread Bowen Song
Sorry, but IMHO setting performance requirements in this regard is nonsense. As long as it's reasonably usable in the real world, and Cassandra makes the estimated effects on performance available, it will be up to the operators to decide whether to turn on the feature. It's a trade off between

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-19 Thread Bowen Song
On the performance note, I copy & pasted a small piece of Java code to do AES256-CBC on the stdin and write the result to stdout. I then ran the following two commands on the same machine (with AES-NI) for comparison: $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time

Re: [DISCUSS] Nested YAML configs for new features

2021-11-19 Thread Bowen Song
I'm with Stefan. I prefer the flat YAML file, which I can easily grep to check and confirm the settings on a large number of servers with parallel-ssh. This will be very hard to do on nested config in a YAML file. In addition to that, I also use grep in the Cassandra source code to locate

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-15 Thread Bowen Song
The second question is about key rotation. If an operator needs to roll the key because it was compromised or there is some policy around that, we should be able to provide some way to rotate it. Our idea is to write a tool (either a subcommand of nodetool (rewritesstables)

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-16 Thread Bowen Song
. From: Bowen Song Date: Tuesday, 16 November 2021 at 11:56 To: dev@cassandra.apache.org Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption I think authenticating a receiving node is important, but it is perhaps not in the scope of this ticket (or CEP if it becomes one

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-16 Thread Bowen Song
at node 1, we change the wrapped key which is stored on disk and we stream this table to the other node which is still on the old Km. Would this work? I think we would need to rotate first before anything is streamed. Or no? On Tue, 16 Nov 2021 at 11:17, Bowen Song wrote: Yes, that's correct

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-16 Thread Bowen Song
hence KEK hence the result of wrapping but there would still be the original Kr key used. Jeremiah - I will prepare that branch very soon. On Tue, 16 Nov 2021 at 01:09, Bowen Song wrote: The second question is about key rotation. If an operator needs to roll the key because

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-16 Thread Bowen Song
) than users that do not encrypt their data at rest. From: Bowen Song Date: Tuesday, 16 November 2021 at 11:56 To: dev@cassandra.apache.org Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption I think authenticating a receiving node is important, but it is perhaps not in the scope

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-16 Thread Bowen Song
ould this work? I think we would need to rotate first before anything is streamed. Or no? On Tue, 16 Nov 2021 at 11:17, Bowen Song wrote: Yes, that's correct. The actual key used to encrypt the SSTable will stay the same once the SSTable is created. This is a widely used practice in many enc

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-16 Thread Bowen Song
I don't like the idea of FDE (Full Disk Encryption) as an alternative to application managed encryption at rest. Each has its own advantages and disadvantages. For example, if the encryption key is the same across nodes in the same cluster, and Cassandra can share the key securely between

Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Bowen Song
It only works if the output is for humans to read. If you have a large number of servers, very often you want to do "grep -q ... && other_command" (or || other_command), or chain the grep results from parallel-ssh into another command (grep or sort). The -A/-B/-C switches will not work in

Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Bowen Song
Since you mentioned ElasticSearch, I'm actually pretty happy with their config file syntax. It allows the user to completely flatten out the entire config file. To give people who aren't familiar with ElasticSearch an idea, here is a config file we use: cluster.name: foobar
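To make the flat-versus-nested point concrete, here is a minimal sketch of flattening a nested configuration into dotted keys, so that each setting sits on a single grep-friendly line. It assumes the config has already been parsed into plain dictionaries; the `flatten` helper and the sample settings other than `cluster.name: foobar` are hypothetical and are not taken from Cassandra or ElasticSearch.

```python
def flatten(config, prefix=""):
    """Flatten a nested dict into one level, joining keys with dots."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the dotted prefix along.
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Hypothetical nested settings, for illustration only.
nested = {"cluster": {"name": "foobar"}, "network": {"host": "0.0.0.0"}}

for key, value in flatten(nested).items():
    # One "key: value" pair per line, so a single grep pattern matches it.
    print(f"{key}: {value}")
```

With every setting on its own line, a check such as "grep -q 'cluster.name: foobar'" needs no surrounding context lines to be unambiguous.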

Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread Bowen Song
be aiming for it to be compatible with a CQL representation also. From: Bowen Song Date: Wednesday, 24 November 2021 at 18:15 To:dev@cassandra.apache.org Subject: Re: [DISCUSS] Nested YAML configs for new features Since you mentioned ElasticSearch, I'm actually pretty happy with their config file

Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)

2021-10-26 Thread Bowen Song
Personally, I would prefer a transition period in which the new feature is not enabled by default. This not only makes version upgrading easier, it also allows the user to stay on the old behaviour if they experience any issue with the new feature (e.g.: bugs in the new feature, or edge use

Re: [DISCUSS] How to implement backward compatibility (CASSANDRA-17048)

2021-10-26 Thread Bowen Song
it is enabled, the user is unable to revert it. - - -- --- - - Jacek Lewandowski On Tue, Oct 26, 2021 at 12:54 PM Bowen Song wrote: Personally, I would prefer a transition period in which the new feature is not enabled by default. This not only makes version upgrading easier

Re: [DISCUSS] Creating a new slack channel for newcomers

2021-11-09 Thread Bowen Song
As a newcomer (made two commits since October) who has been watching this mailing list since then, I don't like the idea of a separate channel for beginner questions. The volume in this mailing list is fairly low, and I can't see any legitimate reason for diverting a portion of that into another

Re: Issue while trying to run pytest command

2022-01-10 Thread Bowen Song
Did you run the pytest command in the cassandra directory (the cassandra git repo) or the cassandra-dtest directory (the cassandra-dtest git repo)? You should run the pytest command in the cassandra-dtest directory. On 09/01/2022 11:33, Manish G wrote: Initial installation is done following

Re: Updating our Code Contribution/Style Guide

2022-03-14 Thread Bowen Song
I found there's no mention of Python code style at all. If we are going to update the style guide, can this be addressed too? FYI, a quick "flake8" style check shows many existing issues in the Python code, including libraries imported but unused, redefinition of unused imports and invalid

Re: Updating our Code Contribution/Style Guide

2022-03-14 Thread Bowen Song
13 + a lot of improvements around Python stuff are coming. If you identify more places for improvements we are definitely interested. Regards On Mon, 14 Mar 2022 at 11:53, Bowen Song wrote: I found there's no mention of Python code style at all. If we are going to update the style guide, can this be

Re: Updating our Code Contribution/Style Guide

2022-03-14 Thread Bowen Song
placed to do so, having chosen throughout my career to limit my exposure to python. Probably a parallel effort would be best - perhaps you could work with Stefan and others to produce such a proposal? *From: *Bowen Song *Date: *Monday, 14 March 2022 at 10:53 *To: *dev@cassandra.apache.org

Re: Client password hashing

2022-02-16 Thread Bowen Song
To me this doesn't sound very useful. Here are a few threat models I can think of that may be related to this proposal, why this proposal does not address them, and what should be done instead. 1. passwords sent over the network in plaintext allow a passive packet sniffer to learn about the

Re: [DISCUSS] CASSANDRA-17292 Move cassandra.yaml toward a nested structure around major database concepts

2022-02-23 Thread Bowen Song
I agree with Benedict, there are legitimate use cases for both the flat and structured config file formats. The operator should be able to choose which one is best suited for their own use case. It will also make the upgrade process easier if both formats are supported by future versions of Cassandra.

Re: [DISCUSS] CASSANDRA-17292 Move cassandra.yaml toward a nested structure around major database concepts

2022-02-23 Thread Bowen Song
then have some kind of converter from the old Config object to the new object model that allows us to provide values to DatabaseDescriptor from only the new one (thereby avoiding any changes to the places all over the codebase that use DD). On Wed, Feb 23, 2022 at 4:46 AM Bowen Song wrote: I

Re: Dropping Python 3.6 support in 4.1

2022-04-05 Thread Bowen Song
I'm against this change. CentOS 7 only has Python up to 3.6 available from the EPEL repository, and the maintenance updates for CentOS 7 end in 2024. See: https://wiki.centos.org/About/Product To install Python>3.6 on CentOS 7, the user must either use a 3rd party repository that's not

Re: [DISCUSS] CEP-19: Trie memtable implementation

2022-02-09 Thread Bowen Song
TBH, I don't have an opinion on the configuration. I just want to say that if at the end we decide the configuration in the YAML should override the table schema, I would like to recommend that we specify a list of whitelisted (or blacklisted) "templates" in the YAML file, and the template

Re: [DISCUSS] Improve Commitlog write path

2022-07-20 Thread Bowen Song via dev
From my past experience, the bottleneck for insert-heavy workloads is likely to be compaction, not commit log. You initially may see commit log as the bottleneck when the table size is relatively small, but as the table size increases, compaction will likely take its place and become the new

Re: [DISCUSS] Improve Commitlog write path

2022-07-26 Thread Bowen Song via dev
long. With lower throughput large system can ingest more data. Does it make sense ? Thanks, Amit *From:* Bowen Song via dev *Sent:* Friday, July 22, 2022 4:37 PM *To:* dev@cassandra.apache.org *Subject:* Re: [DISCUSS] Improve Commitlog write path [CAUTION: External Email] Hi Amit

Re: [DISCUSS] Improve Commitlog write path

2022-07-22 Thread Bowen Song via dev
ti-threading is good to have now ? else please suggest if I need to test further. Thanks, Amit *From:* Bowen Song via dev *Sent:* Wednesday, July 20, 2022 4:13 PM *To:* dev@cassandra.apache.org *Subject:* Re: [DISCUSS] Improve Commitlog write path [CAUTION: External Email] From my past experi

Re: [DISCUSS] Deprecate and remove resumable bootstrap and decommission

2022-08-03 Thread Bowen Song via dev
I have benefited from the resumable bootstrap before, and I'm in favour of keeping the feature around. I've had streaming failures due to long STW GC pauses on some bootstrapping nodes, and I had to resume the bootstrap once or twice in order to get these nodes to finish joining the cluster.

Re: [DISCUSS] Deprecate and remove resumable bootstrap and decommission

2022-08-03 Thread Bowen Song via dev
these nodes to finish joining the cluster. Was this before or after the addition of zero copy streaming? The premise is that the pain point resumable bootstrap targets is mitigated by the much faster bootstrapping times without the correctness risks. On Wed, Aug 3, 2022, at 6:21 PM, Bowen Song via dev

Re: [DISCUSS] Deprecate and remove resumable bootstrap and decommission

2022-08-03 Thread Bowen Song via dev
:11, Jeff Jirsa wrote: The hypothetical concern described is around potential data resurrection - would you still use resumable bootstrap if you knew that data deleted during those STW pauses was improperly resurrected? On Wed, Aug 3, 2022 at 2:40 PM Bowen Song via dev wrote: I have

Re: [DISCUSS] Deprecate and remove resumable bootstrap and decommission

2022-08-03 Thread Bowen Song via dev
wide token range outside the receiving node's desired token range. On 04/08/2022 00:41, Bowen Song wrote: That was Cassandra 3.11, before the introduction of zero copy. But I must say I'm not certain whether the new zero copy streaming can prevent the long GC pauses, as I haven't tried it. On

Re: [PROPOSAL] Moving deb/rpm repositories from downloads.apache.org to apache.jfrog.io

2022-08-11 Thread Bowen Song via dev
of (superfluous) signing on top of that, which we do not currently have. Kind Regards, Brandon On Thu, Aug 11, 2022 at 4:20 PM Bowen Song via dev wrote: In that case, the move from signed RPM/DEB to unsigned can be quite problematic to some enterprise users. On 11/08/2022 22:16, Jeremiah D

Re: [PROPOSAL] Moving deb/rpm repositories from downloads.apache.org to apache.jfrog.io

2022-08-11 Thread Bowen Song via dev
I'm a bit unclear about the scope of this change. Is it limited to the "*-bin.tar.gz" files only? I would assume the RPM/DEB packages are considered part of the "official releases", and aren't affected by this change. Am I right? On 11/08/2022 21:59, Mick Semb Wever wrote: >

Re: [PROPOSAL] Moving deb/rpm repositories from downloads.apache.org to apache.jfrog.io

2022-08-11 Thread Bowen Song via dev
.  See the ASF release policy for more information. https://www.apache.org/legal/release-policy.html#compiled-packages On Aug 11, 2022, at 4:12 PM, Bowen Song via dev wrote: I'm a bit unclear what's the scope of this change. Is it limited to the "*-bin.tar.gz" files only? I wo

Re: [PROPOSAL] Moving deb/rpm repositories from downloads.apache.org to apache.jfrog.io

2022-08-11 Thread Bowen Song via dev
I see. In that case, sticking to the original plan makes more sense. On 11/08/2022 22:46, Mick Semb Wever wrote: We should have the new domain/URL created before the final move is made, and redirecting to the existing download.apache.org for the time

Re: [PROPOSAL] Moving deb/rpm repositories from downloads.apache.org to apache.jfrog.io

2022-08-11 Thread Bowen Song via dev
> /These repositories and their binaries are "convenience binaries" and not the official Cassandra source binaries/ Then where are the official binaries? On 11/08/2022 21:40, Mick Semb Wever wrote: The proposal is to move our official debian and redhat repositories from

Re: Unsubscribe

2022-08-09 Thread Bowen Song via dev
To unsubscribe from this mailing list, you'll need to send an email to dev-unsubscr...@cassandra.apache.org On 09/08/2022 12:52, Schmidtberger, Brian M. (STL) wrote: unsubscribe + BRIAN SCHMIDTBERGER Software Engineering Senior Advisor, Core Engineering, Express Scripts M: 785.766.7450

[DISCUSS] Enhanced Disk Error Handling

2023-03-08 Thread Bowen Song via dev
At the moment, when a read error, such as an unrecoverable bit error or data corruption, occurs in the SSTable data files, regardless of the disk_failure_policy configuration, manual (or to be precise, external) intervention is required to recover from the error. Commonly, there are two approach

Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-09 Thread Bowen Song via dev
range happens. But that feels suboptimal to me when a better framework is on the horizon. -- Abe On Mar 9, 2023, at 8:23 AM, Bowen Song via dev wrote: Hi Jeremiah, I'm fully aware of that, which is why I said that deleting the affected SSTable files is "less safe". If the "b

Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-09 Thread Bowen Song via dev
ens. But that feels suboptimal to me when a better framework is on the horizon. -- Abe On Mar 9, 2023, at 8:23 AM, Bowen Song via dev wrote: Hi Jeremiah, I'm fully aware of that, which is why I said that deleting the affected SSTable files is "less safe". If the "bad blocks

Re: [DISCUSS] Change the useage of nodetool tablehistograms

2023-03-16 Thread Bowen Song via dev
The documented command options are: nodetool tablehistograms [<keyspace> <table> | <keyspace.table>] That means one parameter will be treated as a dot-separated keyspace and table. Alternatively, two parameters will be treated as the keyspace and table respectively. To remain compatible with the documented behaviour, my
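The two positional forms described above can be sketched as a small argument parser. This is only an illustration of the documented semantics, not nodetool's actual implementation; the function name `parse_table_args` is hypothetical, and the sketch covers only the one-parameter and two-parameter cases mentioned in the message.

```python
def parse_table_args(args):
    """Return (keyspace, table) from one dotted or two separate arguments."""
    if len(args) == 1:
        # Single parameter: expect "keyspace.table".
        keyspace, dot, table = args[0].partition(".")
        if not dot or not keyspace or not table:
            raise ValueError(f"expected <keyspace.table>, got {args[0]!r}")
        return keyspace, table
    if len(args) == 2:
        # Two parameters: keyspace and table respectively.
        return args[0], args[1]
    raise ValueError("this sketch handles exactly one or two parameters")


print(parse_table_args(["ks.tbl"]))
print(parse_table_args(["ks", "tbl"]))
```

Both calls resolve to the same keyspace and table, which is the compatibility property any new argument style would need to preserve.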

Re: [DISCUSS] Change the useage of nodetool tablehistograms

2023-03-23 Thread Bowen Song via dev
zie 于2023年3月22日周三 23:35写道: Agree w/Bowen. I think the straight forward simplicity of "clear inclusion and exclusion semantics, default to include all in scope excepting things that are explicitly ignored" would be ideal. On Wed, Mar 22, 2023, at 8:45 AM, Bowen Song via de

Re: [DISCUSS] Change the useage of nodetool tablehistograms

2023-03-23 Thread Bowen Song via dev
 ,it is simple for him to type ten times with different table names which I think at first Only set with argument ks keyspace name is enough. When we just want to see eight tables in the ks ,the user should just type eight table name which ignore two table may be enough. Bowen Song via dev 于2023年3月23日 周

Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-08 Thread Bowen Song via dev
ct a drive reporting uncorrectable errors / filesystem corruption to be long for this world. Can you say more about the scenarios you have in mind? – Scott On Mar 8, 2023, at 5:24 AM, Bowen Song via dev wrote: At the moment, when a read error, such as unrecoverable bit error or data corrupti

Re: [DISCUSS] Change the useage of nodetool tablehistograms

2023-03-22 Thread Bowen Song via dev
also consider augmenting the tool with new named arguments with the functionality you described and leave the positional usage intact. On Thu, Mar 16, 2023, at 6:43 AM, Bowen Song via dev wrote: The documented command options are: nodetool tablehistograms [<keyspace> <table> | <keyspace.table>]

Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-09 Thread Bowen Song via dev
covering).  Then you can stream from the other nodes to get the data back. -Jeremiah On Mar 8, 2023, at 7:24 AM, Bowen Song via dev wrote: At the moment, when a read error, such as unrecoverable bit error or data corruption, occurs in the SSTable data files, regardless of the disk_failure_

Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-06 Thread Bowen Song via dev
> /I'm quite happy to leave things as they are if that is the consensus./ +1 to the above On 06/04/2023 14:54, Mike Adamson wrote: My apologies. I started this discussion off the back of a usability discussion around new user accessibility to Cassandra and the premise that there is an

Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-04 Thread Bowen Song via dev
I personally prefer to use the name "keyspace", because it avoids the confusion between the "database software/server" and the "collection of tables in a database". "An SQL database" can mean different things in different contexts, but "a Cassandra keyspace" always means the same thing. On

Re: [DISCUSS] Maintain backwards compatibility after dependency upgrade in the 5.0

2023-06-28 Thread Bowen Song via dev
IMHO, anyone upgrading software between major versions should expect to see breaking changes. Introducing breaking or major changes is the whole point of bumping major version numbers. Since the library upgrade needs to happen sooner or later, I don't see any reason why it should not happen in

Re: [DISCUSS] Add subscription mangement instructions to user@, dev@ message footers

2024-01-22 Thread Bowen Song via dev
Adding a footer or modifying the email content in any way will break the DKIM signature of the email if it has one. Since the mailing list's mail server will forward the emails to the recipients, the SPF check will fail too. Failing the DKIM signature & SPF check will result in the email
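The effect of a footer on a DKIM signature can be illustrated with a simplified body-hash check. A DKIM signature carries a hash of the message body (the `bh=` tag), so any appended text changes the hash and verification fails. The sketch below assumes SHA-256 and skips the body canonicalization step that real DKIM applies before hashing; the sample body and footer are made up.

```python
import base64
import hashlib


def body_hash(body: bytes) -> str:
    """Base64-encoded SHA-256 of the body (canonicalization omitted)."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


# Made-up message body and list footer, for illustration only.
original = b"Hello list,\r\nthis is the message body.\r\n"
footer = b"-- \r\nTo unsubscribe, ...\r\n"

signed_hash = body_hash(original)          # what the sender's signature covers
relayed_hash = body_hash(original + footer)  # what the recipient computes

print("unmodified body verifies:", body_hash(original) == signed_hash)
print("body with footer verifies:", relayed_hash == signed_hash)
```

The sketch only covers the body hash; the SPF failure mentioned above is a separate check on the sending server's IP address and is not modelled here.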

Re: [DISCUSS] Add subscription mangement instructions to user@, dev@ message footers

2024-01-22 Thread Bowen Song via dev
at's not forwarding, but sending a new email with the original email's content, subject, sender name (but not address), etc. information copied over. I believe the mailing list software this mailing list is using also supports such a feature. For example, this email's "From" address is

Re: Table name length limit in Cassandra

2024-02-22 Thread Bowen Song via dev
Hi Gaurav, I would be less worried about performance issues than interoperability issues. Other tools/client libraries do not expect this, and it may cause them to behave unexpectedly (e.g. truncating/crashing/...). If you can, try to get rid of common prefixes/suffixes, and use abbreviations where

Re: [DISCUSS] New CQL command/option for listing roles with superuser privileges

2024-02-29 Thread Bowen Song via dev
I believe that opens the door to this kind of situation: 1. create superuser role "role1" 2. create superuser role "role2" 3. add "role2" to members of "role1" 4. remove "role2" from the members of "role1" 5. "role2" now inexplicitly lost the superuser state TBH, my preferred solution is

Re: [DISCUSS] What SHOULD we do when we index an inet type that is ipv4?

2024-03-07 Thread Bowen Song via dev
parate columns on the same table, and none of this matters. If they are mixed, it feels like we should at least have the option to make them comparable, kind of like we have the option to make text case-insensitive or unicode normalized right now. On Wed, Mar 6, 2024 at 4:35 PM Bowen Song via dev

Re: [DISCUSS] What SHOULD we do when we index an inet type that is ipv4?

2024-03-06 Thread Bowen Song via dev
Technically, 127.0.0.1 (IPv4) is not 0:0:0:0:0::7f00:0001 (IPv6), but their values are equal. Just like 1.0 (double) is not 1 (int), but their values are equal. So, what is the meaning of "=" in CQL? On 06/03/2024 21:36, David Capwell wrote: So, was reviewing SAI and found we convert ipv4
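The distinction between "same type and value" and "same numeric value" can be shown with Python's standard `ipaddress` module. This says nothing about what "=" means or should mean in CQL; it only illustrates the two notions of equality raised in the message, using the two addresses quoted there. The helper names are hypothetical.

```python
import ipaddress

v4 = ipaddress.ip_address("127.0.0.1")
v6 = ipaddress.ip_address("0:0:0:0:0::7f00:0001")


def same_type_and_value(a, b):
    """Strict equality: address family and value must both match."""
    return a == b


def same_numeric_value(a, b):
    """Loose equality: compare only the underlying integer values."""
    return int(a) == int(b)


print("strict:", same_type_and_value(v4, v6))
print("numeric:", same_numeric_value(v4, v6))
# The double/int analogy from the message, in Python terms:
print("1.0 == 1:", 1.0 == 1)
```

An index that converts IPv4 values to IPv6 before comparing is effectively choosing the second, looser notion of equality.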

Re: Default table compression defined in yaml.

2024-03-19 Thread Bowen Song via dev
I believe the `foobar_in_kb: 123` format in the cassandra.yaml file is deprecated, and the new format is `foobar: 123KiB`. Is there a need to introduce new settings entries with the deprecated format only to be removed at a later version? On 18/03/2024 14:39, Claude Warren, Jr via dev wrote:
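As an illustration of the two formats, the sketch below rewrites a legacy-style entry such as `foobar_in_kb: 123` into the newer `foobar: 123KiB` form. The mapping table and the `to_new_format` helper are assumptions made for this example (only the `_in_kb` to `KiB` case comes from the message), and this is not Cassandra's own converter.

```python
# Assumed suffix-to-unit mapping; only "_in_kb" -> "KiB" is from the message.
LEGACY_SUFFIXES = {
    "_in_kb": "KiB",
    "_in_mb": "MiB",  # assumption, by analogy
}


def to_new_format(name, value):
    """Return (new_name, new_value) for a legacy 'name_in_<unit>' setting."""
    for suffix, unit in LEGACY_SUFFIXES.items():
        if name.endswith(suffix):
            # Drop the unit suffix from the name and attach it to the value.
            return name[: -len(suffix)], f"{value}{unit}"
    # Names without a known unit suffix are passed through unchanged.
    return name, str(value)


new_name, new_value = to_new_format("foobar_in_kb", 123)
print(f"{new_name}: {new_value}")
```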

Re: Schema Disagreement Issue for Cassandra 4.1

2024-04-01 Thread Bowen Song via dev
It sounds worthy of a Jira ticket. On 01/04/2024 06:23, Cheng Wang via dev wrote: Hello, I have recently encountered a problem concerning schema disagreement in Cassandra 4.1. It appears that the schema versions do not reconcile as expected. The issue can be reproduced by following these