Re: How the write path finds the N nodes to write to?
> if the replication factor is 3 it just picks the other two nodes following
> the ring clockwise.

The coordinator for a given mutation is not necessarily a replica (depending on whether the client uses token-aware routing), so it may have to forward the write to up to RF remote nodes and then wait for the number of acknowledgements required by the query's consistency level.

To see how the replicas that a write should be forwarded to are determined, start at StorageProxy.performWrite:
https://github.com/apache/cassandra/blob/999d263a5ddb9bb33981c39ede3125f199dd61ce/src/java/org/apache/cassandra/service/StorageProxy.java#L1359

There a ReplicaPlan gets built based on the replication strategy for the keyspace, which includes the configuration of full / transient replicas:
https://github.com/apache/cassandra/blob/999d263a5ddb9bb33981c39ede3125f199dd61ce/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L351

WriteResponseHandler is also used on the coordinator path to await responses from replicas and to determine when to acknowledge the write back to the client.

On Aug 30, 2023, at 3:46 PM, Gabriel Giussi wrote:

> I know cassandra uses consistent hashing for choosing the node where a key
> should go to, and if I understand correctly from this image
> https://cassandra.apache.org/doc/latest/cassandra/_images/ring.svg
> if the replication factor is 3 it just picks the other two nodes following
> the ring clockwise.
> I would like to know if someone can point me to where that is implemented,
> because I want to implement something similar for the finagle http client.
> The finagle library already has some implementation of partitioning using
> consistent hashing, but it doesn't support replication, so a key only
> belongs to a single node, see
> https://github.com/twitter/util/blob/develop/util-hashing/src/main/scala/com/twitter/hashing/ConsistentHashingDistributor.scala
>
> Thanks.
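The clockwise walk itself is simple to sketch outside Cassandra. Here is a minimal Python illustration (not Cassandra's code: all names are made up, md5 stands in for Murmur3, and there is no datacenter/rack awareness, so it is closest in spirit to SimpleStrategy):

```python
import bisect
import hashlib


def _token(data: bytes) -> int:
    """Map bytes onto the ring; md5 here stands in for Murmur3."""
    return int(hashlib.md5(data).hexdigest(), 16)


class ReplicatedRing:
    """Consistent-hash ring that returns RF distinct nodes per key."""

    def __init__(self, nodes, vnodes=8):
        # Each node owns several virtual tokens, as with Cassandra vnodes.
        self._ring = sorted(
            (_token(f"{node}:{i}".encode()), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]
        self._distinct = len(set(nodes))

    def replicas(self, key: str, rf: int = 3):
        """First node clockwise of the key's token, then keep walking
        clockwise, skipping nodes already chosen, until rf are found."""
        want = min(rf, self._distinct)
        i = bisect.bisect_right(self._tokens, _token(key.encode()))
        out = []
        while len(out) < want:
            node = self._ring[i % len(self._ring)][1]
            if node not in out:  # skip further vnodes of chosen nodes
                out.append(node)
            i += 1
        return out
```

A useful property of this scheme for a client like finagle's: `replicas(key, 1)[0]` is the "primary" owner, and raising RF only appends nodes to the replica list rather than reshuffling the existing ones.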
How the write path finds the N nodes to write to?
I know Cassandra uses consistent hashing for choosing the node where a key should go to, and if I understand correctly from this image
https://cassandra.apache.org/doc/latest/cassandra/_images/ring.svg
if the replication factor is 3 it just picks the other two nodes following the ring clockwise.

I would like to know if someone can point me to where that is implemented, because I want to implement something similar for the finagle http client. The finagle library already has some implementation of partitioning using consistent hashing, but it doesn't support replication, so a key only belongs to a single node; see
https://github.com/twitter/util/blob/develop/util-hashing/src/main/scala/com/twitter/hashing/ConsistentHashingDistributor.scala

Thanks.
Re: Startup errors - 4.1.3
There are at least two bugs in the compaction lifecycle transaction log: one that can write an ABORT / ADD in the wrong order (and prevent startup), and one that allows invalid timestamps in the log file (and again prevents startup). I believe it's safe to work around the former by removing the .log file, and you can work around the latter by using `touch` to update the timestamp of the data file that mismatches, but I can't find the relevant JIRAs to be 100% sure.

(Also, this may be a good trigger to cut a new release, because things that block startup are obviously quite serious.)

On Wed, Aug 30, 2023 at 6:59 AM Joe Obernberger <joseph.obernber...@gmail.com> wrote:

> Hi all - I replaced a node in a 14 node cluster, and it rebuilt OK. I
> started to see a lot of timeout errors, and discovered one of the nodes
> had this message constantly repeated:
> "waiting to acquire a permit to begin streaming" - so perhaps I hit this
> bug:
> https://www.mail-archive.com/commits@cassandra.apache.org/msg284709.html
>
> I then restarted that node, but it gave a bunch of errors about
> "unexpected disk state: failed to read transaction log".
> I deleted the corresponding files and got that node to come up, but now
> when I restart any of the other nodes in the cluster, they too do not
> start back up.
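For context on the "Mismatched line ... giving up" error: Cassandra writes a copy of the same transaction log into each data directory and, on startup, LogReplicaSet compares the copies line by line. A rough Python illustration of that check (not Cassandra's code; the record strings below are shortened to just the sstable component for readability):

```python
def first_mismatch(expected_lines, actual_lines):
    """Compare two replicas of a transaction log line by line.
    Returns (line_number, got, expected) for the first divergence,
    or None if one file is a prefix of (or equal to) the other."""
    for n, (got, expected) in enumerate(zip(actual_lines, expected_lines), start=1):
        if got != expected:
            return n, got, expected
    return None


# Abbreviated versions of the two replicas from the startup errors:
# /data/1 holds the 37639 ADD and the ABORT, /data/4 does not.
data1 = ["ADD:nb-37639-big-", "ABORT:", "ADD:nb-37640-big-", "ADD:nb-37644-big-"]
data4 = ["ADD:nb-37640-big-", "ADD:nb-37644-big-"]

print(first_mismatch(data1, data4))
# -> (1, 'ADD:nb-37640-big-', 'ADD:nb-37639-big-')
# i.e. got the 37640 ADD where the 37639 ADD was expected, which is
# exactly the shape of the "Mismatched line ... giving up" error above.
```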
Startup errors - 4.1.3
Hi all - I replaced a node in a 14 node cluster, and it rebuilt OK. I started to see a lot of timeout errors, and discovered one of the nodes had this message constantly repeated: "waiting to acquire a permit to begin streaming" - so perhaps I hit this bug:
https://www.mail-archive.com/commits@cassandra.apache.org/msg284709.html

I then restarted that node, but it gave a bunch of errors about "unexpected disk state: failed to read transaction log". I deleted the corresponding files and got that node to come up, but now when I restart any of the other nodes in the cluster, they too do not start back up.

Example:

INFO [main] 2023-08-30 09:50:46,130 LogTransaction.java:544 - Verifying logfile transaction [nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in /data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, /data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]
ERROR [main] 2023-08-30 09:50:46,154 LogReplicaSet.java:145 - Mismatched line in file nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log: got 'ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]' expected 'ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37639-big-,0,8][1997892352]', giving up
ERROR [main] 2023-08-30 09:50:46,155 LogFile.java:164 - Failed to read records for transaction log [nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in /data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, /data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]
ERROR [main] 2023-08-30 09:50:46,156 LogTransaction.java:559 - Unexpected disk state: failed to read transaction log [nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in /data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, /data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]

Files and contents follow:

/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37639-big-,0,8][1997892352]
ABORT:[,0,0][737437348]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37644-big-,0,8][3122518803]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37643-big-,0,8][2875951075]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37642-big-,0,8][884016253]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37641-big-,0,8][926833718]

/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]   ***Does not match in first replica file
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37644-big-,0,8][3122518803]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37643-big-,0,8][2875951075]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37642-big-,0,8][884016253]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37641-big-,0,8][926833718]

ERROR [main] 2023-08-30 09:50:46,156 CassandraDaemon.java:897 - Cannot remove temporary or obsoleted files for doc.extractedmetadata due to a problem with transaction log files. Please check records with problems in the log messages above and fix them. Refer to the 3.0 upgrading instructions in NEWS.txt for a description of transaction log files.
I then delete the files and eventually, after many iterations, the node comes back up. The table 'extractedmetadata' has 29 billion records - just a data point here. I think the 'right' thing to do is to go to each node, stop it, clean up the files, and finally get each one back up?

-Joe
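On the `touch` workaround mentioned earlier for the invalid-timestamp case: it amounts to resetting the data file's modification time so it agrees with what the transaction log recorded. In Python terms this is just `os.utime` (purely illustrative - on a real node you would run `touch` against the offending sstable component, and the correct target timestamp depends on what the log expects):

```python
import os
import tempfile


def touch(path: str, mtime: float) -> None:
    """Set a file's access and modification times, like `touch -d`."""
    os.utime(path, (mtime, mtime))


# Demo on a throwaway file rather than a real sstable component.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
touch(path, 1_693_400_000.0)  # arbitrary epoch timestamp
assert os.stat(path).st_mtime == 1_693_400_000.0
os.remove(path)
```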