Re: How the write path finds the N nodes to write to?

2023-08-30 Thread Abe Ratnofsky
> if the replication factor is 3 it just picks the other two nodes following 
> the ring clockwise.

The coordinator for a given mutation is not necessarily a replica (depending on 
whether token-aware routing is used by the client), so it may have to forward to 
RF remote nodes, then wait for the number of acknowledgements required by the 
query's consistency level.
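
As a rough illustration of "the required number of acknowledgements", here is 
a minimal Python sketch. It assumes a single datacenter and no pending 
(joining) nodes, and required_acks is a hypothetical helper name, not a 
Cassandra API. The datacenter-aware levels such as LOCAL_QUORUM are left out.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Return how many replica acknowledgements a coordinator waits for.

    Simplified: single datacenter, no pending nodes, and only the
    consistency levels listed below.
    """
    level = consistency_level.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


if __name__ == "__main__":
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, required_acks(cl, 3))
```

With RF=3, a QUORUM write is acknowledged to the client after 2 replicas 
respond, even though the mutation is sent to all 3.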

For figuring out the replicas that a write should be forwarded to, see 
StorageProxy.performWrite: 
https://github.com/apache/cassandra/blob/999d263a5ddb9bb33981c39ede3125f199dd61ce/src/java/org/apache/cassandra/service/StorageProxy.java#L1359

Then a ReplicaPlan gets built based on the replication strategy for the 
keyspace, which includes the configuration of full / transient replicas:
https://github.com/apache/cassandra/blob/999d263a5ddb9bb33981c39ede3125f199dd61ce/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L351

WriteResponseHandler is also used on the coordinator path when awaiting 
responses from replicas, to determine when to acknowledge a write back to a 
client.
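
For something outside Cassandra (such as the Finagle case), the "walk the ring 
clockwise" idea can be sketched in a few lines of Python. This is only a 
sketch under assumptions: Ring, token_for and replicas_for are hypothetical 
names, MD5 stands in for the hash (Cassandra's default partitioner is 
Murmur3), and the walk resembles what SimpleStrategy does. It ignores racks 
and datacenters, which NetworkTopologyStrategy takes into account.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Map a string to a ring position (MD5 here, purely for illustration)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Consistent-hash ring that returns several distinct nodes per key."""

    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node owns several tokens so load spreads more evenly.
        self._entries = sorted(
            (token_for(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [token for token, _ in self._entries]
        self._node_count = len(set(nodes))

    def replicas_for(self, key, replication_factor):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        if not self._entries:
            return []
        wanted = min(replication_factor, self._node_count)
        # First entry whose token is >= the key's token; wraps around below.
        start = bisect.bisect_left(self._tokens, token_for(key))
        replicas = []
        for offset in range(len(self._entries)):
            _, node = self._entries[(start + offset) % len(self._entries)]
            # Skip further tokens of a node that was already selected.
            if node not in replicas:
                replicas.append(node)
                if len(replicas) == wanted:
                    break
        return replicas


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas_for("user:42", 3))
```

The important detail when each node has several tokens is to skip entries 
whose node has already been chosen; otherwise the same physical node could be 
counted as more than one replica.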



How the write path finds the N nodes to write to?

2023-08-30 Thread Gabriel Giussi
I know Cassandra uses consistent hashing to choose the node where a key
should go, and if I understand correctly from this image
https://cassandra.apache.org/doc/latest/cassandra/_images/ring.svg
if the replication factor is 3 it just picks the other two nodes following
the ring clockwise.
I would like to know if someone can point me to where that is implemented,
because I want to implement something similar for the Finagle HTTP client.
The Finagle library already has an implementation of partitioning using
consistent hashing, but it doesn't support replication, so a key belongs to
only a single node; see
https://github.com/twitter/util/blob/develop/util-hashing/src/main/scala/com/twitter/hashing/ConsistentHashingDistributor.scala


Thanks.


Re: Startup errors - 4.1.3

2023-08-30 Thread Jeff Jirsa
There are at least two bugs in the compaction lifecycle transaction log:
one that can write an ABORT / ADD in the wrong order (and prevent startup),
and one that allows for invalid timestamps in the log file (and again
prevents startup).

I believe it's safe to work around the former by removing the .log file,
and you can work around the latter by using `touch` to update the
timestamps of the data file that mismatches, but I can't find the relevant
JIRAs to be 100% sure.

(Also, it may be a good trigger to cut a new release, because things that
block startup are obviously quite serious).





Startup errors - 4.1.3

2023-08-30 Thread Joe Obernberger
Hi all - I replaced a node in a 14-node cluster, and it rebuilt OK.  I 
then started to see a lot of timeout errors, and discovered that one of the 
nodes had this message constantly repeated:
"waiting to acquire a permit to begin streaming" - so perhaps I hit this 
bug:

https://www.mail-archive.com/commits@cassandra.apache.org/msg284709.html

I then restarted that node, but it gave a bunch of errors about 
"unexpected disk state: failed to read transaction log".
I deleted the corresponding files and got that node to come up, but now 
when I restart any of the other nodes in the cluster, they too fail to 
start back up:


Example:

INFO  [main] 2023-08-30 09:50:46,130 LogTransaction.java:544 - Verifying 
logfile transaction 
[nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in 
/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, 
/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]
ERROR [main] 2023-08-30 09:50:46,154 LogReplicaSet.java:145 - Mismatched 
line in file nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log: got 
'ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]' 
expected 
'ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37639-big-,0,8][1997892352]', 
giving up
ERROR [main] 2023-08-30 09:50:46,155 LogFile.java:164 - Failed to read 
records for transaction log 
[nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in 
/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, 
/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]
ERROR [main] 2023-08-30 09:50:46,156 LogTransaction.java:559 - 
Unexpected disk state: failed to read transaction log 
[nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in 
/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, 
/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]

Files and contents follow:
/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37639-big-,0,8][1997892352]
    ABORT:[,0,0][737437348]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37644-big-,0,8][3122518803]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37643-big-,0,8][2875951075]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37642-big-,0,8][884016253]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37641-big-,0,8][926833718]
/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]
    ***Does not match 
 
in first replica file

ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37644-big-,0,8][3122518803]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37643-big-,0,8][2875951075]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37642-big-,0,8][884016253]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37641-big-,0,8][926833718]

ERROR [main] 2023-08-30 09:50:46,156 CassandraDaemon.java:897 - Cannot 
remove temporary or obsoleted files for doc.extractedmetadata due to a 
problem with transaction log files. Please check records with problems 
in the log messages above and fix them. Refer to the 3.0 upgrading 
instructions in NEWS.txt for a description of transaction log files.


I then deleted the files and eventually, after many iterations, the node 
came back up.
The table 'extractedmetadata' has 29 billion records, just as a data point. 
I think the 'right' thing to do is to go to each node, stop it, clean up 
the files, and then get each one back up?


-Joe

