Re: Cassandra 5.0 Beta1 - vector searching results

2024-03-27 Thread Joe Obernberger

Thank you all for the details on this.
For your #1 - if there are going to be 100+ million vectors, wouldn't I 
want the search to go across nodes?


Right now, we're running Weaviate (an 8-node cluster), our main 
Cassandra 4 cluster (12 nodes), and a test 3-node Cassandra 5 cluster.  
Weaviate does some interesting things like product quantization to 
reduce index size and improve search speed.  They get amazing speed, but the 
drawback is, from what I can tell, that they load the entire index into RAM.  
We've been having a recurring issue where once it runs out of RAM, it 
doesn't get slow; it just stops working.  Weaviate enables some powerful 
vector+boolean+range queries.  I would love to only have one database!


I'll look into how to do profiling - the terms you use are things I'm 
not familiar with, but I've got ChatGPT and Google... :)


-Joe

On 3/21/2024 10:51 PM, Caleb Rackliffe wrote:
To expand on Jonathan’s response, the best way to get SAI to perform 
on the read side is to use it as a tool for large-partition search. In 
other words, if you can model your data such that your queries will be 
restricted to a single partition, two things will happen…


1.) With all queries (not just ANN queries), you will only hit as many 
nodes as your read consistency level and replication factor require. 
For vector searches, that means you should only hit one node, and it 
should be the coordinating node w/ a properly configured, token-aware 
client.


2.) You can use LCS (or UCS configured to mimic LCS) instead of STCS 
as your table compaction strategy. This will essentially guarantee 
your (partition-restricted) SAI query hits a small number of 
SSTable-attached indexes. (It’ll hit Memtable-attached indexes as well 
for any recently added data, so if you’re seeing latencies shoot up, 
it’s possible there could be contention on the Memtable-attached index 
that supports ANN queries. I haven’t done a deep dive on it. You can 
always flush Memtables directly before queries to factor that out.)
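
To make that concrete, here is a minimal CQL sketch of both points; the table, column names, and the query vector below (doc.chunks, doc_id, chunk_id, chunk_text, embedding) are illustrative placeholders, not Joe's schema:

-- switch the table to LCS so a partition-restricted SAI query touches few SSTables
ALTER TABLE doc.chunks
  WITH compaction = {'class': 'LeveledCompactionStrategy'};

-- partition-restricted ANN query (assumes an SAI index on embedding): the WHERE
-- clause pins a single partition, so a token-aware client at CL=ONE hits one node
SELECT chunk_id, chunk_text
FROM doc.chunks
WHERE doc_id = 'abc-123'
ORDER BY embedding ANN OF [0.12, 0.05, 0.87]   -- literal must match the declared dimension
LIMIT 10;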


If you can do all of the above, the simple performance of the local 
index query and its post-filtering reads is probably the place to 
explore further. If you manage to collect any profiling data (JFR, 
flamegraphs via async-profiler, etc) I’d be happy to dig into it with you.
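
For reference, two common ways to collect that data on a running node; this is only a sketch, and the PID and async-profiler install path are placeholders:

# Java Flight Recorder: record two minutes on the Cassandra JVM
jcmd <cassandra-pid> JFR.start duration=120s filename=/tmp/cassandra.jfr

# async-profiler: CPU flame graph over the same window
/path/to/async-profiler/profiler.sh -e cpu -d 120 -f /tmp/cassandra-flame.html <cassandra-pid>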


Thanks for kicking the tires!

On Mar 21, 2024, at 8:20 PM, Brebner, Paul via user wrote:




Hi Joe,

Have you considered submitting something for Community Over Code NA 
2024? The CFP is still open for a few more weeks; options include my 
Performance Engineering track or the Cassandra track – or both:


https://www.linkedin.com/pulse/cfp-community-over-code-na-denver-2024-performance-track-paul-brebner-nagmc/?trackingId=PlmmMjMeQby0Mozq8cnIpA%3D%3D

Regards, Paul Brebner

*From: *Joe Obernberger 
*Date: *Friday, 22 March 2024 at 3:19 am
*To: *user@cassandra.apache.org 
*Subject: *Cassandra 5.0 Beta1 - vector searching results

Cassandra 5.0 Beta1 - vector searching results

2024-03-21 Thread Joe Obernberger
Hi All - I'd like to share some initial results for the vector search on 
Cassandra 5.0 beta1.  3 node cluster running in kubernetes; fast Netapp 
storage.


Have a table (doc.embeddings_googleflant5large) with definition:

CREATE TABLE doc.embeddings_googleflant5large (
    uuid text,
    type text,
    fieldname text,
    offset int,
    sourceurl text,
    textdata text,
    creationdate timestamp,
    embeddings vector,
    metadata boolean,
    source text,
    PRIMARY KEY ((uuid, type), fieldname, offset, sourceurl, textdata)
) WITH CLUSTERING ORDER BY (fieldname ASC, offset ASC, sourceurl ASC, 
textdata ASC)

    AND additional_write_policy = '99p'
    AND allow_auto_snapshot = true
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND incremental_backups = true
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';

CREATE CUSTOM INDEX ann_index_googleflant5large ON 
doc.embeddings_googleflant5large (embeddings) USING 'sai';
CREATE CUSTOM INDEX offset_index_googleflant5large ON 
doc.embeddings_googleflant5large (offset) USING 'sai';
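
For context, a search against this table looks roughly like the query below. The vector literal is a placeholder and must have the dimension declared on the embeddings column; without a partition restriction in the WHERE clause, the query fans out across the ring rather than hitting a single node:

SELECT uuid, type, fieldname, offset, textdata
FROM doc.embeddings_googleflant5large
ORDER BY embeddings ANN OF [0.12, 0.05, 0.87]
LIMIT 10;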


nodetool status -r

UN  cassandra-1.cassandra5.cassandra5-jos.svc.cluster.local 18.02 GiB  
128 100.0% f2989dea-908b-4c06-9caa-4aacad8ba0e8  rack1
UN  cassandra-2.cassandra5.cassandra5-jos.svc.cluster.local  17.98 GiB  
128 100.0% ec4e506d-5f0d-475a-a3c1-aafe58399412  rack1
UN  cassandra-0.cassandra5.cassandra5-jos.svc.cluster.local  18.16 GiB  
128 100.0% 92c6d909-ee01-4124-ae03-3b9e2d5e74c0  rack1


nodetool tablestats doc.embeddings_googleflant5large

Total number of tables: 1

Keyspace: doc
    Read Count: 0
    Read Latency: NaN ms
    Write Count: 2893108
    Write Latency: 326.3586520174843 ms
    Pending Flushes: 0
    Table: embeddings_googleflant5large
    SSTable count: 6
    Old SSTable count: 0
    Max SSTable size: 5.108GiB
    Space used (live): 19318114423
    Space used (total): 19318114423
    Space used by snapshots (total): 0
    Off heap memory used (total): 4874912
    SSTable Compression Ratio: 0.97448
    Number of partitions (estimate): 58399
    Memtable cell count: 0
    Memtable data size: 0
    Memtable off heap memory used: 0
    Memtable switch count: 16
    Speculative retries: 0
    Local read count: 0
    Local read latency: NaN ms
    Local write count: 2893108
    Local write latency: NaN ms
    Local read/write ratio: 0.0
    Pending flushes: 0
    Percent repaired: 100.0
    Bytes repaired: 9.066GiB
    Bytes unrepaired: 0B
    Bytes pending repair: 0B
    Bloom filter false positives: 7245
    Bloom filter false ratio: 0.00286
    Bloom filter space used: 87264
    Bloom filter off heap memory used: 87216
    Index summary off heap memory used: 34624
    Compression metadata off heap memory used: 4753072
    Compacted partition minimum bytes: 2760
    Compacted partition maximum bytes: 4866323
    Compacted partition mean bytes: 154523
    Average live cells per slice (last five minutes): NaN
    Maximum live cells per slice (last five minutes): 0
    Average tombstones per slice (last five minutes): NaN
    Maximum tombstones per slice (last five minutes): 0
    Droppable tombstone ratio: 0.0

nodetool tablehistograms doc.embeddings_googleflant5large

doc/embeddings_googleflant5large histograms
Percentile   Read Latency   Write Latency   SSTables   Partition Size   Cell Count
                 (micros)        (micros)                      (bytes)
50%                  0.00            0.00       0.00           105778          124
75%                  0.00            0.00       0.00           182785          215
95%                  0.00            0.00       0.00           379022          446
98%                  0.00            0.00       0.00           545791          642
99%                  0.00            0.00       0.00           654949

Startup errors - 4.1.3

2023-08-30 Thread Joe Obernberger
Hi all - I replaced a node in a 14 node cluster, and it rebuilt OK.  I 
started to see a lot of timeout errors, and discovered one of the nodes 
had this message constantly repeated:
"waiting to acquire a permit to begin streaming" - so perhaps I hit this 
bug:

https://www.mail-archive.com/commits@cassandra.apache.org/msg284709.html

I then restarted that node, but it gave a bunch of errors about 
"unexpected disk state: failed to read transaction log".
I deleted the corresponding files and got that node to come up, but now 
when I restart any of the other nodes in the cluster, they too do not 
start back up:


Example:

INFO  [main] 2023-08-30 09:50:46,130 LogTransaction.java:544 - Verifying 
logfile transaction 
[nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in 
/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, 
/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]
ERROR [main] 2023-08-30 09:50:46,154 LogReplicaSet.java:145 - Mismatched 
line in file nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log: got 
'ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]' 
expected 
'ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37639-big-,0,8][1997892352]', 
giving up
ERROR [main] 2023-08-30 09:50:46,155 LogFile.java:164 - Failed to read 
records for transaction log 
[nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in 
/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, 
/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]
ERROR [main] 2023-08-30 09:50:46,156 LogTransaction.java:559 - 
Unexpected disk state: failed to read transaction log 
[nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in 
/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, 
/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]

Files and contents follow:
/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37639-big-,0,8][1997892352]
    ABORT:[,0,0][737437348]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37644-big-,0,8][3122518803]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37643-big-,0,8][2875951075]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37642-big-,0,8][884016253]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37641-big-,0,8][926833718]
/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]
    ***Does not match the first record in the first replica file

ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37644-big-,0,8][3122518803]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37643-big-,0,8][2875951075]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37642-big-,0,8][884016253]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37641-big-,0,8][926833718]

ERROR [main] 2023-08-30 09:50:46,156 CassandraDaemon.java:897 - Cannot 
remove temporary or obsoleted files for doc.extractedmetadata due to a 
problem with transaction log files. Please check records with problems 
in the log messages above and fix them. Refer to the 3.0 upgrading 
instructions in NEWS.txt for a description of transaction log files.


I then deleted the files and, eventually, after many iterations, the node 
came back up.
The table 'extractedmetadata' has 29 billion records.  Just a data point 
here - I think the 'right' thing to do is to go to each node, stop it, 
clean up the files, and finally get each one back up?


-Joe




Re: Big Data Question

2023-08-21 Thread Joe Obernberger
For our scenario, the goal is to minimize down-time for a single (at 
least initially) data center system.  Data-loss is basically 
unacceptable.  I wouldn't say we have a "rusty slow data center" - we 
can certainly use SSDs and have servers connected via 10G copper to a 
fast back-plane.  For our specific use case with Cassandra (lots of 
writes, small number of reads), the network load is usually pretty low.  
I suspect that would change if we used Kubernetes + central persistent 
storage.

Good discussion.

-Joe

On 8/17/2023 7:37 PM, daemeon reiydelle wrote:
I started to respond, then realized I and the other posters are not 
thinking the same way: what is the business case for availability and data 
loss/reload/recoverability? You all argue for higher availability and 
damn the cost. But no one asked "can you lose access, for 20 minutes, 
to a portion of the data, 10 times a year, on a 250-node cluster in 
AWS, if it is not lost?" Can you lose access 1-2 times a year for the 
cost of a 500-node cluster holding the same data?


Then we can discuss 32/64G JVMs and SSDs.

Arthur C. Clarke famously said that "technology sufficiently advanced 
is indistinguishable from magic." Magic is coming, and it's coming for 
all of us.

*Daemeon Reiydelle*
*email: daeme...@gmail.com*
*LI: https://www.linkedin.com/in/daemeonreiydelle/*
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*


On Thu, Aug 17, 2023 at 1:53 PM Joe Obernberger wrote:


Was assuming reaper did incremental?  That was probably a bad
assumption.

nodetool repair -pr
I know it well now!

:)

-Joe

On 8/17/2023 4:47 PM, Bowen Song via user wrote:
> I don't have experience with Cassandra on Kubernetes, so I can't
> comment on that.
>
> For repairs, may I interest you with incremental repairs? It
will make
> repairs hell of a lot faster. Of course, occasional full repair is
> still needed, but that's another story.
>
>
> On 17/08/2023 21:36, Joe Obernberger wrote:
>> Thank you.  Enjoying this conversation.
>> Agree on blade servers, where each blade has a small number of
SSDs.
>> Yeh/Nah to a kubernetes approach assuming fast persistent
storage?  I
>> think that might be easier to manage.
>>
>> In my current benchmarks, the performance is excellent, but the
>> repairs are painful.  I come from the Hadoop world where it was
all
>> about large servers with lots of disk.
>> Relatively small number of tables, but some have a high number of
>> rows, 10bil + - we use spark to run across all the data.
>>
>> -Joe
>>
>> On 8/17/2023 12:13 PM, Bowen Song via user wrote:
>>> The optimal node size largely depends on the table schema and
>>> read/write pattern. In some cases 500 GB per node is too
large, but
>>> in some other cases 10TB per node works totally fine. It's
hard to
>>> estimate that without benchmarking.
>>>
>>> Again, just pointing out the obvious, you did not count the
off-heap
>>> memory and page cache. 1TB of RAM for 24GB heap * 40 instances is
>>> definitely not enough. You'll most likely need between 1.5 and
    2 TB
>>> memory for 40x 24GB heap nodes. You may be better off with blade
>>> servers than single server with gigantic memory and disk sizes.
>>>
>>>
>>> On 17/08/2023 15:46, Joe Obernberger wrote:
>>>> Thanks for this - yeah - duh - forgot about replication in my
example!
>>>> So - is 2TBytes per Cassandra instance advisable?  Better to use
>>>> more/less?  Modern 2u servers can be had with 24 3.8TBtyte
SSDs; so
>>>> assume 80Tbytes per server, you could do:
>>>> (1024*3)/80 = 39 servers, but you'd have to run 40 instances of
>>>> Cassandra on each server; maybe 24G of heap per instance, so a
>>>> server with 1TByte of RAM would work.
>>>> Is this what folks would do?
>>>>
>>>> -Joe
>>>>
>>>> On 8/17/2023 9:13 AM, Bowen Song via user wrote:
>>>>> Just pointing out the obvious, for 1PB of data on nodes with
2TB
>>>>> disk each, you will need far more than 500 nodes.
>>>>>
>>>>> 1, it is unwise to run Cassandra with replication factor 1. It
>>>>> usually makes sense to use RF=3, so 1PB data will cost 3PB of
>>>>> storage space, minimal of 1500 such nodes.
>>>>>
>>>>> 2, depending on the compaction strategy you use 

Re: Big Data Question

2023-08-17 Thread Joe Obernberger

Was assuming reaper did incremental?  That was probably a bad assumption.

nodetool repair -pr
I know it well now!

:)

-Joe

On 8/17/2023 4:47 PM, Bowen Song via user wrote:
I don't have experience with Cassandra on Kubernetes, so I can't 
comment on that.


For repairs, may I interest you in incremental repairs? It will make 
repairs a hell of a lot faster. Of course, an occasional full repair is 
still needed, but that's another story.
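
For reference, a minimal sketch of the two modes against the doc keyspace (scheduling and scope are up to you):

# incremental repair - the default in 4.x, repairs only data not yet marked repaired
nodetool repair -pr doc

# the occasional full repair mentioned above
nodetool repair -pr --full doc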



On 17/08/2023 21:36, Joe Obernberger wrote:

Thank you.  Enjoying this conversation.
Agree on blade servers, where each blade has a small number of SSDs.  
Yeh/Nah to a kubernetes approach assuming fast persistent storage?  I 
think that might be easier to manage.


In my current benchmarks, the performance is excellent, but the 
repairs are painful.  I come from the Hadoop world where it was all 
about large servers with lots of disk.
Relatively small number of tables, but some have a high number of 
rows, 10bil + - we use spark to run across all the data.


-Joe

On 8/17/2023 12:13 PM, Bowen Song via user wrote:
The optimal node size largely depends on the table schema and 
read/write pattern. In some cases 500 GB per node is too large, but 
in some other cases 10TB per node works totally fine. It's hard to 
estimate that without benchmarking.


Again, just pointing out the obvious, you did not count the off-heap 
memory and page cache. 1TB of RAM for 24GB heap * 40 instances is 
definitely not enough. You'll most likely need between 1.5 and 2 TB 
memory for 40x 24GB heap nodes. You may be better off with blade 
servers than a single server with gigantic memory and disk sizes.



On 17/08/2023 15:46, Joe Obernberger wrote:

Thanks for this - yeah - duh - forgot about replication in my example!
So - is 2TBytes per Cassandra instance advisable?  Better to use 
more/less?  Modern 2U servers can be had with 24 3.8TByte SSDs; so 
assume 80TBytes per server, you could do:
(1024*3)/80 = 39 servers, but you'd have to run 40 instances of 
Cassandra on each server; maybe 24G of heap per instance, so a 
server with 1TByte of RAM would work.

Is this what folks would do?

-Joe

On 8/17/2023 9:13 AM, Bowen Song via user wrote:
Just pointing out the obvious, for 1PB of data on nodes with 2TB 
disk each, you will need far more than 500 nodes.


1, it is unwise to run Cassandra with replication factor 1. It 
usually makes sense to use RF=3, so 1PB of data will cost 3PB of 
storage space, a minimum of 1,500 such nodes.


2, depending on the compaction strategy you use and the write 
access pattern, there's disk space amplification to consider. 
For example, with STCS, the disk usage can be many times the 
actual live data size.


3, you will need some extra free disk space as temporary space for 
running compactions.


4, the data is rarely going to be perfectly evenly distributed 
among all nodes, and you need to take that into consideration and 
size the nodes based on the node with the most data.


5, enough of bad news, here's a good one. Compression will save 
you (a lot) of disk space!


With all the above considered, you probably will end up with a lot 
more than the 500 nodes you initially thought. Your choice of 
compaction strategy and compression ratio can dramatically affect 
this calculation.



On 16/08/2023 16:33, Joe Obernberger wrote:
General question on how to configure Cassandra.  Say I have 
1PByte of data to store.  The general rule of thumb is that each 
node (or at least instance of Cassandra) shouldn't handle more 
than 2TBytes of disk.  That means 500 instances of Cassandra.


Assuming you have very fast persistent storage (such as a NetApp, 
PorterWorx etc.), would using Kubernetes or some orchestration 
layer to handle those nodes be a viable approach? Perhaps the 
worker nodes would have enough RAM to run 4 instances (pods) of 
Cassandra, you would need 125 servers.
Another approach is to build your servers with 5 (or more) SSD 
devices - one for OS, four for each instance of Cassandra running 
on that server.  Then build some scripts/ansible/puppet that 
would manage Cassandra start/stops, and other maintenance items.


Where I think this runs into problems is with repairs, or 
sstablescrubs that can take days to run on a single instance. How 
is that handled 'in the real world'?  With seed nodes, how many 
would you have in such a configuration?

Thanks for any thoughts!

-Joe










Re: Big Data Question

2023-08-17 Thread Joe Obernberger

Thank you.  Enjoying this conversation.
Agree on blade servers, where each blade has a small number of SSDs.  
Yeh/Nah to a kubernetes approach assuming fast persistent storage?  I 
think that might be easier to manage.


In my current benchmarks, the performance is excellent, but the repairs 
are painful.  I come from the Hadoop world where it was all about large 
servers with lots of disk.
Relatively small number of tables, but some have a high number of rows, 
10bil + - we use spark to run across all the data.


-Joe

On 8/17/2023 12:13 PM, Bowen Song via user wrote:
The optimal node size largely depends on the table schema and 
read/write pattern. In some cases 500 GB per node is too large, but in 
some other cases 10TB per node works totally fine. It's hard to 
estimate that without benchmarking.


Again, just pointing out the obvious, you did not count the off-heap 
memory and page cache. 1TB of RAM for 24GB heap * 40 instances is 
definitely not enough. You'll most likely need between 1.5 and 2 TB 
memory for 40x 24GB heap nodes. You may be better off with blade 
servers than a single server with gigantic memory and disk sizes.



On 17/08/2023 15:46, Joe Obernberger wrote:

Thanks for this - yeah - duh - forgot about replication in my example!
So - is 2TBytes per Cassandra instance advisable?  Better to use 
more/less?  Modern 2U servers can be had with 24 3.8TByte SSDs; so 
assume 80TBytes per server, you could do:
(1024*3)/80 = 39 servers, but you'd have to run 40 instances of 
Cassandra on each server; maybe 24G of heap per instance, so a server 
with 1TByte of RAM would work.

Is this what folks would do?

-Joe

On 8/17/2023 9:13 AM, Bowen Song via user wrote:
Just pointing out the obvious, for 1PB of data on nodes with 2TB 
disk each, you will need far more than 500 nodes.


1, it is unwise to run Cassandra with replication factor 1. It 
usually makes sense to use RF=3, so 1PB of data will cost 3PB of 
storage space, a minimum of 1,500 such nodes.


2, depending on the compaction strategy you use and the write access 
pattern, there's disk space amplification to consider. For 
example, with STCS, the disk usage can be many times the actual 
live data size.


3, you will need some extra free disk space as temporary space for 
running compactions.


4, the data is rarely going to be perfectly evenly distributed among 
all nodes, and you need to take that into consideration and size the 
nodes based on the node with the most data.


5, enough of bad news, here's a good one. Compression will save you 
(a lot) of disk space!


With all the above considered, you probably will end up with a lot 
more than the 500 nodes you initially thought. Your choice of 
compaction strategy and compression ratio can dramatically affect 
this calculation.



On 16/08/2023 16:33, Joe Obernberger wrote:
General question on how to configure Cassandra.  Say I have 1PByte 
of data to store.  The general rule of thumb is that each node (or 
at least instance of Cassandra) shouldn't handle more than 2TBytes 
of disk.  That means 500 instances of Cassandra.


Assuming you have very fast persistent storage (such as a NetApp, 
PorterWorx etc.), would using Kubernetes or some orchestration 
layer to handle those nodes be a viable approach? Perhaps the 
worker nodes would have enough RAM to run 4 instances (pods) of 
Cassandra, you would need 125 servers.
Another approach is to build your servers with 5 (or more) SSD 
devices - one for OS, four for each instance of Cassandra running 
on that server.  Then build some scripts/ansible/puppet that would 
manage Cassandra start/stops, and other maintenance items.


Where I think this runs into problems is with repairs, or 
sstablescrubs that can take days to run on a single instance. How 
is that handled 'in the real world'?  With seed nodes, how many 
would you have in such a configuration?

Thanks for any thoughts!

-Joe








Re: Big Data Question

2023-08-17 Thread Joe Obernberger

Thanks for this - yeah - duh - forgot about replication in my example!
So - is 2TBytes per Cassandra instance advisable?  Better to use 
more/less?  Modern 2U servers can be had with 24 3.8TByte SSDs; so 
assume 80TBytes per server, you could do:
(1024*3)/80 = 39 servers, but you'd have to run 40 instances of 
Cassandra on each server; maybe 24G of heap per instance, so a server 
with 1TByte of RAM would work.

Is this what folks would do?

-Joe

On 8/17/2023 9:13 AM, Bowen Song via user wrote:
Just pointing out the obvious, for 1PB of data on nodes with 2TB disk 
each, you will need far more than 500 nodes.


1, it is unwise to run Cassandra with replication factor 1. It usually 
makes sense to use RF=3, so 1PB of data will cost 3PB of storage space, 
a minimum of 1,500 such nodes.


2, depending on the compaction strategy you use and the write access 
pattern, there's disk space amplification to consider. For example, 
with STCS, the disk usage can be many times the actual live data size.


3, you will need some extra free disk space as temporary space for 
running compactions.


4, the data is rarely going to be perfectly evenly distributed among 
all nodes, and you need to take that into consideration and size the 
nodes based on the node with the most data.


5, enough of bad news, here's a good one. Compression will save you (a 
lot) of disk space!


With all the above considered, you probably will end up with a lot 
more than the 500 nodes you initially thought. Your choice of 
compaction strategy and compression ratio can dramatically affect this 
calculation.



On 16/08/2023 16:33, Joe Obernberger wrote:
General question on how to configure Cassandra.  Say I have 1PByte of 
data to store.  The general rule of thumb is that each node (or at 
least instance of Cassandra) shouldn't handle more than 2TBytes of 
disk.  That means 500 instances of Cassandra.


Assuming you have very fast persistent storage (such as a NetApp, 
PorterWorx etc.), would using Kubernetes or some orchestration layer 
to handle those nodes be a viable approach? Perhaps the worker nodes 
would have enough RAM to run 4 instances (pods) of Cassandra, you 
would need 125 servers.
Another approach is to build your servers with 5 (or more) SSD 
devices - one for OS, four for each instance of Cassandra running on 
that server.  Then build some scripts/ansible/puppet that would 
manage Cassandra start/stops, and other maintenance items.


Where I think this runs into problems is with repairs, or 
sstablescrubs that can take days to run on a single instance. How is 
that handled 'in the real world'?  With seed nodes, how many would 
you have in such a configuration?

Thanks for any thoughts!

-Joe






Big Data Question

2023-08-16 Thread Joe Obernberger
General question on how to configure Cassandra.  Say I have 1PByte of 
data to store.  The general rule of thumb is that each node (or at least 
instance of Cassandra) shouldn't handle more than 2TBytes of disk.  That 
means 500 instances of Cassandra.


Assuming you have very fast persistent storage (such as a NetApp, 
PorterWorx etc.), would using Kubernetes or some orchestration layer to 
handle those nodes be a viable approach?  Perhaps the worker nodes would 
have enough RAM to run 4 instances (pods) of Cassandra, you would need 
125 servers.
Another approach is to build your servers with 5 (or more) SSD devices - 
one for OS, four for each instance of Cassandra running on that server.  
Then build some scripts/ansible/puppet that would manage Cassandra 
start/stops, and other maintenance items.


Where I think this runs into problems is with repairs, or sstablescrubs 
that can take days to run on a single instance.  How is that handled 'in 
the real world'?  With seed nodes, how many would you have in such a 
configuration?

Thanks for any thoughts!

-Joe




Re: Repair errors

2023-08-11 Thread Joe Obernberger
)
    at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:188)
    at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:157)
    at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:523)
    at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:391)
    at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
    at 
org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
    at 
org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:74)
    at 
org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:75)
    at 
org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:26)
    at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:97)
    at 
org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:275)
    at 
org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:203)
    at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:82)
    at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:100)
    at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:359)
    at 
org.apache.cassandra.concurrent.FutureTask$2.call(FutureTask.java:113)
    at 
org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
    at 
org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: 
(/data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-9816-big-Data.db): 
corruption detected, chunk at 604552 of length 7911.
    at 
org.apache.cassandra.io.util.CompressedChunkReader$Mmap.readChunk(CompressedChunkReader.java:221)

    ... 46 common frames omitted
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: 
(/data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-9816-big-Data.db): 
corruption detected, chunk at 604552 of length 7911.
    at 
org.apache.cassandra.io.util.CompressedChunkReader$Mmap.readChunk(CompressedChunkReader.java:209)


Ideas?

-Joe


On 8/7/2023 10:27 PM, manish khandelwal wrote:
What do the logs of /172.16.20.16:7000 say when the repair failed? It 
indicates "validation failed". Can you check system.log on 
/172.16.20.16:7000 and see what it says? Looks like you have some issue 
with doc/origdoc, probably a corrupt sstable. Try to run repair for each 
individual table and see for which table the repair fails.
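
A sketch of that, using the keyspace/table named above; run the scrub only on the node whose logs show the corruption:

# repair a single table to see which one fails
nodetool repair --full doc origdoc

# if a corrupt sstable is suspected, scrub just that table on the affected node
nodetool scrub doc origdoc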


Regards
Manish

On Mon, Aug 7, 2023 at 11:39 PM Joe Obernberger wrote:


Thank you.  I've tried:
nodetool repair --full
nodetool repair -pr
They all get to 57% on any of the nodes, and then fail.
Interestingly the debug log only has INFO - there are no errors.

[2023-08-07 14:02:09,828] Repair command #6 failed with error
Incremental repair session 83dc17d0-354c-11ee-809c-177460b0ed52
has failed
[2023-08-07 14:02:09,830] Repair command #6 finished with error
error: Repair job has failed with the error message: Repair
command #6 failed with error Incremental repair session
83dc17d0-354c-11ee-809c-177460b0ed52 has failed. Check the logs on
the repair participants for further details
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error
message: Repair command #6 failed with error Incremental repair
session 83dc17d0-354c-11ee-809c-177460b0ed52 has failed. Check the
logs on the repair participants for further details
    at
org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
    at

org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at

java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
    at

java.management/com.sun.jmx.remote.interna

Re: Repair errors

2023-08-07 Thread Joe Obernberger
    at 
org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
    at 
org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at 
java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
    at 
java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:555)
    at 
java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:474)
    at 
java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor.lambda$execute$0(ClientNotifForwarder.java:108)

    at java.base/java.lang.Thread.run(Thread.java:829)

I'm not sure what to do next?

-Joe

On 8/6/2023 8:58 AM, Josh McKenzie wrote:

Quick drive-by observation:

Did not get replies from all endpoints.. Check the
logs on the repair participants for further details


dropping message of type HINT_REQ due to error
org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The
channel this output stream was writing to has been closed


Caused by: io.netty.channel.unix.Errors$NativeIoException:
writeAddress(..) failed: Connection timed out



java.lang.RuntimeException: Did not get replies from all endpoints.
These all point to the same shaped problem: for whatever reason, the 
coordinator of this repair didn't receive replies from the replicas 
executing it. Could be that they're dead, could be they took too long, 
could be they never got the start message, etc. Distributed operations 
are tricky like that.


Logs on the replicas doing the actual repairs should give you more 
insight; this is a pretty low level generic set of errors that 
basically amounts to "we didn't hear back from the other participants 
in time so we timed out."


On Fri, Aug 4, 2023, at 12:02 PM, Surbhi Gupta wrote:
Can you please try to do nodetool describecluster from every node of 
the cluster?


One time I noticed an issue where nodetool status showed all nodes UN but 
describecluster did not.


Thanks
Surbhi

On Fri, Aug 4, 2023 at 8:59 AM Joe Obernberger wrote:



Repair errors

2023-08-04 Thread Joe Obernberger

Hi All - been using reaper to do repairs, but it has hung.  I tried to run:
nodetool repair -pr
on each of the nodes, but they all fail with some form of this error:

error: Repair job has failed with the error message: Repair command #521 
failed with error Did not get replies from all endpoints.. Check the 
logs on the repair participants for further details

-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error 
message: Repair command #521 failed with error Did not get replies from 
all endpoints.. Check the logs on the repair participants for further 
details
    at 
org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
    at 
org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at 
java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
    at 
java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:555)
    at 
java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:474)
    at 
java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor.lambda$execute$0(ClientNotifForwarder.java:108)

    at java.base/java.lang.Thread.run(Thread.java:829)

Using version 4.1.2-1
nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns  Host 
ID   Rack
UN  172.16.100.45   505.66 GiB  250 ? 
07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  172.16.100.251  380.75 GiB  200 ? 
274a6e8d-de37-4e0b-b000-02d221d858a5  rack1
UN  172.16.100.35   479.2 GiB   200 ? 
59150c47-274a-46fb-9d5e-bed468d36797  rack1
UN  172.16.100.252  248.69 GiB  200 ? 
8f0d392f-0750-44e2-91a5-b30708ade8e4  rack1
UN  172.16.100.249  411.53 GiB  200 ? 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.38   333.26 GiB  200 ? 
0d9509cc-2f23-4117-a883-469a1be54baf  rack1
UN  172.16.100.36   405.33 GiB  200 ? 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   437.74 GiB  200 ? 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.248  344.4 GiB   200 ? 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.44   409.36 GiB  200 ? 
b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  172.16.100.37   236.08 GiB  120 ? 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.20.16    975 GiB 500 ? 
1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297  rack1
UN  172.16.100.34   340.77 GiB  200 ? 
352fd049-32f8-4be8-9275-68b145ac2832  rack1
UN  172.16.100.42   974.86 GiB  500 ? 
b088a8e6-42f3-4331-a583-47ef5149598f  rack1


Note: Non-system keyspaces don't have the same replication settings, 
effective ownership information is meaningless


Debug log has:


DEBUG [ScheduledTasks:1] 2023-08-04 11:56:04,955 
MigrationCoordinator.java:264 - Pulling unreceived schema versions...
INFO  [HintsDispatcher:11344] 2023-08-04 11:56:21,369 
HintsDispatchExecutor.java:318 - Finished hinted handoff of file 
1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297-1690426370160-2.hints to endpoint 
/172.16.20.16:7000: 1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297, partially
WARN 
[Messaging-OUT-/172.16.100.34:7000->/172.16.20.16:7000-LARGE_MESSAGES] 
2023-08-04 11:56:21,916 OutboundConnection.java:491 - 
/172.16.100.34:7000->/172.16.20.16:7000-LARGE_MESSAGES-[no-channel] 
dropping message of type HINT_REQ due to error
org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The 
channel this output stream was writing to has been closed
    at 
org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
    at 
org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
    at 
org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
    at 
org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
    at 
org.apache.cassandra.net.AsyncMessageOutputPlus.doFlush(AsyncMessageOutputPlus.java:100)
    at 
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:122)
    at 
org.apache.cassandra.hints.HintMessage$Serializer.serialize(HintMessage.java:139)
    at 
org.apache.cassandra.hints.HintMessage$Serializer.serialize(HintMessage.java:77)
    at 
org.apache.cassandra.net.Message$Serializer.serializePost40(Message.java:844)
    at 
org.apache.cassandra.net.Message$Serializer.serialize(Message.java:702)
    at 
org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.doRun(OutboundConnection.java:984)
    at 
org.apache.cassandra.net.OutboundConnection$Delivery.run(OutboundConnection.java:690)
    at 

Pulling unreceived schema versions

2023-02-13 Thread Joe Obernberger

Hi all - I'm seeing this message:
"Pulling unreceived schema versions..."

in the debug log being repeated exactly every minute, but I can't find 
what this means?

Thank you!

-Joe




Re: Startup fails - 4.1.0

2023-02-03 Thread Joe Obernberger

Thank you Sean.  I had to remove two of the files and then it started.
Cheers!

-Joe

On 2/3/2023 3:52 PM, Durity, Sean R via user wrote:


In most cases, I would delete the corrupt commit log file and restart. 
Then run repairs on that node. I have seen cases where multiple files 
are corrupted and it is easier to remove all commit log files to get 
the node restarted.
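
As a rough sketch of that procedure for this node - the segment name comes from the mutation checksum failure quoted below, and the systemd unit name is an assumption:

# stop the node, move the corrupt segment out of the way, restart, then repair
systemctl stop cassandra
mv /var/lib/cassandra/commitlog/CommitLog-7-1674161126167.log /var/tmp/
systemctl start cassandra
nodetool repair -pr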


Sean R. Durity

*From:*Joe Obernberger 
*Sent:* Friday, February 3, 2023 3:15 PM
*To:* user@cassandra.apache.org
*Subject:* [EXTERNAL] Startup fails - 4.1.0

Hi all - cluster had a power outage and one of the nodes in a 14-node 
cluster isn't starting with: DEBUG [MemtableFlushWriter:1] 2023-02-03 
13:52:45,468 ColumnFamilyStore.java:1329 - Flushed to 
[BigTableReader(path='/data/2/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8407-big-Data.db'),


Startup fails - 4.1.0

2023-02-03 Thread Joe Obernberger
Hi all - cluster had a power outage and one of the nodes in a 14-node 
cluster isn't starting with:


DEBUG [MemtableFlushWriter:1] 2023-02-03 13:52:45,468 
ColumnFamilyStore.java:1329 - Flushed to 
[BigTableReader(path='/data/2/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8407-big-Data.db'), 
BigTableReader(path='/data/3/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8408-big-Data.db'), 
BigTableReader(path='/data/4/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8409-big-Data.db'), 
BigTableReader(path='/data/5/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8410-big-Data.db'), 
BigTableReader(path='/data/6/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8411-big-Data.db'), 
BigTableReader(path='/data/8/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8412-big-Data.db'), 
BigTableReader(path='/data/9/cassandra/data/doc/correlation_counts-ca4e8c0080b311edbd6d4d9b3bfd78b2/nb-8413-big-Data.db')] 
(7 sstables, 92.858MiB), biggest 15.420MiB, smallest 10.307MiB
INFO  [main] 2023-02-03 13:52:45,621 CommitLogReader.java:257 - Finished 
reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126163.log
DEBUG [main] 2023-02-03 13:52:45,622 CommitLogReader.java:266 - Reading 
/var/lib/cassandra/commitlog/CommitLog-7-1674161126164.log (CL version 
7, messaging version 12, compression null)
INFO  [main] 2023-02-03 13:52:46,811 CommitLogReader.java:257 - Finished 
reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126164.log
DEBUG [main] 2023-02-03 13:52:46,811 CommitLogReader.java:266 - Reading 
/var/lib/cassandra/commitlog/CommitLog-7-1674161126165.log (CL version 
7, messaging version 12, compression null)
INFO  [main] 2023-02-03 13:52:47,985 CommitLogReader.java:257 - Finished 
reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126165.log
DEBUG [main] 2023-02-03 13:52:47,986 CommitLogReader.java:266 - Reading 
/var/lib/cassandra/commitlog/CommitLog-7-1674161126166.log (CL version 
7, messaging version 12, compression null)
INFO  [main] 2023-02-03 13:52:49,282 CommitLogReader.java:257 - Finished 
reading /var/lib/cassandra/commitlog/CommitLog-7-1674161126166.log
DEBUG [main] 2023-02-03 13:52:49,283 CommitLogReader.java:266 - Reading 
/var/lib/cassandra/commitlog/CommitLog-7-1674161126167.log (CL version 
7, messaging version 12, compression null)
ERROR [main] 2023-02-03 13:52:49,651 JVMStabilityInspector.java:196 - 
Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: 
Mutation checksum failure at 11231154 in Next section at 11230925 in 
CommitLog-7-1674161126167.log
    at 
org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:387)
    at 
org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:244)
    at 
org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:147)
    at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:191)
    at 
org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:200)
    at 
org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:181)
    at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:357)
    at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:752)
    at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:876)


How to proceed?
Thank you!

-Joe




Re: removenode stuck - cassandra 4.1.0

2023-01-23 Thread Joe Obernberger

Thank you - I was just impatient.  :)

-Joe

On 1/23/2023 12:56 PM, Jeff Jirsa wrote:

Those hosts are likely sending streams.

If you do `nodetool netstats` on the replicas of the node you're 
removing, you should see byte counters and file counters - they should 
all be incrementing. If one of them isn't incrementing, that one is 
probably stuck.


There's at least one bug in 4.1 where (I think) the rate limiters can 
interact in a way that causes this. 
https://issues.apache.org/jira/browse/CASSANDRA-18110 describes it and 
has a workaround.




On Mon, Jan 23, 2023 at 9:41 AM Joe Obernberger wrote:




removenode stuck - cassandra 4.1.0

2023-01-23 Thread Joe Obernberger
I had a drive fail (first drive in the list) on a Cassandra cluster.  
I've stopped the node (as it no longer starts), and am trying to remove 
it from the cluster, but the removenode command is hung (been running 
for 3 hours so far):
nodetool removenode status is always reporting the same token as being 
removed.  Help?


nodetool removenode status
RemovalStatus: Removing token (-9196617215347134065). Waiting for 
replication confirmation from 
[/172.16.100.248,/172.16.100.249,/172.16.100.251,/172.16.100.252,/172.16.100.34,/172.16.100.35,/172.16.100.36,/172.16.100.37,/172.16.100.38,/172.16.100.42,/172.16.100.44,/172.16.100.45].


Thanks.

-Joe




Re: Failed disks - correct procedure

2023-01-23 Thread Joe Obernberger
Some more observations.  If the first drive fails on a node, then you 
can't just remove it from the list.  Example:

We have:
/data/1/cassandra
/data/2/cassandra
/data/3/cassandra
/data/4/cassandra
...

If /data/1 fails, and I remove it from the list, when you try to start 
cassandra on that node it says there already exists a node with that 
address and you need to replace it.  I think the only option at that 
point is to bootstrap it and use the replace_address option.


-Joe

On 1/17/2023 10:41 AM, C. Scott Andreas wrote:
Bumping this note from Andy downthread to make sure everyone has seen 
it and is aware:


“Before you do that, you will want to make sure a cycle of repairs has 
run on the replicas of the down node to ensure they are consistent 
with each other.”


When replacing an instance, it’s necessary to run repair (incremental 
or full) among the surviving replicas *before* bootstrapping a 
replacement instance in. If you don’t do this, Cassandra’s quorum 
consistency guarantees won’t be met and data may appear to be lost. 
It’s not possible to use Cassandra as a consistent database without 
doing so.


Given replicas A, B, C, and replacement replica A*:
- Quorum write is witnessed by A, B
- A fails
- A* is bootstrapped in without repair of B, C
- Quorum read succeeds against A*, C
- The successful quorum read will not observe data from the previous 
successful quorum write and the data will appear to be lost.


Repairing surviving replicas before bootstrapping a replacement node 
is necessary to avoid this.
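
A minimal sketch of that ordering (keyspace name and addresses are illustrative):

# 1. On each surviving replica (B and C above), repair before bringing in A*:
nodetool repair -full my_keyspace
# 2. Only then start the replacement node with the dead node's address, e.g. via JVM options:
#    -Dcassandra.replace_address_first_boot=<ip.of.dead.node>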


— Scott

On Jan 17, 2023, at 7:28 AM, Joe Obernberger 
 wrote:




I come from the hadoop world where we have a cluster with probably 
over 500 drives.  Drives fail all the time; or well several a year 
anyway.  We remove that single drive from HDFS, HDFS re-balances, and 
when we get around to it, we swap in a new drive, format it, and add 
it back to HDFS.  We keep the OS drives separate from the data drives 
and ensure that the OS volume is in a RAID mirror.  It's painful when 
OS drives fail, so mirror works.  When space is low, we add another 
node with lots of disks.
We are repurposing this same hardware to run a large Cassandra 
cluster.  I'd love it if Cassandra could support larger individual 
nodes, but we've been trying to configure it with lots of disks for 
redundancy, with the idea that we won't use an entire node's storage 
only for Cassandra.  As was mentioned a long while back, blades seem 
to make more sense for Cassandra than single nodes with lots of disk, 
but we've got what we've got!

:)

So far, no issues with:
Stop node, remove drive from cassandra config, start node, run repair 
- version 4.1.


-Joe

On 1/17/2023 10:11 AM, Durity, Sean R via user wrote:


For physical hardware when disks fail, I do a removenode, wait for 
the drive to be replaced, reinstall Cassandra, and then bootstrap 
the node back in (and run clean-up across the DC).
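
Roughly, that sequence with nodetool is sketched below (host ID is illustrative):

nodetool removenode <host-id-of-failed-node>   # streams its ranges to the remaining replicas
# ...replace the drive, reinstall Cassandra, bootstrap the node back in...
nodetool cleanup                               # afterwards, run on each other node in the DC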


All of our disks are presented as one file system for data, which is 
not what the original question was asking.


Sean R. Durity

*From:*Marc Hoppins 
*Sent:* Tuesday, January 17, 2023 3:57 AM
*To:* user@cassandra.apache.org
*Subject:* [EXTERNAL] RE: Failed disks - correct procedure

HI all, I was pondering this very situation. We have a node with a 
crapped-out disk (not the first time). Removenode vs repairnode: in 
regard to time, there is going to be little difference twixt replacing 
a dead node and removing then re-installing


INTERNAL USE

HI all,
I was pondering this very situation.
We have a node with a crapped-out disk (not the first time). 
Removenode vs repairnode: in regard to time, there is going to be 
little difference twixt replacing a dead node and removing then 
re-installing a node.  There is going to be a bunch of reads/writes 
and verifications (or similar) which is going to take a similar 
amount of time...or do I read that wrong?
For myself, I just go with removenode and then rejoin after HDD has 
been replaced.  Usually the fix exceeds the wait time and the node is 
then out of the system anyway.

-Original Message-
From: Joe Obernberger 
Sent: Monday, January 16, 2023 6:31 PM
To: Jeff Jirsa ; user@cassandra.apache.org
Subject: Re: Failed disks - correct procedure
EXTERNAL
I'm using 4.1.0-1.
I've been doing a lot of truncates lately before the drive failed 
(research project).  Current drives have about 100GBytes of data 
each, although the actual amount of data in Cassandra is much less 
(because of truncates and snapshots).  The cluster is not 
homogeneous; some nodes have more drives than others.

nodetool status -r
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load   Tokens  Owns  Host
ID   Rack
UN  nyx.querymasters.com    7.9 GiB    250 ?
07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  6.34 GiB   200 ?
274a6e8d-de37-4e0b-b000

Re: Failed disks - correct procedure

2023-01-17 Thread Joe Obernberger
I come from the hadoop world where we have a cluster with probably over 
500 drives.  Drives fail all the time; or well several a year anyway.  
We remove that single drive from HDFS, HDFS re-balances, and when we get 
around to it, we swap in a new drive, format it, and add it back to 
HDFS.  We keep the OS drives separate from the data drives and ensure 
that the OS volume is in a RAID mirror.  It's painful when OS drives 
fail, so mirror works.  When space is low, we add another node with lots 
of disks.
We are repurposing this same hardware to run a large Cassandra cluster.  
I'd love it if Cassandra could support larger individual nodes, but 
we've been trying to configure it with lots of disks for redundancy, 
with the idea that we won't use an entire node's storage only for 
Cassandra.  As was mentioned a long while back, blades seem to make more 
sense for Cassandra than single nodes with lots of disk, but we've got 
what we've got!

:)

So far, no issues with:
Stop node, remove drive from cassandra config, start node, run repair - 
version 4.1.


-Joe

On 1/17/2023 10:11 AM, Durity, Sean R via user wrote:


For physical hardware when disks fail, I do a removenode, wait for the 
drive to be replaced, reinstall Cassandra, and then bootstrap the node 
back in (and run clean-up across the DC).


All of our disks are presented as one file system for data, which is 
not what the original question was asking.


Sean R. Durity

*From:*Marc Hoppins 
*Sent:* Tuesday, January 17, 2023 3:57 AM
*To:* user@cassandra.apache.org
*Subject:* [EXTERNAL] RE: Failed disks - correct procedure

HI all, I was pondering this very situation. We have a node with a 
crapped-out disk (not the first time). Removenode vs repairnode: in 
regard to time, there is going to be little difference twixt replacing a 
dead node and removing then re-installing


INTERNAL USE

HI all,
I was pondering this very situation.
We have a node with a crapped-out disk (not the first time). 
Removenode vs repairnode: in regard to time, there is going to be little 
difference twixt replacing a dead node and removing then re-installing 
a node.  There is going to be a bunch of reads/writes and 
verifications (or similar) which is going to take a similar amount of 
time...or do I read that wrong?
For myself, I just go with removenode and then rejoin after HDD has 
been replaced.  Usually the fix exceeds the wait time and the node is 
then out of the system anyway.

-Original Message-
From: Joe Obernberger 
Sent: Monday, January 16, 2023 6:31 PM
To: Jeff Jirsa ; user@cassandra.apache.org
Subject: Re: Failed disks - correct procedure
EXTERNAL
I'm using 4.1.0-1.
I've been doing a lot of truncates lately before the drive failed 
(research project).  Current drives have about 100GBytes of data each, 
although the actual amount of data in Cassandra is much less (because 
of truncates and snapshots).  The cluster is not homogeneous; some 
nodes have more drives than others.

nodetool status -r
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load   Tokens  Owns  Host
ID   Rack
UN  nyx.querymasters.com    7.9 GiB    250 ?
07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  6.34 GiB   200 ?
274a6e8d-de37-4e0b-b000-02d221d858a5  rack1
UN  aion.querymasters.com   6.31 GiB   200 ?
59150c47-274a-46fb-9d5e-bed468d36797  rack1
UN  calypso.querymasters.com    6.26 GiB   200 ?
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  fortuna.querymasters.com    7.1 GiB    200 ?
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  kratos.querymasters.com 6.36 GiB   200 ?
0d9509cc-2f23-4117-a883-469a1be54baf  rack1
UN  charon.querymasters.com 6.35 GiB   200 ?
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com   6.4 GiB    200 ?
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  ursula.querymasters.com 6.24 GiB   200 ?
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  gaia.querymasters.com   6.28 GiB   200 ?
b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  chaos.querymasters.com  3.78 GiB   120 ?
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  pallas.querymasters.com 6.24 GiB   200 ?
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  paradigm7.querymasters.com  16.25 GiB  500 ?
1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297  rack1
UN  aether.querymasters.com 6.36 GiB   200 ?
352fd049-32f8-4be8-9275-68b145ac2832  rack1
UN  athena.querymasters.com 15.85 GiB  500 ?
b088a8e6-42f3-4331-a583-47ef5149598f  rack1
-Joe
On 1/16/2023 12:23 PM, Jeff Jirsa wrote:
> Prior to cassandra-6696 you’d have to treat one missing disk as a 
> failed machine, wipe all the data and re-stream it, as a tombstone for 
> a given value may be on one disk and data on another (effectively 
> redirecting data)

>
> So the answer has to be version dependent, t

Re: Failed disks - correct procedure

2023-01-16 Thread Joe Obernberger

I'm using 4.1.0-1.
I've been doing a lot of truncates lately before the drive failed 
(research project).  Current drives have about 100GBytes of data each, 
although the actual amount of data in Cassandra is much less (because of 
truncates and snapshots).  The cluster is not homogeneous; some nodes 
have more drives than others.


nodetool status -r
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load   Tokens  Owns  Host 
ID   Rack
UN  nyx.querymasters.com    7.9 GiB    250 ? 
07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  6.34 GiB   200 ? 
274a6e8d-de37-4e0b-b000-02d221d858a5  rack1
UN  aion.querymasters.com   6.31 GiB   200 ? 
59150c47-274a-46fb-9d5e-bed468d36797  rack1
UN  calypso.querymasters.com    6.26 GiB   200 ? 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  fortuna.querymasters.com    7.1 GiB    200 ? 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  kratos.querymasters.com 6.36 GiB   200 ? 
0d9509cc-2f23-4117-a883-469a1be54baf  rack1
UN  charon.querymasters.com 6.35 GiB   200 ? 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com   6.4 GiB    200 ? 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  ursula.querymasters.com 6.24 GiB   200 ? 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  gaia.querymasters.com   6.28 GiB   200 ? 
b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  chaos.querymasters.com  3.78 GiB   120 ? 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  pallas.querymasters.com 6.24 GiB   200 ? 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  paradigm7.querymasters.com  16.25 GiB  500 ? 
1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297  rack1
UN  aether.querymasters.com 6.36 GiB   200 ? 
352fd049-32f8-4be8-9275-68b145ac2832  rack1
UN  athena.querymasters.com 15.85 GiB  500 ? 
b088a8e6-42f3-4331-a583-47ef5149598f  rack1


-Joe

On 1/16/2023 12:23 PM, Jeff Jirsa wrote:

Prior to cassandra-6696 you’d have to treat one missing disk as a failed 
machine, wipe all the data and re-stream it, as a tombstone for a given value 
may be on one disk and data on another (effectively redirecting data)

So the answer has to be version dependent, too - which version were you using?


On Jan 16, 2023, at 9:08 AM, Tolbert, Andy  wrote:

Hi Joe,

Reading it back I realized I misunderstood that part of your email, so
you must be using data_file_directories with 16 drives?  That's a lot
of drives!  I imagine this may happen from time to time given that
disks like to fail.

That's a bit of an interesting scenario that I would have to think
about.  If you brought the node up without the bad drive, repairs are
probably going to do a ton of repair overstreaming if you aren't using
4.0 (https://issues.apache.org/jira/browse/CASSANDRA-3200) which may
put things into a really bad state (lots of streaming = lots of
compactions = slower reads) and you may be seeing some inconsistency
if repairs weren't regularly running beforehand.

How much data was on the drive that failed?  How much data do you
usually have per node?

Thanks,
Andy


On Mon, Jan 16, 2023 at 10:59 AM Joe Obernberger
 wrote:

Thank you Andy.
Is there a way to just remove the drive from the cluster and replace it
later?  Ordering replacement drives isn't a fast process...
What I've done so far is:
Stop node
Remove drive reference from /etc/cassandra/conf/cassandra.yaml
Restart node
Run repair

Will that work?  Right now, it's showing all nodes as up.

-Joe


On 1/16/2023 11:55 AM, Tolbert, Andy wrote:
Hi Joe,

I'd recommend just doing a replacement, bringing up a new node with
-Dcassandra.replace_address_first_boot=ip.you.are.replacing as
described here:
https://cassandra.apache.org/doc/4.1/cassandra/operating/topo_changes.html#replacing-a-dead-node

Before you do that, you will want to make sure a cycle of repairs has
run on the replicas of the down node to ensure they are consistent
with each other.

Make sure you also have 'auto_bootstrap: true' in the yaml of the node
you are replacing and that the initial_token matches the node you are
replacing (If you are not using vnodes) so the node doesn't skip
bootstrapping.  This is the default, but felt worth mentioning.
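
As a sketch, the relevant cassandra.yaml lines on the replacement node would look 
like this (the token value is illustrative and only applies without vnodes):

auto_bootstrap: true
# initial_token: -9196617215347134065   # only without vnodes; must match the node being replaced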

You can also remove the dead node, which should stream data to
replicas that will pick up new ranges, but you also will want to do
repairs ahead of time too.  To be honest it's not something I've done
recently, so I'm not as confident on executing that procedure.

Thanks,
Andy


On Mon, Jan 16, 2023 at 9:28 AM Joe Obernberger
 wrote:

Hi all - what is the correct procedure when handling a failed disk?
Have a node in a 15 node cluster.  This node has 16 drives and cassandra
data is split across them.  One drive is failing.  Can I just remove it
from the list and cassandra will then replicate? If not - what?
Thank you!

-Joe

Re: Failed disks - correct procedure

2023-01-16 Thread Joe Obernberger

Thank you Andy.
Is there a way to just remove the drive from the cluster and replace it 
later?  Ordering replacement drives isn't a fast process...

What I've done so far is:
Stop node
Remove drive reference from /etc/cassandra/conf/cassandra.yaml
Restart node
Run repair

Will that work?  Right now, it's showing all nodes as up.

-Joe

On 1/16/2023 11:55 AM, Tolbert, Andy wrote:

Hi Joe,

I'd recommend just doing a replacement, bringing up a new node with
-Dcassandra.replace_address_first_boot=ip.you.are.replacing as
described here:
https://cassandra.apache.org/doc/4.1/cassandra/operating/topo_changes.html#replacing-a-dead-node

Before you do that, you will want to make sure a cycle of repairs has
run on the replicas of the down node to ensure they are consistent
with each other.

Make sure you also have 'auto_bootstrap: true' in the yaml of the node
you are replacing and that the initial_token matches the node you are
replacing (If you are not using vnodes) so the node doesn't skip
bootstrapping.  This is the default, but felt worth mentioning.

You can also remove the dead node, which should stream data to
replicas that will pick up new ranges, but you also will want to do
repairs ahead of time too.  To be honest it's not something I've done
recently, so I'm not as confident on executing that procedure.

Thanks,
Andy


On Mon, Jan 16, 2023 at 9:28 AM Joe Obernberger
 wrote:

Hi all - what is the correct procedure when handling a failed disk?
Have a node in a 15 node cluster.  This node has 16 drives and cassandra
data is split across them.  One drive is failing.  Can I just remove it
from the list and cassandra will then replicate? If not - what?
Thank you!

-Joe




Failed disks - correct procedure

2023-01-16 Thread Joe Obernberger

Hi all - what is the correct procedure when handling a failed disk?
Have a node in a 15 node cluster.  This node has 16 drives and cassandra 
data is split across them.  One drive is failing.  Can I just remove it 
from the list and cassandra will then replicate? If not - what?

Thank you!

-Joe




Re: Adding nodes

2022-07-11 Thread Joe Obernberger
I too came from HBase and discovered adding several nodes at a time 
doesn't work.  Are you absolutely sure that the clocks are in sync 
across the nodes?  This has bitten me several times.
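
A quick way to spot obvious skew (hostnames illustrative, assuming SSH access and 
NTP/chrony running on every node):

# print each node's UTC epoch seconds; they should agree to within a second or so
for h in node1 node2 node3; do echo -n "$h: "; ssh "$h" date -u +%s; done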


-Joe

On 7/11/2022 6:23 AM, Bowen Song via user wrote:


You should look for warning and error level logs in the system.log, 
not the debug.log or gc.log, and certainly not only the latest lines.


BTW, you may want to spend some time investigating potential GC issues 
based on the GC logs you provided. I can see 1 full GC in the 3 hours 
since the node started. It's not necessarily a problem (if it only 
occasionally happens during the initial bootstrapping process), but it 
should justify an investigation if this is the first time you've seen it.


On 11/07/2022 11:09, Marc Hoppins wrote:


Service still running. No errors showing.

The latest info is in debug.log

DEBUG [Streaming-EventLoop-4-3] 2022-07-11 12:00:38,902 
NettyStreamingMessageSender.java:258 - [Stream 
#befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Sending 
keep-alive


DEBUG [Stream-Deserializer-/10.1.146.174:7000-053f2911] 2022-07-11 
12:00:39,790 StreamingInboundHandler.java:179 - [Stream 
#befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Received 
keep-alive


DEBUG [ScheduledTasks:1] 2022-07-11 12:00:44,688 
StorageService.java:2398 - Ignoring application state LOAD from 
/x.x.x.64:7000 because it is not a member in token metadata


DEBUG [ScheduledTasks:1] 2022-07-11 12:01:44,689 
StorageService.java:2398 - Ignoring application state LOAD from 
/x.x.x.64:7000 because it is not a member in token metadata


DEBUG [ScheduledTasks:1] 2022-07-11 12:02:44,690 
StorageService.java:2398 - Ignoring application state LOAD from 
/x.x.x.64:7000 because it is not a member in token metadata


And

gc.log.1.current

2022-07-11T12:08:40.562+0200: 11122.837: [GC (Allocation Failure) 
2022-07-11T12:08:40.562+0200: 11122.838: [ParNew


Desired survivor size 41943040 bytes, new threshold 1 (max 1)

- age   1:  57264 bytes,  57264 total

: 655440K->74K(737280K), 0.0289143 secs] 
2575800K->1920436K(8128512K), 0.0291355 secs] [Times: user=0.23 
sys=0.00, real=0.03 secs]


Heap after GC invocations=6532 (full 1):

par new generation   total 737280K, used 74K [0x0005cae0, 
0x0005fce0, 0x0005fce0)


eden space 655360K,   0% used [0x0005cae0, 
0x0005cae0, 0x0005f2e0)


from space 81920K,   0% used [0x0005f2e0, 0x0005f2e12848, 
0x0005f7e0)


to   space 81920K,   0% used [0x0005f7e0, 0x0005f7e0, 
0x0005fce0)


concurrent mark-sweep generation total 7391232K, used 1920362K 
[0x0005fce0, 0x0007c000, 0x0007c000)


Metaspace used 53255K, capacity 56387K, committed 56416K, reserved 
1097728K


class space    used 6926K, capacity 7550K, committed 7576K, reserved 
1048576K


}

2022-07-11T12:08:40.591+0200: 11122.867: Total time for which 
application threads were stopped: 0.0309913 seconds, Stopping threads 
took: 0.0012599 seconds


{Heap before GC invocations=6532 (full 1):

par new generation   total 737280K, used 655434K [0x0005cae0, 
0x0005fce0, 0x0005fce0)


eden space 655360K, 100% used [0x0005cae0, 
0x0005f2e0, 0x0005f2e0)


from space 81920K,   0% used [0x0005f2e0, 0x0005f2e12848, 
0x0005f7e0)


to   space 81920K,   0% used [0x0005f7e0, 0x0005f7e0, 
0x0005fce0)


concurrent mark-sweep generation total 7391232K, used 1920362K 
[0x0005fce0, 0x0007c000, 0x0007c000)


Metaspace   used 53255K, capacity 56387K, committed 56416K, 
reserved 1097728K


class space    used 6926K, capacity 7550K, committed 7576K, reserved 
1048576K


2022-07-11T12:08:42.163+0200: 11124.438: [GC (Allocation Failure) 
2022-07-11T12:08:42.163+0200: 11124.438: [ParNew


Desired survivor size 41943040 bytes, new threshold 1 (max 1)

- age   1:  54984 bytes,  54984 total

: 655434K->80K(737280K), 0.0291754 secs] 
2575796K->1920445K(8128512K), 0.0293884 secs] [Times: user=0.22 
sys=0.00, real=0.03 secs]


*From:*Bowen Song via user 
*Sent:* Monday, July 11, 2022 11:56 AM
*To:* user@cassandra.apache.org
*Subject:* Re: Adding nodes

EXTERNAL

Checking on multiple nodes won't help if the joining node suffers 
from any of the issues I described, as it will likely be flipping up 
and down frequently, and the existing nodes in the cluster may never 
reach an agreement before the joining node stays up (or stays down) 
for a while. However, it will be a very strange thing if this is a 
persistent behaviour. If the 'nodetool status' output on each node 
remained unchanged for hours and the outputs aren't the same between 
nodes, it could be an indicator of something else that had gone wrong.


Does the strange behaviour go away after the joining node completes 
the streaming and fully joins the cluster?


On 11/07/2022 10:46, Marc Hoppins wrote:

 

Re: removing a drive - 4.0.1

2022-06-09 Thread Joe Obernberger
When a drive fails in a large cluster and you don't immediately have a 
replacement drive, is it OK to just remove the drive from cassandra.yaml 
and restart the node?  Will the missing data (assuming RF=3) be 
re-replicated?
I have disk_failure_policy set to "best_effort", but the node still 
fails (ie cassandra exits) when a disk (spinning rust) goes bad.

I do have commit_failure_policy set to stop.
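
For reference, a sketch of the cassandra.yaml settings described above (the current 
configuration as stated, not a recommendation):

disk_failure_policy: best_effort    # keep serving from the remaining data directories
commit_failure_policy: stop         # shut down transports if commitlog writes fail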

Thank you!

-Joe

On 1/7/2022 4:38 PM, Dmitry Saprykin wrote:

There is a jira ticket describing your situation
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-14793

I may be wrong, but it seems that system directories are pinned to the 
first data directory in cassandra.yaml by default. When you removed the 
first item from the list, the system data was regenerated in the new first 
directory in the list. And then merged??? when the original first dir returned


On Fri, Jan 7, 2022 at 4:23 PM Joe Obernberger 
 wrote:


Hi - in order to get the node back up and running I did the following:
Deleted all data on the node:
Added: -Dcassandra.replace_address=172.16.100.39
to the cassandra.env.sh file, and
started it up.  It is currently bootstrapping.

In cassandra.yaml, say you have the following:

data_file_directories:
    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

If I change the above to:
#    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

the problem happens.  If I change it to:

    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
#    - /data/8/cassandra

the node starts up OK.  I assume it will recover the missing data
during a repair?

-Joe

On 1/7/2022 4:13 PM, Mano ksio wrote:

Hi, you may have already tried, but this may help.

https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists


Can you elaborate a little on 'If I remove a drive other than the
first one'? What does that mean?

On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger
 wrote:

Hi All - I have a 13 node cluster running Cassandra 4.0.1. 
If I stop a
node, edit the cassandra.yaml file, comment out the first
drive in the
list, and restart the node, it fails to start saying that a
node already
exists in the cluster with the IP address.

If I put the drive back into the list, the node still fails
to start
with the same error.  At this point the node is useless and I
think the
only option is to remove all the data, and re-boostrap it?
-

ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 -
Exception encountered during startup
java.lang.RuntimeException: A node with address
/172.16.100.39:7000
already exists, cancelling join. Use
cassandra.replace_address if you
want to replace this node.
 at

org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
 at

org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
 at

org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
 at

org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
 at

org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
 at

org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
 at

org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)

---

If I remove a drive other than the first one, this problem
doesn't
occur.  Any other options?  It appears that if it the first
drive in the
list goes bad, or is just removed, that entire node must be
replaced.

-Joe









Re: Malformed IPV6 address

2022-04-27 Thread Joe Obernberger

Thank you.
The -Dcom.sun.jndi.rmiURLParsing=legacy works for me.

-Joe

On 4/27/2022 4:28 AM, Erick Ramirez wrote:
This issue was reported in 
https://community.datastax.com/questions/13764/ as well. TL;DR the URL 
parser for JNDI providers was made stricter in Oracle Java 8u331 and 
brackets are only allowed around IPv6 addresses. The URL format in 
NodeProbe.java wraps the host in square brackets so nodetool fails 
with the syntax exception.


Jermy Li posted PR #1586 and I've requested him to log a ticket for it 
(CASSANDRA-17581). Israel Fuchter and penky28 posted the following 
workarounds:


OPTION 1 - Add a legacy flag to disable the new validation, for example:

$ nodetool -Dcom.sun.jndi.rmiURLParsing=legacy status

OPTION 2 - Specify the hostname with an IPv6 subnet prefix, for example:

$ nodetool -h :::127.0.0.1 status

Would you please try both workarounds and let us know if either of 
them work for you? Cheers!





Re: Malformed IPV6 address

2022-04-26 Thread Joe Obernberger
It was upgraded from on older version of openJDK 11; not sure which 
one.  Not that old though; we keep the machines pretty well updated.  
Fix didn't work:

export JAVA_OPTIONS="-Djava.net.preferIPv4Stack=true"
[ieproc@pallas ieDocs]$ nodetool status -r
nodetool: Failed to connect to '127.0.0.1:7199' - URISyntaxException: 
'Malformed IPv6 address at index 7: rmi://[127.0.0.1]:7199'.


-Joe

On 4/26/2022 4:53 PM, Jeff Jirsa wrote:

Oof. From which version did you upgrade?

I would try:
>  export _JAVA_OPTIONS="-Djava.net.preferIPv4Stack=true"

There's a chance that fixes it (for an unpleasant reason).

Did you get a specific stack trace / log message at all? or just that 
error?






On Tue, Apr 26, 2022 at 1:47 PM Joe Obernberger 
 wrote:


Hi All - upgraded java recently
(java-11-openjdk-11.0.15.0.9-2.el7_9.x86_64) , and now getting:

nodetool: Failed to connect to '127.0.0.1:7199' - URISyntaxException:
'Malformed IPv6 address at index 7: rmi://[127.0.0.1]:7199'.

whenever running nodetool.
What am I missing?

Thanks!

-Joe




Malformed IPV6 address

2022-04-26 Thread Joe Obernberger
Hi All - upgraded java recently 
(java-11-openjdk-11.0.15.0.9-2.el7_9.x86_64) , and now getting:


nodetool: Failed to connect to '127.0.0.1:7199' - URISyntaxException: 
'Malformed IPv6 address at index 7: rmi://[127.0.0.1]:7199'.


whenever running nodetool.
What am I missing?

Thanks!

-Joe





Re: about the performance of select * from tbl

2022-04-26 Thread Joe Obernberger

This would be a good use case for Spark + Cassandra.
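
For example, a minimal distributed scan-and-count with the Spark Cassandra connector 
might look like the sketch below (keyspace/table names and the SparkSession setup are 
assumptions; the same builder pattern appears later in this digest):

Dataset<Row> rows = spark.read()
        .format("org.apache.spark.sql.cassandra")
        .option("keyspace", "my_keyspace")   // illustrative
        .option("table", "tbl")
        .load();
long total = rows.count();   // executed as token-range scans spread across the executors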

-Joe

On 4/26/2022 8:48 AM, 18624049226 wrote:


We have a business scenario. We must execute the following statement:

select * from tbl;

This CQL has no WHERE condition.

What I want to ask is that if the data in this table is more than one 
million or more, what methods or parameters can improve the 
performance of this CQL?






Re: Cassandra Management tools?

2022-03-01 Thread Joe Obernberger
Thanks all - I'll take a look at Ansible.  Back in my Hadoop days, we 
would use Cloudera manager (course that now costs $). Sounds like we 
need a new open source project!  :)


-Joe

On 3/1/2022 7:46 AM, Bowen Song wrote:
We use Ansible to manage a fairly large (200+ nodes) cluster. We 
created our own Ansible playbooks for common tasks, such as rolling 
restart. We also use Cassandra Reaper for scheduling and running 
repairs on the same cluster. We occasionally also use pssh (parallel 
SSH) for inspecting the logs or configurations on selected nodes. 
Running pssh on a very large number of servers is obviously not 
practical due to the available screen space constraint.
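
As an example of that pssh pattern (host file and log path are illustrative):

# count ERROR lines in the system log on every node listed in the host file
pssh -h cassandra-hosts.txt -i "grep -c ERROR /var/log/cassandra/system.log"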


On 28/02/2022 21:59, Joe Obernberger wrote:
Hi all - curious what tools are folks using to manage large Cassandra 
clusters?  For example, to do tasks such as nodetool cleanup after a 
node or nodes are added to the cluster, or simply rolling start/stops 
after an update to the config or a new version?

We've used puppet before; is that what other folks are using?
Thanks for any suggestions.

-Joe





Cassandra Management tools?

2022-02-28 Thread Joe Obernberger
Hi all - curious what tools are folks using to manage large Cassandra 
clusters?  For example, to do tasks such as nodetool cleanup after a 
node or nodes are added to the cluster, or simply rolling start/stops 
after an update to the config or a new version?

We've used puppet before; is that what other folks are using?
Thanks for any suggestions.

-Joe



Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
Update - the answer was spark.cassandra.input.split.sizeInMB. The 
default value is 512MBytes.  Setting this to 50 resulted in a lot more 
splits and the job ran in under 11 minutes; no timeout errors.  In this 
case the job was a simple count.  10 minutes 48 seconds for over 8.2 
billion rows.  Fast!


Good times ahead.
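
For reference, that setting (together with the read timeout discussed above) goes into 
the same SparkSession builder shown further down in this thread; a sketch with the 
values from these messages (hostnames as in the original code):

SparkSession spark = SparkSession
        .builder()
        .appName("SparkCassandraApp")
        .config("spark.cassandra.connection.host", "chaos")
        .config("spark.cassandra.connection.port", "9042")
        .config("spark.cassandra.input.split.sizeInMB", "50")    // default 512; smaller splits => more, shorter tasks
        .config("spark.cassandra.read.timeoutMS", "960000")      // per-request read timeout in ms
        .master("spark://aether.querymasters.com:8181")
        .getOrCreate();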

-Joe

On 2/8/2022 10:00 AM, Joe Obernberger wrote:


Update - I believe that for large tables, the 
spark.cassandra.read.timeoutMS needs to be very long; like 4 hours or 
longer.  The job now runs much longer, but still doesn't complete.  
I'm now facing this all too familiar error:
com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException: 
Cassandra timeout during read query at consistency LOCAL_ONE (1 
responses were required but only 0 replica responded)


In the past this has been due to clocks being out of sync (not the 
issue here), or a table that has been written to with LOCAL_ONE 
instead of LOCAL_QUORUM.  I don't believe either of those are the 
case.  To be sure, I ran a repair on the table overnight (about 17 
hours to complete).  For the next test, I set the 
spark.cassandra.connection.timeoutMS to 60000 (default is 5000), and 
the spark.cassandra.query.retry.count to -1.


Suggestions?  Thoughts?

Thanks all.

-Joe

On 2/7/2022 10:35 AM, Joe Obernberger wrote:


Some more info.  Tried different GC strategies - no luck.
It only happens on large tables (more than 1 billion rows). Works 
fine on a 300 million row table.  There is very high CPU usage during 
the run.


I've tried setting spark.dse.continuousPagingEnabled to false and 
I've tried setting spark.cassandra.input.readsPerSec to 10; no effect.


Stats:

nodetool cfstats doc.doc
Total number of tables: 82

Keyspace : doc
    Read Count: 9620329
    Read Latency: 0.5629605546754171 ms
    Write Count: 510561482
    Write Latency: 0.02805177028806885 ms
    Pending Flushes: 0
    Table: doc
    SSTable count: 77
    Old SSTable count: 0
    Space used (live): 82061188941
    Space used (total): 82061188941
    Space used by snapshots (total): 0
    Off heap memory used (total): 317037065
    SSTable Compression Ratio: 0.3816525125492022
    Number of partitions (estimate): 101021793
    Memtable cell count: 209646
    Memtable data size: 44087966
    Memtable off heap memory used: 0
    Memtable switch count: 10
    Local read count: 25665
    Local read latency: NaN ms
    Local write count: 2459322
    Local write latency: NaN ms
    Pending flushes: 0
    Percent repaired: 0.0
    Bytes repaired: 0.000KiB
    Bytes unrepaired: 184.869GiB
    Bytes pending repair: 0.000KiB
    Bloom filter false positives: 2063
    Bloom filter false ratio: 0.01020
    Bloom filter space used: 169249016
    Bloom filter off heap memory used: 169248400
    Index summary off heap memory used: 50863401
    Compression metadata off heap memory used: 96925264
    Compacted partition minimum bytes: 104
    Compacted partition maximum bytes: 943127
    Compacted partition mean bytes: 1721
    Average live cells per slice (last five minutes): NaN
    Maximum live cells per slice (last five minutes): 0
    Average tombstones per slice (last five minutes): NaN
    Maximum tombstones per slice (last five minutes): 0
    Dropped Mutations: 0


nodetool tablehistograms doc.doc
doc/doc histograms
Percentile   Read Latency   Write Latency   SSTables   Partition Size   Cell Count
                 (micros)        (micros)                       (bytes)
50%          0.00           0.00            0.00       1109             86
75%          0.00           0.00            0.00       3311             215
95%          0.00           0.00            0.00       3311             215
98%          0.00           0.00            0.00       3311             215
99%          0.00           0.00            0.00       3311             215
Min          0.00           0.00            0.00       104              5
Max          0.00           0.00            0.00       943127           2299


I'm stuck.

-Joe


On 2/3/2022 9:30 PM, manish khandelwal wrote:
It maybe the case you have lots of tombstones in this table which is 
making reads slow and timeouts during bulk reads.


On Fri, Feb 4, 2022, 03:23 Joe Obernberger 
 wrote:


So it turns out that number after PT is increments of 60
seconds.  I changed the timeout to 960000, and now I get PT16M
(960000/60000).  Since I'm still getting

Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
Update - I believe that for large tables, the 
spark.cassandra.read.timeoutMS needs to be very long; like 4 hours or 
longer.  The job now runs much longer, but still doesn't complete.  I'm 
now facing this all too familiar error:
com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException: 
Cassandra timeout during read query at consistency LOCAL_ONE (1 
responses were required but only 0 replica responded)


In the past this has been due to clocks being out of sync (not the issue 
here), or a table that has been written to with LOCAL_ONE instead of 
LOCAL_QUORUM.  I don't believe either of those are the case.  To be 
sure, I ran a repair on the table overnight (about 17 hours to 
complete).  For the next test, I set the 
spark.cassandra.connection.timeoutMS to 60000 (default is 5000), and the 
spark.cassandra.query.retry.count to -1.


Suggestions?  Thoughts?

Thanks all.

-Joe

On 2/7/2022 10:35 AM, Joe Obernberger wrote:


Some more info.  Tried different GC strategies - no luck.
It only happens on large tables (more than 1 billion rows). Works fine 
on a 300 million row table.  There is very high CPU usage during the run.


I've tried setting spark.dse.continuousPagingEnabled to false and I've 
tried setting spark.cassandra.input.readsPerSec to 10; no effect.


Stats:

nodetool cfstats doc.doc
Total number of tables: 82

Keyspace : doc
    Read Count: 9620329
    Read Latency: 0.5629605546754171 ms
    Write Count: 510561482
    Write Latency: 0.02805177028806885 ms
    Pending Flushes: 0
    Table: doc
    SSTable count: 77
    Old SSTable count: 0
    Space used (live): 82061188941
    Space used (total): 82061188941
    Space used by snapshots (total): 0
    Off heap memory used (total): 317037065
    SSTable Compression Ratio: 0.3816525125492022
    Number of partitions (estimate): 101021793
    Memtable cell count: 209646
    Memtable data size: 44087966
    Memtable off heap memory used: 0
    Memtable switch count: 10
    Local read count: 25665
    Local read latency: NaN ms
    Local write count: 2459322
    Local write latency: NaN ms
    Pending flushes: 0
    Percent repaired: 0.0
    Bytes repaired: 0.000KiB
    Bytes unrepaired: 184.869GiB
    Bytes pending repair: 0.000KiB
    Bloom filter false positives: 2063
    Bloom filter false ratio: 0.01020
    Bloom filter space used: 169249016
    Bloom filter off heap memory used: 169248400
    Index summary off heap memory used: 50863401
    Compression metadata off heap memory used: 96925264
    Compacted partition minimum bytes: 104
    Compacted partition maximum bytes: 943127
    Compacted partition mean bytes: 1721
    Average live cells per slice (last five minutes): NaN
    Maximum live cells per slice (last five minutes): 0
    Average tombstones per slice (last five minutes): NaN
    Maximum tombstones per slice (last five minutes): 0
    Dropped Mutations: 0


nodetool tablehistograms doc.doc
doc/doc histograms
Percentile   Read Latency   Write Latency   SSTables   Partition Size   Cell Count
                 (micros)        (micros)                       (bytes)
50%          0.00           0.00            0.00       1109             86
75%          0.00           0.00            0.00       3311             215
95%          0.00           0.00            0.00       3311             215
98%          0.00           0.00            0.00       3311             215
99%          0.00           0.00            0.00       3311             215
Min          0.00           0.00            0.00       104              5
Max          0.00           0.00            0.00       943127           2299


I'm stuck.

-Joe


On 2/3/2022 9:30 PM, manish khandelwal wrote:
It maybe the case you have lots of tombstones in this table which is 
making reads slow and timeouts during bulk reads.


On Fri, Feb 4, 2022, 03:23 Joe Obernberger 
 wrote:


So it turns out that number after PT is increments of 60
seconds.  I changed the timeout to 960000, and now I get PT16M
(960000/60000).  Since I'm still getting timeouts, something else
must be wrong.

Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 306 in stage 0.0 failed 4
times, most recent failure: Lost task 306.3 in stage 0.0 (TID
1180) (172.16.100.39 executor 0):
com.datastax.oss.driver.api.core.DriverTimeoutException: Query
timed out a

Re: Query timed out after PT2M

2022-02-07 Thread Joe Obernberger

Some more info.  Tried different GC strategies - no luck.
It only happens on large tables (more than 1 billion rows).  Works fine 
on a 300 million row table.  There is very high CPU usage during the run.


I've tried setting spark.dse.continuousPagingEnabled to false and I've 
tried setting spark.cassandra.input.readsPerSec to 10; no effect.


Stats:

nodetool cfstats doc.doc
Total number of tables: 82

Keyspace : doc
    Read Count: 9620329
    Read Latency: 0.5629605546754171 ms
    Write Count: 510561482
    Write Latency: 0.02805177028806885 ms
    Pending Flushes: 0
    Table: doc
    SSTable count: 77
    Old SSTable count: 0
    Space used (live): 82061188941
    Space used (total): 82061188941
    Space used by snapshots (total): 0
    Off heap memory used (total): 317037065
    SSTable Compression Ratio: 0.3816525125492022
    Number of partitions (estimate): 101021793
    Memtable cell count: 209646
    Memtable data size: 44087966
    Memtable off heap memory used: 0
    Memtable switch count: 10
    Local read count: 25665
    Local read latency: NaN ms
    Local write count: 2459322
    Local write latency: NaN ms
    Pending flushes: 0
    Percent repaired: 0.0
    Bytes repaired: 0.000KiB
    Bytes unrepaired: 184.869GiB
    Bytes pending repair: 0.000KiB
    Bloom filter false positives: 2063
    Bloom filter false ratio: 0.01020
    Bloom filter space used: 169249016
    Bloom filter off heap memory used: 169248400
    Index summary off heap memory used: 50863401
    Compression metadata off heap memory used: 96925264
    Compacted partition minimum bytes: 104
    Compacted partition maximum bytes: 943127
    Compacted partition mean bytes: 1721
    Average live cells per slice (last five minutes): NaN
    Maximum live cells per slice (last five minutes): 0
    Average tombstones per slice (last five minutes): NaN
    Maximum tombstones per slice (last five minutes): 0
    Dropped Mutations: 0


nodetool tablehistograms doc.doc
doc/doc histograms
Percentile   Read Latency   Write Latency   SSTables   Partition Size   Cell Count
                 (micros)        (micros)                       (bytes)
50%          0.00           0.00            0.00       1109             86
75%          0.00           0.00            0.00       3311             215
95%          0.00           0.00            0.00       3311             215
98%          0.00           0.00            0.00       3311             215
99%          0.00           0.00            0.00       3311             215
Min          0.00           0.00            0.00       104              5
Max          0.00           0.00            0.00       943127           2299


I'm stuck.

-Joe


On 2/3/2022 9:30 PM, manish khandelwal wrote:
It maybe the case you have lots of tombstones in this table which is 
making reads slow and timeouts during bulk reads.


On Fri, Feb 4, 2022, 03:23 Joe Obernberger 
 wrote:


So it turns out that number after PT is increments of 60 seconds. 
I changed the timeout to 960000, and now I get PT16M
(960000/60000).  Since I'm still getting timeouts, something else
must be wrong.

Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 306 in stage 0.0 failed 4
times, most recent failure: Lost task 306.3 in stage 0.0 (TID
1180) (172.16.100.39 executor 0):
com.datastax.oss.driver.api.core.DriverTimeoutException: Query
timed out after PT16M
    at

com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:206)
    at

com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at

com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at

com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at

com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)

Driver stacktrace:
    at

org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2454)
    at

org.apache.spark.scheduler.DAGScheduler.$anonfun$ab

Re: Query timed out after PT2M

2022-02-04 Thread Joe Obernberger

I've tried several different GC settings - but still getting timeouts.
Using openJDK 11 with:
-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500
-XX:InitiatingHeapOccupancyPercent=70
-XX:ParallelGCThreads=24
-XX:ConcGCThreads=24

Machine has 40 cores.  Xmx is set to 32G.
13 node cluster.

Any ideas on what else to try?

-Joe

On 2/4/2022 10:39 AM, Joe Obernberger wrote:


Still no go.  Oddly, I can use trino and do a count OK, but with spark 
I get the timeouts.  I don't believe tombstones are an issue:


nodetool cfstats doc.doc
Total number of tables: 82

Keyspace : doc
    Read Count: 1514288521
    Read Latency: 0.5080819034089475 ms
    Write Count: 12716563031
    Write Latency: 0.1462260620347646 ms
    Pending Flushes: 0
    Table: doc
    SSTable count: 72
    Old SSTable count: 0
    Space used (live): 74097778114
    Space used (total): 74097778114
    Space used by snapshots (total): 0
    Off heap memory used (total): 287187173
    SSTable Compression Ratio: 0.38644718028460934
    Number of partitions (estimate): 94111032
    Memtable cell count: 175084
    Memtable data size: 36945327
    Memtable off heap memory used: 0
    Memtable switch count: 677
    Local read count: 16237350
    Local read latency: 0.639 ms
    Local write count: 314822497
    Local write latency: 0.061 ms
    Pending flushes: 0
    Percent repaired: 0.0
    Bytes repaired: 0.000KiB
    Bytes unrepaired: 164.168GiB
    Bytes pending repair: 0.000KiB
    Bloom filter false positives: 154552
    Bloom filter false ratio: 0.01059
    Bloom filter space used: 152765592
    Bloom filter off heap memory used: 152765016
    Index summary off heap memory used: 48349869
    Compression metadata off heap memory used: 86072288
    Compacted partition minimum bytes: 104
    Compacted partition maximum bytes: 943127
    Compacted partition mean bytes: 1609
    Average live cells per slice (last five minutes): 
1108.6270918991

    Maximum live cells per slice (last five minutes): 1109
    Average tombstones per slice (last five minutes): 1.0
    Maximum tombstones per slice (last five minutes): 1
    Dropped Mutations: 0

Other things to check?

-Joe

On 2/3/2022 9:30 PM, manish khandelwal wrote:
It maybe the case you have lots of tombstones in this table which is 
making reads slow and timeouts during bulk reads.


On Fri, Feb 4, 2022, 03:23 Joe Obernberger 
 wrote:


So it turns out that number after PT is increments of 60
seconds.  I changed the timeout to 960000, and now I get PT16M
(960000/60000).  Since I'm still getting timeouts, something else
must be wrong.

Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 306 in stage 0.0 failed 4
times, most recent failure: Lost task 306.3 in stage 0.0 (TID
1180) (172.16.100.39 executor 0):
com.datastax.oss.driver.api.core.DriverTimeoutException: Query
timed out after PT16M
    at

com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:206)
    at

com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at

com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at

com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at

com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)

Driver stacktrace:
    at

org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2454)
    at

org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2403)
    at

org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2402)
    at
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2402)
    at

org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskS

Re: Query timed out after PT2M

2022-02-04 Thread Joe Obernberger
Still no go.  Oddly, I can use trino and do a count OK, but with spark I 
get the timeouts.  I don't believe tombstones are an issue:


nodetool cfstats doc.doc
Total number of tables: 82

Keyspace : doc
    Read Count: 1514288521
    Read Latency: 0.5080819034089475 ms
    Write Count: 12716563031
    Write Latency: 0.1462260620347646 ms
    Pending Flushes: 0
    Table: doc
    SSTable count: 72
    Old SSTable count: 0
    Space used (live): 74097778114
    Space used (total): 74097778114
    Space used by snapshots (total): 0
    Off heap memory used (total): 287187173
    SSTable Compression Ratio: 0.38644718028460934
    Number of partitions (estimate): 94111032
    Memtable cell count: 175084
    Memtable data size: 36945327
    Memtable off heap memory used: 0
    Memtable switch count: 677
    Local read count: 16237350
    Local read latency: 0.639 ms
    Local write count: 314822497
    Local write latency: 0.061 ms
    Pending flushes: 0
    Percent repaired: 0.0
    Bytes repaired: 0.000KiB
    Bytes unrepaired: 164.168GiB
    Bytes pending repair: 0.000KiB
    Bloom filter false positives: 154552
    Bloom filter false ratio: 0.01059
    Bloom filter space used: 152765592
    Bloom filter off heap memory used: 152765016
    Index summary off heap memory used: 48349869
    Compression metadata off heap memory used: 86072288
    Compacted partition minimum bytes: 104
    Compacted partition maximum bytes: 943127
    Compacted partition mean bytes: 1609
    Average live cells per slice (last five minutes): 
1108.6270918991

    Maximum live cells per slice (last five minutes): 1109
    Average tombstones per slice (last five minutes): 1.0
    Maximum tombstones per slice (last five minutes): 1
    Dropped Mutations: 0

Other things to check?

-Joe

On 2/3/2022 9:30 PM, manish khandelwal wrote:
It maybe the case you have lots of tombstones in this table which is 
making reads slow and timeouts during bulk reads.


On Fri, Feb 4, 2022, 03:23 Joe Obernberger 
 wrote:


So it turns out that number after PT is increments of 60 seconds. 
I changed the timeout to 960000, and now I get PT16M
(960000/60000).  Since I'm still getting timeouts, something else
must be wrong.

Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 306 in stage 0.0 failed 4
times, most recent failure: Lost task 306.3 in stage 0.0 (TID
1180) (172.16.100.39 executor 0):
com.datastax.oss.driver.api.core.DriverTimeoutException: Query
timed out after PT16M
    at

com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:206)
    at

com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at

com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at

com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at

com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)

Driver stacktrace:
    at

org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2454)
    at

org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2403)
    at

org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2402)
    at
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2402)
    at

org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160)
    at

org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160)
    at scala.Option.foreach(Option.scala:407)
    at

org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160)
    at

org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
So it turns out that number after PT is increments of 60 seconds.  I 
changed the timeout to 960000, and now I get PT16M (960000/60000).  
Since I'm still getting timeouts, something else must be wrong.


Exception in thread "main" org.apache.spark.SparkException: Job aborted 
due to stage failure: Task 306 in stage 0.0 failed 4 times, most recent 
failure: Lost task 306.3 in stage 0.0 (TID 1180) (172.16.100.39 executor 
0): com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed 
out after PT16M
    at 
com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:206)
    at 
com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at 
com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at 
com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at 
com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

    at java.base/java.lang.Thread.run(Thread.java:829)

Driver stacktrace:
    at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2454)
    at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2403)
    at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2402)
    at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2402)
    at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160)
    at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160)

    at scala.Option.foreach(Option.scala:407)
    at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2642)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2584)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2573)

    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: com.datastax.oss.driver.api.core.DriverTimeoutException: 
Query timed out after PT16M
    at 
com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.lambda$scheduleTimeout$1(CqlRequestHandler.java:206)
    at 
com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
    at 
com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
    at 
com.datastax.oss.driver.shaded.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
    at 
com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)


-Joe

On 2/3/2022 3:30 PM, Joe Obernberger wrote:


I did find this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md

And "spark.cassandra.read.timeoutMS" is set to 12.

Running a test now, and I think that is it.  Thank you Scott.

-Joe

On 2/3/2022 3:19 PM, Joe Obernberger wrote:


Thank you Scott!
I am using the spark cassandra connector.  Code:

SparkSession spark = SparkSession
    .builder()
    .appName("SparkCassandraApp")
    .config("spark.cassandra.connection.host", "chaos")
    .config("spark.cassandra.connection.port", "9042")
.master("spark://aether.querymasters.com:8181")
    .getOrCreate();

Would I set PT2M in there?  Like .config("pt2m","300") ?
I'm not familiar with jshell, so I'm not sure where you're getting 
that duration from.


Right now, I'm just doing a count:
Dataset dataset = 
spark.read().format("org.apache.spark.sql.cassandra")

    .options(new HashMap() {
    {
    put("keyspace", "doc");
    put("table", "doc");
    }
    }).load();

dataset.count();


Thank you!

-Joe

On 2/3/2022 3:01 PM, C. Scott Andreas wrote:
Hi Joe, it looks like "PT2M" may refer to a timeout value that could 
be set by your Spark job's initialization of the client. I don't see 
a string matching this in the Cassandra codebase itself, but I do 
see that this is parseable as

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger

I did find this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md

And "spark.cassandra.read.timeoutMS" is set to 12.

Running a test now, and I think that is it.  Thank you Scott.

-Joe

On 2/3/2022 3:19 PM, Joe Obernberger wrote:


Thank you Scott!
I am using the spark cassandra connector.  Code:

SparkSession spark = SparkSession
    .builder()
    .appName("SparkCassandraApp")
    .config("spark.cassandra.connection.host", "chaos")
    .config("spark.cassandra.connection.port", "9042")
    .master("spark://aether.querymasters.com:8181")
    .getOrCreate();

Would I set PT2M in there?  Like .config("pt2m","300") ?
I'm not familiar with jshell, so I'm not sure where you're getting 
that duration from.


Right now, I'm just doing a count:
Dataset<Row> dataset = 
spark.read().format("org.apache.spark.sql.cassandra")

    .options(new HashMap<String, String>() {
    {
    put("keyspace", "doc");
    put("table", "doc");
    }
    }).load();

dataset.count();


Thank you!

-Joe

On 2/3/2022 3:01 PM, C. Scott Andreas wrote:
Hi Joe, it looks like "PT2M" may refer to a timeout value that could 
be set by your Spark job's initialization of the client. I don't see 
a string matching this in the Cassandra codebase itself, but I do see 
that this is parseable as a Duration.


```
jshell> java.time.Duration.parse("PT2M").getSeconds()
$7 ==> 120
```

The server-side log you see is likely an indicator of the timeout 
from the server's perspective. You might consider checking logs from 
the replicas for dropped reads, query aborts due to scanning more 
tombstones than the configured max, or other conditions indicating 
overload/inability to serve a response.


If you're running a Spark job, I'd recommend using the DataStax Spark 
Cassandra Connector which distributes your query to executors 
addressing slices of the token range which will land on replica sets, 
avoiding the scatter-gather behavior that can occur if using the Java 
driver alone.


Cheers,

– Scott
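
A side note on where that string comes from: PT2M is just the ISO-8601 form of a two-minute java.time.Duration, as the jshell snippet above shows, and the DriverTimeoutException at the top of this thread reports its timeout the same way. If you build a CqlSession directly with the 4.x Java driver (the Spark connector manages its own sessions, and its knob is the spark.cassandra.read.timeoutMS setting mentioned at the top of this message), the request timeout can be raised programmatically. A sketch, with a purely illustrative 10-minute value:

```
import java.time.Duration;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

public class LongRequestTimeout {  // hypothetical class name
    public static void main(String[] args) {
        // basic.request.timeout, set programmatically instead of via application.conf.
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
                .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofMinutes(10))
                .build();

        // Contact points come from the default config; add them explicitly as needed.
        try (CqlSession session = CqlSession.builder()
                .withConfigLoader(loader)
                .build()) {
            System.out.println("connected as session: " + session.getName());
        }
    }
}
```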


On Feb 3, 2022, at 11:42 AM, Joe Obernberger 
 wrote:



Hi all - using Cassandra 4.0.1 and a spark job running against a large
table (~8 billion rows) and I'm getting this error on the client side:
Query timed out after PT2M

On the server side I see a lot of messages like:
DEBUG [Native-Transport-Requests-39] 2022-02-03 14:39:56,647
ReadCallback.java:119 - Timed out; received 0 of 1 responses

The same code works on another table in the same Cassandra cluster that
is about 300 million rows and completes in about 2 minutes.  The cluster
is 13 nodes.

I can't find what PT2M means.  Perhaps the table needs a repair? Other
ideas?
Thank you!

-Joe




Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger

Thank you Scott!
I am using the spark cassandra connector.  Code:

SparkSession spark = SparkSession
    .builder()
    .appName("SparkCassandraApp")
    .config("spark.cassandra.connection.host", "chaos")
    .config("spark.cassandra.connection.port", "9042")
    .master("spark://aether.querymasters.com:8181")
    .getOrCreate();

Would I set PT2M in there?  Like .config("pt2m","300") ?
I'm not familiar with jshell, so I'm not sure where you're getting that 
duration from.


Right now, I'm just doing a count:
Dataset<Row> dataset = spark.read().format("org.apache.spark.sql.cassandra")
    .options(new HashMap<String, String>() {
    {
    put("keyspace", "doc");
    put("table", "doc");
    }
    }).load();

dataset.count();


Thank you!

-Joe

On 2/3/2022 3:01 PM, C. Scott Andreas wrote:
Hi Joe, it looks like "PT2M" may refer to a timeout value that could 
be set by your Spark job's initialization of the client. I don't see a 
string matching this in the Cassandra codebase itself, but I do see 
that this is parseable as a Duration.


```
jshell> java.time.Duration.parse("PT2M").getSeconds()
$7 ==> 120
```

The server-side log you see is likely an indicator of the timeout from 
the server's perspective. You might consider checking logs from the 
replicas for dropped reads, query aborts due to scanning more 
tombstones than the configured max, or other conditions indicating 
overload/inability to serve a response.


If you're running a Spark job, I'd recommend using the DataStax Spark 
Cassandra Connector which distributes your query to executors 
addressing slices of the token range which will land on replica sets, 
avoiding the scatter-gather behavior that can occur if using the Java 
driver alone.


Cheers,

– Scott


On Feb 3, 2022, at 11:42 AM, Joe Obernberger 
 wrote:



Hi all - using Cassandra 4.0.1 and a spark job running against a large
table (~8 billion rows) and I'm getting this error on the client side:
Query timed out after PT2M

On the server side I see a lot of messages like:
DEBUG [Native-Transport-Requests-39] 2022-02-03 14:39:56,647
ReadCallback.java:119 - Timed out; received 0 of 1 responses

The same code works on another table in the same Cassandra cluster that
is about 300 million rows and completes in about 2 minutes.  The cluster
is 13 nodes.

I can't find what PT2M means.  Perhaps the table needs a repair? Other
ideas?
Thank you!

-Joe




Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
Hi all - using Cassandra 4.0.1 and a spark job running against a large 
table (~8 billion rows) and I'm getting this error on the client side:

Query timed out after PT2M

On the server side I see a lot of messages like:
DEBUG [Native-Transport-Requests-39] 2022-02-03 14:39:56,647 
ReadCallback.java:119 - Timed out; received 0 of 1 responses


The same code works on another table in the same Cassandra cluster that 
is about 300 million rows and completes in about 2 minutes.  The cluster 
is 13 nodes.


I can't find what PT2M means.  Perhaps the table needs a repair? Other 
ideas?

Thank you!

-Joe



Re: removing a drive - 4.0.1

2022-01-07 Thread Joe Obernberger

Thank you Dmitry.
At this point the one node where I removed the first drive from the list 
and then rebuilt it, is now in some odd state.  Locally nodetool status 
shows it as up (UN), but all the other nodes in the cluster show it as 
down (DN).


Not sure what to do at this juncture.

-Joe

On 1/7/2022 4:38 PM, Dmitry Saprykin wrote:

There is a jira ticket describing your situation
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-14793

I may be wrong, but it seems that system directories are pinned to the 
first data directory in cassandra.yaml by default. When you removed the 
first item from the list, the system data was regenerated in the new first 
directory in the list. And then merged??? when the original first dir returned


On Fri, Jan 7, 2022 at 4:23 PM Joe Obernberger 
 wrote:


Hi - in order to get the node back up and running I did the following:
Deleted all data on the node:
Added: -Dcassandra.replace_address=172.16.100.39
to the cassandra-env.sh file, and
started it up.  It is currently bootstrapping.

In cassandra.yaml, say you have the following:

data_file_directories:
    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

If I change the above to:
#    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

the problem happens.  If I change it to:

    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
#    - /data/8/cassandra

the node starts up OK.  I assume it will recover the missing data
during a repair?

-Joe

On 1/7/2022 4:13 PM, Mano ksio wrote:

Hi, you may have already tried, but this may help.

https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists


Can you elaborate a little on 'If I remove a drive other than the
first one'? What does it mean?

On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger
 wrote:

Hi All - I have a 13 node cluster running Cassandra 4.0.1. 
If I stop a
node, edit the cassandra.yaml file, comment out the first
drive in the
list, and restart the node, it fails to start saying that a
node already
exists in the cluster with the IP address.

If I put the drive back into the list, the node still fails
to start
with the same error.  At this point the node is useless and I
think the
only option is to remove all the data, and re-bootstrap it?
-

ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 -
Exception encountered during startup
java.lang.RuntimeException: A node with address
/172.16.100.39:7000
already exists, cancelling join. Use
cassandra.replace_address if you
want to replace this node.
 at

org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
 at

org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
 at

org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
 at

org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
 at

org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
 at

org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
 at

org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)

---

If I remove a drive other than the first one, this problem
doesn't
occur.  Any other options?  It appears that if the first
drive in the
list goes bad, or is just removed, that entire node must be
replaced.

-Joe





Re: removing a drive - 4.0.1

2022-01-07 Thread Joe Obernberger

Hi - in order to get the node back up and running I did the following:
Deleted all data on the node:
Added: -Dcassandra.replace_address=172.16.100.39
to the cassandra-env.sh file, and started it up.  It is currently 
bootstrapping.


In cassandra.yaml, say you have the following:

data_file_directories:
    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

If I change the above to:
#    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
    - /data/8/cassandra

the problem happens.  If I change it to:

    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
    - /data/5/cassandra
    - /data/6/cassandra
    - /data/7/cassandra
#    - /data/8/cassandra

the node starts up OK.  I assume it will recover the missing data during 
a repair?


-Joe

On 1/7/2022 4:13 PM, Mano ksio wrote:
Hi, you may have already tried, but this may help. 
https://stackoverflow.com/questions/29323709/unable-to-start-cassandra-node-already-exists 



Can you elaborate a little on 'If I remove a drive other than the first 
one'? What does it mean?


On Fri, Jan 7, 2022 at 2:52 PM Joe Obernberger 
 wrote:


Hi All - I have a 13 node cluster running Cassandra 4.0.1.  If I
stop a
node, edit the cassandra.yaml file, comment out the first drive in
the
list, and restart the node, it fails to start saying that a node
already
exists in the cluster with the IP address.

If I put the drive back into the list, the node still fails to start
with the same error.  At this point the node is useless and I
think the
only option is to remove all the data, and re-bootstrap it?
-

ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 -
Exception encountered during startup
java.lang.RuntimeException: A node with address
/172.16.100.39:7000
already exists, cancelling join. Use cassandra.replace_address if you
want to replace this node.
 at

org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
 at

org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
 at

org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
 at

org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
 at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
 at

org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
 at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)

---

If I remove a drive other than the first one, this problem doesn't
occur.  Any other options?  It appears that if the first drive
in the
list goes bad, or is just removed, that entire node must be replaced.

-Joe



removing a drive - 4.0.1

2022-01-07 Thread Joe Obernberger
Hi All - I have a 13 node cluster running Cassandra 4.0.1.  If I stop a 
node, edit the cassandra.yaml file, comment out the first drive in the 
list, and restart the node, it fails to start saying that a node already 
exists in the cluster with the IP address.


If I put the drive back into the list, the node still fails to start 
with the same error.  At this point the node is useless and I think the 
only option is to remove all the data, and re-bootstrap it?

-

ERROR [main] 2022-01-07 15:50:09,155 CassandraDaemon.java:909 - 
Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.39:7000 
already exists, cancelling join. Use cassandra.replace_address if you 
want to replace this node.
    at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)


---

If I remove a drive other than the first one, this problem doesn't 
occur.  Any other options?  It appears that if the first drive in the 
list goes bad, or is just removed, that entire node must be replaced.


-Joe



Re: Node failed after drive failed

2021-12-13 Thread Joe Obernberger
Thank you Bowen.  I had the policy set to "best_effort", but as Jeff 
pointed out, since it was the first disk in the list that failed, maybe 
that is a special case?


I don't have a spare drive at the moment, so I'll just delete all the 
cassandra data on that node and have it rejoin as a new node.


-Joe

On 12/11/2021 3:44 PM, Bowen Song wrote:


Hi Joe,

In case of a single disk failure, you should not remove the data 
directory from the cassandra.yaml file. Instead, you should replace 
the failed disk with a new empty disk. See 
https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsRecoverUsingJBOD.html 
for the steps.


Since your node failed to start, I guess it's not too late to restore 
the settings in the cassandra.yaml file and then follow the above 
steps. However, replacing the entire node is always an option if 
everything else has failed, as long as you have RF>1 and other nodes 
in the cluster are all healthy. If you need to do this, follow the 
steps here: 
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsReplaceNode.html


As for your last question,

> When a drive fails with cassandra, is it common for the node to 
come down?


this actually depends on the disk_failure_policy in your 
cassandra.yaml file, read the comments in it will help you understand 
the available choices.


Cheers,
Bowen

On 06/12/2021 14:11, Joe Obernberger wrote:
Hi All - one node in an 11 node cluster experienced a drive failure 
on the first drive in the list.  I removed that drive from the list 
so that it now reads:


data_file_directories:
    - /data/2/cassandra/data
    - /data/3/cassandra/data
    - /data/4/cassandra/data
    - /data/5/cassandra/data
    - /data/6/cassandra/data
    - /data/8/cassandra/data
    - /data/9/cassandra/data

But when I try to start the server, I get:

Exception (java.lang.RuntimeException) encountered during startup: A 
node with address /172.16.100.251:7000 already exists, cancelling 
join. Use cassandra.replace_address if you want to replace this node.
java.lang.RuntimeException: A node with address /172.16.100.251:7000 
already exists, cancelling join. Use cassandra.replace_address if you 
want to replace this node.
    at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
ERROR [main] 2021-12-05 15:49:48,446 CassandraDaemon.java:909 - 
Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.251:7000 
already exists, cancelling join. Use cassandra.replace_address if you 
want to replace this node.
    at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO  [StorageServiceShutdownHook] 2021-12-05 15:49:48,468 
HintsService.java:220 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2021-12-05 15:49:48,470 
Gossiper.java:1993 - No local state, state is in silent shutdown, or 
node hasn't joined, not announcing shutdown


Do I need to remove and re-add the node?  When a drive fails with 
cassandra, is it common for the node to come down?


Thank you!

-Joe Obernberger




Re: Added node - now queries time out

2021-12-09 Thread Joe Obernberger

This worked - decommissioned the node, and re-adding it worked.

If a drive fails on a Cassandra node, what is the process to bring that 
node back up?


-joe

On 12/3/2021 4:31 PM, Bowen Song wrote:
The load on the new server looks clearly wrong. Are you sure this node 
has fully bootstrapped / rebuilt? If not, the large amount of streaming 
activity triggered by read repair may be enough to cause timeouts. 
Please check the new server's log and make sure it did not fail any 
streaming session when it first joined the cluster. If in doubt, 
remove the node and re-add it, and keep an eye on the log.


On 03/12/2021 20:51, Joe Obernberger wrote:
Hi all - just added a node to an 11 node cluster (4.0.1) and it 
synced up OK, but now all queries are timing out.

This time I made sure the clocks are synced!  :)

Kinda desperate to get this to work again.  What can I check or do? Just 
added the .34 node.  One item of concern is the amount of load/data 
on it compared to the others.
I'm running a repair on the new node, but things like select * from 
table, on a table with maybe 100 rows times out.

Help!

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns  Host 
ID   Rack
UN  172.16.100.45   161.81 GiB  250 ? 
07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  172.16.100.251  128.6 GiB   200 ? 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.252  128.44 GiB  200 ? 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  128.43 GiB  200 ? 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   128.79 GiB  200 ? 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   127.47 GiB  200 ? 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  2.19 GiB    4   ? 
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  127.74 GiB  200 ? 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   75.89 GiB   120 ? 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  128.3 GiB   200 ? 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  172.16.100.34   29.67 GiB   200 ? 
84219e6d-74ac-4d23-89d0-0bd734d0c09e  rack1


-joe





Node failed after drive failed

2021-12-06 Thread Joe Obernberger
Hi All - one node in an 11 node cluster experienced a drive failure on 
the first drive in the list.  I removed that drive from the list so that 
it now reads:


data_file_directories:
    - /data/2/cassandra/data
    - /data/3/cassandra/data
    - /data/4/cassandra/data
    - /data/5/cassandra/data
    - /data/6/cassandra/data
    - /data/8/cassandra/data
    - /data/9/cassandra/data

But when I try to start the server, I get:

Exception (java.lang.RuntimeException) encountered during startup: A 
node with address /172.16.100.251:7000 already exists, cancelling join. 
Use cassandra.replace_address if you want to replace this node.
java.lang.RuntimeException: A node with address /172.16.100.251:7000 
already exists, cancelling join. Use cassandra.replace_address if you 
want to replace this node.
    at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
ERROR [main] 2021-12-05 15:49:48,446 CassandraDaemon.java:909 - 
Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.251:7000 
already exists, cancelling join. Use cassandra.replace_address if you 
want to replace this node.
    at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO  [StorageServiceShutdownHook] 2021-12-05 15:49:48,468 
HintsService.java:220 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2021-12-05 15:49:48,470 
Gossiper.java:1993 - No local state, state is in silent shutdown, or 
node hasn't joined, not announcing shutdown


Do I need to remove and re-add the node?  When a drive fails with 
cassandra, is it common for the node to come down?


Thank you!

-Joe Obernberger



Re: Added node - now queries time out

2021-12-03 Thread Joe Obernberger

Thank you!

Interestingly as the node was being decommissioned the load/storage 
increased.  Once it was removed, I bounced the entire cluster and now 
it's working.


-Joe

On 12/3/2021 4:31 PM, Bowen Song wrote:
The load on the new server looks clearly wrong. Are you sure this node 
has fully bootstrapped / rebuilt? If not, the large amount of streaming 
activity triggered by read repair may be enough to cause timeouts. 
Please check the new server's log and make sure it did not fail any 
streaming session when it first joined the cluster. If in doubt, 
remove the node and re-add it, and keep an eye on the log.


On 03/12/2021 20:51, Joe Obernberger wrote:
Hi all - just added a node to an 11 node cluster (4.0.1) and it 
synced up OK, but now all queries are timing out.

This time I made sure the clocks are synced!  :)

Kinda desperate to get this to work again.  What can I check or do? Just 
added the .34 node.  One item of concern is the amount of load/data 
on it compared to the others.
I'm running a repair on the new node, but things like select * from 
table, on a table with maybe 100 rows times out.

Help!

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns  Host 
ID   Rack
UN  172.16.100.45   161.81 GiB  250 ? 
07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  172.16.100.251  128.6 GiB   200 ? 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.252  128.44 GiB  200 ? 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  128.43 GiB  200 ? 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   128.79 GiB  200 ? 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   127.47 GiB  200 ? 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  2.19 GiB    4   ? 
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  127.74 GiB  200 ? 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   75.89 GiB   120 ? 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  128.3 GiB   200 ? 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  172.16.100.34   29.67 GiB   200 ? 
84219e6d-74ac-4d23-89d0-0bd734d0c09e  rack1


-joe





Added node - now queries time out

2021-12-03 Thread Joe Obernberger
Hi all - just added a node to an 11 node cluster (4.0.1) and it synced 
up OK, but now all queries are timing out.

This time I made sure the clocks are synced!  :)

Kinda desperate to get this to work again.  What can I check or do? Just 
added the .34 node.  One item of concern is the amount of load/data on 
it compared to the others.
I'm running a repair on the new node, but things like select * from 
table, on a table with maybe 100 rows times out.

Help!

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns  Host 
ID   Rack
UN  172.16.100.45   161.81 GiB  250 ? 
07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  172.16.100.251  128.6 GiB   200 ? 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.252  128.44 GiB  200 ? 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  128.43 GiB  200 ? 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   128.79 GiB  200 ? 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   127.47 GiB  200 ? 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  2.19 GiB    4   ? 
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  127.74 GiB  200 ? 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   75.89 GiB   120 ? 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  128.3 GiB   200 ? 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  172.16.100.34   29.67 GiB   200 ? 
84219e6d-74ac-4d23-89d0-0bd734d0c09e  rack1


-joe



Re: High read Latency

2021-11-29 Thread Joe Obernberger

To add onto this message:

Queries are all on the partition key  (select 
origvalue,ingestdate,mediatype from doc.origdoc where uuid=?). Queries 
were very fast when the table was <10 million rows.


Table description:

describe doc.origdoc;

CREATE TABLE doc.origdoc (
    uuid text,
    ingestdate timestamp,
    markings text,
    mediatype text,
    origvalue text,
    source text,
    PRIMARY KEY (uuid, ingestdate)
) WITH CLUSTERING ORDER BY (ingestdate ASC)
    AND additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';

-Joe
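
For readers skimming the thread: the query described above is a single-partition read. A minimal Java-driver sketch of the same lookup (contact points and the sample uuid are placeholders, not from this thread):

```
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;

public class OrigDocLookup {  // hypothetical class name
    public static void main(String[] args) {
        // Default builder connects to localhost:9042; add contact points as needed.
        try (CqlSession session = CqlSession.builder().build()) {
            // Prepared once, bound per lookup: a token-aware, single-partition read.
            PreparedStatement ps = session.prepare(
                    "SELECT origvalue, ingestdate, mediatype FROM doc.origdoc WHERE uuid = ?");
            for (Row row : session.execute(ps.bind("some-uuid"))) {  // placeholder key
                System.out.println(row.getString("mediatype") + " @ " + row.getInstant("ingestdate"));
            }
        }
    }
}
```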

On 11/29/2021 11:22 AM, Joe Obernberger wrote:
I have an 11 node cluster and am experiencing high read latency on one 
table.  This table has ~112 million rows:


 nodetool tablehistograms doc.origdoc
doc/origdoc histograms
Percentile  Read Latency   Write Latency   SSTables   Partition Size   Cell Count
            (micros)       (micros)                   (bytes)
50%         36157.19       0.00            1.00       310              4
75%         74975.55       0.00            1.00       372              4
95%         155469.30      0.00            1.00       642              4
98%         223875.79      0.00            1.00       924              4
99%         268650.95      0.00            1.00       924              4
Min         152.32         0.00            1.00       180              4
Max         464228.84      0.00            1.00       9887             17


What should I look for to debug?

Thank you!

-Joe



High read Latency

2021-11-29 Thread Joe Obernberger
I have an 11 node cluster and am experiencing high read latency on one 
table.  This table has ~112 million rows:


 nodetool tablehistograms doc.origdoc
doc/origdoc histograms
Percentile  Read Latency   Write Latency   SSTables   Partition Size   Cell Count
            (micros)       (micros)                   (bytes)
50%         36157.19       0.00            1.00       310              4
75%         74975.55       0.00            1.00       372              4
95%         155469.30      0.00            1.00       642              4
98%         223875.79      0.00            1.00       924              4
99%         268650.95      0.00            1.00       924              4
Min         152.32         0.00            1.00       180              4
Max         464228.84      0.00            1.00       9887             17


What should I look for to debug?

Thank you!

-Joe



Re: 4.0.1 - adding a node

2021-11-01 Thread Joe Obernberger
Hi Erick - yes I do.  There is a good chance that I'll remove hercules 
from the cluster (4 tokens); it was the first system that I put 
Cassandra on.  Chaos has fewer drives than the other systems.


-joe

On 10/29/2021 7:57 PM, Erick Ramirez wrote:
Out of curiosity, what's up with hercules and chaos? Do you have 
different hardware deployed in your cluster? Cheers!



 

Re: 4.0.1 - adding a node

2021-10-29 Thread Joe Obernberger

Thank you Jeff - after cleanup:

UN  nyx.querymasters.com    535.15 GiB  250 38.0% 
07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  468.97 GiB  200 30.4% 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  calypso.querymasters.com    470.21 GiB  200 30.4% 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  fortuna.querymasters.com    593.82 GiB  200 30.4% 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  charon.querymasters.com 475.65 GiB  200 30.4% 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com   476.46 GiB  200 30.4% 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  hercules.querymasters.com   12.31 GiB   4 0.6%  
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  ursula.querymasters.com 481.43 GiB  200 30.3% 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  gaia.querymasters.com   436.18 GiB  200 30.5% 
b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  chaos.querymasters.com  289.83 GiB  120 18.2% 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  pallas.querymasters.com 447.83 GiB  200 30.4% 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1


-Joe

On 10/28/2021 4:05 PM, Jeff Jirsa wrote:
I think you started at 4930 and ended at 5461, difference of 530 
(which is the new host)


If you run `nodetool cleanup` on every other node in the cluster, you 
likely drop back down close to 4931 again.




On Thu, Oct 28, 2021 at 12:04 PM Joe Obernberger 
 wrote:


I recently added a node to a cluster.  Immediately after adding the
node, the cluster status (nyx is the new node):

UJ  nyx.querymasters.com        181.25 KiB  250  ?      07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  569.53 GiB  200  35.1%  660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  calypso.querymasters.com    578.79 GiB  200  34.8%  e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  fortuna.querymasters.com    593.79 GiB  200  34.6%  49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  charon.querymasters.com     603.3 GiB   200  35.0%  d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com       589.04 GiB  200  34.2%  93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  hercules.querymasters.com   12.31 GiB   4    0.7%   a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  ursula.querymasters.com     611.65 GiB  200  35.0%  4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  gaia.querymasters.com       480.62 GiB  200  34.7%  b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  chaos.querymasters.com      358.07 GiB  120  20.5%  08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  pallas.querymasters.com     537.88 GiB  200  35.3%  b74b6e65-af63-486a-b07f-9e304ec30a39  rack1

If I add up the Load column, I get 4,632.67GiB.  After overnight:

[joeo@calypso ~]$ nodetool status -r
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  nyx.querymasters.com        535.16 GiB  250     38.0%             07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  568.87 GiB  200     30.4%             660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  calypso.querymasters.com    578.81 GiB  200     30.4%             e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  fortuna.querymasters.com    593.82 GiB  200     30.4%             49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  charon.querymasters.com     602.38 GiB  200     30.4%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com       588.3 GiB   200     30.4%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  hercules.querymasters.com   12.31 GiB   4       0.6%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  ursula.querymasters.com     610.54 GiB  200     30.3%             4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  gaia.querymasters.com       480.53 GiB  200     30.5%             b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  chaos.querymasters.com      358.44 GiB  12

4.0.1 - adding a node

2021-10-28 Thread Joe Obernberger
I recently added a node to a cluster.  Immediately after adding the 
node, the cluster status (nyx is the new node):


UJ  nyx.querymasters.com    181.25 KiB 250 ? 
07bccfce-45f1-41a3-a5c4-ee748a7a9b98 rack1
UN  enceladus.querymasters.com  569.53 GiB  200 35.1% 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  calypso.querymasters.com    578.79 GiB  200 34.8% 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  fortuna.querymasters.com    593.79 GiB  200 34.6% 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  charon.querymasters.com 603.3 GiB   200 35.0% 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com   589.04 GiB  200 34.2% 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  hercules.querymasters.com   12.31 GiB   4 0.7%  
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  ursula.querymasters.com 611.65 GiB  200 35.0% 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  gaia.querymasters.com   480.62 GiB  200 34.7% 
b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  chaos.querymasters.com  358.07 GiB  120 20.5% 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  pallas.querymasters.com 537.88 GiB  200 35.3% 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1


If I add up the Load column, I get 4,632.67GiB.  After overnight:

[joeo@calypso ~]$ nodetool status -r
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns (effective)  
Host ID   Rack
UN  nyx.querymasters.com    535.16 GiB  250 38.0% 
07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  568.87 GiB  200 30.4% 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  calypso.querymasters.com    578.81 GiB  200 30.4% 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  fortuna.querymasters.com    593.82 GiB  200 30.4% 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  charon.querymasters.com 602.38 GiB  200 30.4% 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com   588.3 GiB   200 30.4% 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  hercules.querymasters.com   12.31 GiB   4 0.6%  
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  ursula.querymasters.com 610.54 GiB  200 30.3% 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  gaia.querymasters.com   480.53 GiB  200 30.5% 
b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  chaos.querymasters.com  358.44 GiB  120 18.2% 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  pallas.querymasters.com 537.94 GiB  200 30.4% 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1


If I add up the Load, I get 5,466.79GiB.  I have added no new data to 
the cluster, yet the Load has increased by 834.  Is this expected behavior?

Thank you!

-Joe



Re: Tombstones? 4.0.1

2021-10-25 Thread Joe Obernberger
Hi Jeff - yes, I'm doing a select without where - specifically: select 
uuid from table limit 1000;

Not inserting nulls, and nothing is TTL'd.
At this point with zero rows, the above select fails.

Sounds like my application needs a redesign, as doing 1 billion inserts 
and 100 million deletes results in an unusable table. I'm using 
Cassandra to de-duplicate data and that's not a good use case for it.


-Joe
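
To make Jeff's point below concrete: reading one partition at a time bounds how many tombstones a single read command can touch, whereas a full-table "select uuid ... limit 1000" has to walk deleted rows across the whole table. A sketch with the 4.x Java driver - it assumes uuid is the partition key of doc.indexorganize, which this thread does not actually spell out:

```
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;

public class DedupCheck {  // hypothetical class name
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Single-partition read: only tombstones inside this one partition
            // can be scanned, instead of every tombstone in the table.
            PreparedStatement byKey = session.prepare(
                    "SELECT uuid FROM doc.indexorganize WHERE uuid = ?");  // assumes uuid is the partition key
            Row row = session.execute(byKey.bind("candidate-uuid")).one();  // placeholder value
            System.out.println(row != null ? "already seen" : "new value");
        }
    }
}
```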

On 10/25/2021 6:51 PM, Jeff Jirsa wrote:
The tombstone threshold is "how many tombstones are encountered within 
a single read command", and the default is something like 100,000 ( 
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1293-L1294 
)


Deletes are not forbidden, but you have to read in such a way that you 
touch less than 100,000 deletes per read.


Are you doing full table scans or SELECT without WHERE?
Are you inserting nulls in some columns?
Are you TTL'ing everything ?



On Mon, Oct 25, 2021 at 3:28 PM Joe Obernberger 
 wrote:


Update - after 10 days, I'm able to use the table again; prior to
that all selects timed out.
Are deletes basically forbidden with Cassandra?  If you have a
table where you want to do lots of inserts and deletes, is there
an option that works in Cassandra?  Even though the table now has
zero rows, after deleting them, I can no longer do a select from
the table as it times out.
Thank you!

-Joe

On 10/14/2021 3:38 PM, Joe Obernberger wrote:


I'm not sure if tombstones is the issue; is it?  Grace is set to
10 days, that time has not passed yet.

-Joe

On 10/14/2021 1:37 PM, James Brown wrote:

What is gc_grace_seconds set to on the table? Once that passes,
you can do `nodetool scrub` to more emphatically remove
tombstones...

On Thu, Oct 14, 2021 at 8:49 AM Joe Obernberger
 wrote:

Hi all - I have a table where I've needed to delete a number
of rows.
I've run repair, but I still can't select from the table.

select * from doc.indexorganize limit 10;
OperationTimedOut: errors={'172.16.100.37:9042': 'Client request
timeout. See Session.execute[_async](timeout)'},
last_host=172.16.100.37:9042

Info on the table:

nodetool tablestats doc.indexorganize
Total number of tables: 97

Keyspace : doc
 Read Count: 170275408
 Read Latency: 1.6486837044783356 ms
 Write Count: 6821769404
 Write Latency: 0.08147347268570909 ms
 Pending Flushes: 0
 Table: indexorganize
 SSTable count: 21
 Old SSTable count: 0
 Space used (live): 1536557040
 Space used (total): 1536557040
 Space used by snapshots (total): 1728378992
 Off heap memory used (total): 46251932
 SSTable Compression Ratio: 0.5218383898575761
 Number of partitions (estimate): 17365415
 Memtable cell count: 0
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 12
 Local read count: 17346304
 Local read latency: NaN ms
 Local write count: 31340451
 Local write latency: NaN ms
 Pending flushes: 0
 Percent repaired: 100.0
 Bytes repaired: 1.084GiB
 Bytes unrepaired: 0.000KiB
 Bytes pending repair: 0.000KiB
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used: 38030728
 Bloom filter off heap memory used: 38030560
 Index summary off heap memory used: 7653060
 Compression metadata off heap memory used:
568312
 Compacted partition minimum bytes: 51
 Compacted partition maximum bytes: 86
 Compacted partition mean bytes: 67
 Average live cells per slice (last five
minutes):
73.53164556962025
 Maximum live cells per slice (last five
minutes): 5722
 Average tombstones per slice (last five
minutes): 1.0
 Maximum tombstones per slice (last five
minutes): 1
 Dropped Mutations: 0

nodetool tablehistograms doc.indexorganize
doc/indexorganize histograms
Percentile  Read Lat

Re: Tombstones? 4.0.1

2021-10-25 Thread Joe Obernberger
Update - after 10 days, I'm able to use the table again; prior to that 
all selects timed out.
Are deletes basically forbidden with Cassandra?  If you have a table 
where you want to do lots of inserts and deletes, is there an option 
that works in Cassandra?  Even though the table now has zero rows, 
after deleting them, I can no longer do a select from the table as it 
times out.

Thank you!

-Joe

On 10/14/2021 3:38 PM, Joe Obernberger wrote:


I'm not sure if tombstones is the issue; is it?  Grace is set to 10 
days, that time has not passed yet.


-Joe

On 10/14/2021 1:37 PM, James Brown wrote:
What is gc_grace_seconds set to on the table? Once that passes, you 
can do `nodetool scrub` to more emphatically remove tombstones...


On Thu, Oct 14, 2021 at 8:49 AM Joe Obernberger 
 wrote:


Hi all - I have a table where I've needed to delete a number of
rows.
I've run repair, but I still can't select from the table.

select * from doc.indexorganize limit 10;
OperationTimedOut: errors={'172.16.100.37:9042': 'Client request
timeout. See Session.execute[_async](timeout)'},
last_host=172.16.100.37:9042

Info on the table:

nodetool tablestats doc.indexorganize
Total number of tables: 97

Keyspace : doc
 Read Count: 170275408
 Read Latency: 1.6486837044783356 ms
 Write Count: 6821769404
 Write Latency: 0.08147347268570909 ms
 Pending Flushes: 0
 Table: indexorganize
 SSTable count: 21
 Old SSTable count: 0
 Space used (live): 1536557040
 Space used (total): 1536557040
 Space used by snapshots (total): 1728378992
 Off heap memory used (total): 46251932
 SSTable Compression Ratio: 0.5218383898575761
 Number of partitions (estimate): 17365415
 Memtable cell count: 0
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 12
 Local read count: 17346304
 Local read latency: NaN ms
 Local write count: 31340451
 Local write latency: NaN ms
 Pending flushes: 0
 Percent repaired: 100.0
 Bytes repaired: 1.084GiB
 Bytes unrepaired: 0.000KiB
 Bytes pending repair: 0.000KiB
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used: 38030728
 Bloom filter off heap memory used: 38030560
 Index summary off heap memory used: 7653060
 Compression metadata off heap memory used: 568312
 Compacted partition minimum bytes: 51
 Compacted partition maximum bytes: 86
 Compacted partition mean bytes: 67
 Average live cells per slice (last five minutes):
73.53164556962025
 Maximum live cells per slice (last five
minutes): 5722
 Average tombstones per slice (last five
minutes): 1.0
 Maximum tombstones per slice (last five minutes): 1
 Dropped Mutations: 0

nodetool tablehistograms doc.indexorganize
doc/indexorganize histograms
Percentile  Read Latency   Write Latency   SSTables   Partition Size   Cell Count
            (micros)       (micros)                   (bytes)
50%         0.00           0.00            0.00       60               1
75%         0.00           0.00            0.00       86               2
95%         0.00           0.00            0.00       86               2
98%         0.00           0.00            0.00       86               2
99%         0.00           0.00            0.00       86               2
Min         0.00           0.00            0.00       51               0
Max         0.00           0.00            0.00       86               2

Any ideas on what I can do?  Thank you!

-Joe



--
James Brown
Engineer


Re: Tombstones? 4.0.1

2021-10-14 Thread Joe Obernberger
I'm not sure if tombstones is the issue; is it?  Grace is set to 10 
days, that time has not passed yet.


-Joe

On 10/14/2021 1:37 PM, James Brown wrote:
What is gc_grace_seconds set to on the table? Once that passes, you 
can do `nodetool scrub` to more emphatically remove tombstones...


On Thu, Oct 14, 2021 at 8:49 AM Joe Obernberger 
 wrote:


Hi all - I have a table where I've needed to delete a number of rows.
I've run repair, but I still can't select from the table.

select * from doc.indexorganize limit 10;
OperationTimedOut: errors={'172.16.100.37:9042': 'Client request
timeout. See Session.execute[_async](timeout)'},
last_host=172.16.100.37:9042

Info on the table:

nodetool tablestats doc.indexorganize
Total number of tables: 97

Keyspace : doc
 Read Count: 170275408
 Read Latency: 1.6486837044783356 ms
 Write Count: 6821769404
 Write Latency: 0.08147347268570909 ms
 Pending Flushes: 0
 Table: indexorganize
 SSTable count: 21
 Old SSTable count: 0
 Space used (live): 1536557040
 Space used (total): 1536557040
 Space used by snapshots (total): 1728378992
 Off heap memory used (total): 46251932
 SSTable Compression Ratio: 0.5218383898575761
 Number of partitions (estimate): 17365415
 Memtable cell count: 0
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 12
 Local read count: 17346304
 Local read latency: NaN ms
 Local write count: 31340451
 Local write latency: NaN ms
 Pending flushes: 0
 Percent repaired: 100.0
 Bytes repaired: 1.084GiB
 Bytes unrepaired: 0.000KiB
 Bytes pending repair: 0.000KiB
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used: 38030728
 Bloom filter off heap memory used: 38030560
 Index summary off heap memory used: 7653060
 Compression metadata off heap memory used: 568312
 Compacted partition minimum bytes: 51
 Compacted partition maximum bytes: 86
 Compacted partition mean bytes: 67
 Average live cells per slice (last five minutes):
73.53164556962025
 Maximum live cells per slice (last five minutes):
5722
 Average tombstones per slice (last five minutes): 1.0
 Maximum tombstones per slice (last five minutes): 1
 Dropped Mutations: 0

nodetool tablehistograms doc.indexorganize
doc/indexorganize histograms
Percentile  Read Latency   Write Latency   SSTables   Partition Size   Cell Count
            (micros)       (micros)                   (bytes)
50%         0.00           0.00            0.00       60               1
75%         0.00           0.00            0.00       86               2
95%         0.00           0.00            0.00       86               2
98%         0.00           0.00            0.00       86               2
99%         0.00           0.00            0.00       86               2
Min         0.00           0.00            0.00       51               0
Max         0.00           0.00            0.00       86               2

Any ideas on what I can do?  Thank you!

-Joe



--
James Brown
Engineer


Tombstones? 4.0.1

2021-10-14 Thread Joe Obernberger
Hi all - I have a table where I've needed to delete a number of rows.  
I've run repair, but I still can't select from the table.


select * from doc.indexorganize limit 10;
OperationTimedOut: errors={'172.16.100.37:9042': 'Client request 
timeout. See Session.execute[_async](timeout)'}, 
last_host=172.16.100.37:9042


Info on the table:

nodetool tablestats doc.indexorganize
Total number of tables: 97

Keyspace : doc
    Read Count: 170275408
    Read Latency: 1.6486837044783356 ms
    Write Count: 6821769404
    Write Latency: 0.08147347268570909 ms
    Pending Flushes: 0
    Table: indexorganize
    SSTable count: 21
    Old SSTable count: 0
    Space used (live): 1536557040
    Space used (total): 1536557040
    Space used by snapshots (total): 1728378992
    Off heap memory used (total): 46251932
    SSTable Compression Ratio: 0.5218383898575761
    Number of partitions (estimate): 17365415
    Memtable cell count: 0
    Memtable data size: 0
    Memtable off heap memory used: 0
    Memtable switch count: 12
    Local read count: 17346304
    Local read latency: NaN ms
    Local write count: 31340451
    Local write latency: NaN ms
    Pending flushes: 0
    Percent repaired: 100.0
    Bytes repaired: 1.084GiB
    Bytes unrepaired: 0.000KiB
    Bytes pending repair: 0.000KiB
    Bloom filter false positives: 0
    Bloom filter false ratio: 0.0
    Bloom filter space used: 38030728
    Bloom filter off heap memory used: 38030560
    Index summary off heap memory used: 7653060
    Compression metadata off heap memory used: 568312
    Compacted partition minimum bytes: 51
    Compacted partition maximum bytes: 86
    Compacted partition mean bytes: 67
    Average live cells per slice (last five minutes): 
73.53164556962025

    Maximum live cells per slice (last five minutes): 5722
    Average tombstones per slice (last five minutes): 1.0
    Maximum tombstones per slice (last five minutes): 1
    Dropped Mutations: 0

nodetool tablehistograms doc.indexorganize
doc/indexorganize histograms
Percentile  Read Latency   Write Latency   SSTables   Partition Size   Cell Count
            (micros)       (micros)                   (bytes)
50%         0.00           0.00            0.00       60               1
75%         0.00           0.00            0.00       86               2
95%         0.00           0.00            0.00       86               2
98%         0.00           0.00            0.00       86               2
99%         0.00           0.00            0.00       86               2
Min         0.00           0.00            0.00       51               0
Max         0.00           0.00            0.00       86               2


Any ideas on what I can do?  Thank you!

-Joe



Re: Latest Supported RedHat Linux version for Cassandra 3.11

2021-09-27 Thread Joe Obernberger
Just as a data point - I'm running 4.0.1 on Rocky Linux 8.x and CentOS 
Stream 8.x.


-Joe

On 9/27/2021 12:09 PM, Saha, Sushanta K wrote:
I am currently running Open Source Apache Cassandra 3.11.1 on RedHat 
7.7. But I need to upgrade the OS to RedHat 7.9 or 8.x.


The site 
cassandra.apache.org/doc/latest/cassandra/getting_started/installing.html 
 
has listed "CentOS & RedHat Enterprise Linux (RHEL) including 6.6 to 
7.7". FYI.


Question : Can I run Cassandra 3.11.1 on RedHat 7.9 or 8.x?

Thanks
 Sushanta


 

Re: COUNTER timeout

2021-09-15 Thread Joe Obernberger

Thank you!
Clocks were out of sync; chronyd wasn't chrony'ding.
Going so much faster now!  Cheers.

-Joe

On 9/15/2021 4:07 PM, Bowen Song wrote:


Well, the log says cross-node timeout, with latency a bit over 44 seconds. 
Here are a few of the most likely causes:


1. The clocks are not in sync - please check the time on each server, 
and ensure NTP client is running on all Cassandra servers


2. Long stop the world GC pauses - please check the GC logs and make 
sure this isn't the case


3. Overload - please monitor the CPU usage and disk IO when timeout 
happens and make sure they are not the bottleneck



On 15/09/2021 20:34, Joe Obernberger wrote:


Thank you Erick - looking through all the logs on the nodes I found this:

INFO  [CompactionExecutor:17551] 2021-09-15 15:13:20,524 
CompactionTask.java:245 - Compacted 
(fb0cdca0-1658-11ec-9098-dd70c3a3487a) 4 sstables to 
[/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96619-big,] 
to level=0.  9.762MiB to 9.672MiB (~99% of original) in 3,873ms.  
Read Throughput = 2.520MiB/s, Write Throughput = 2.497MiB/s, Row 
Throughput = ~125,729/s.  255,171 total partitions merged to 
251,458.  Partition merge counts were {1:247758, 2:3687, 3:13, }
INFO  [NonPeriodicTasks:1] 2021-09-15 15:13:20,524 SSTable.java:111 - 
Deleting sstable: 
/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96618-big
INFO  [NonPeriodicTasks:1] 2021-09-15 15:13:20,525 SSTable.java:111 - 
Deleting sstable: 
/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96575-big
INFO  [NonPeriodicTasks:1] 2021-09-15 15:13:20,526 SSTable.java:111 - 
Deleting sstable: 
/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96607-big
INFO  [NonPeriodicTasks:1] 2021-09-15 15:13:20,532 SSTable.java:111 - 
Deleting sstable: 
/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96554-big
DEBUG [epollEventLoopGroup-5-85] 2021-09-15 15:13:20,642 
InitialConnectionHandler.java:121 - Response to STARTUP sent, 
configuring pipeline for 5/v5
DEBUG [epollEventLoopGroup-5-85] 2021-09-15 15:13:20,643 
InitialConnectionHandler.java:153 - Configured pipeline: 
DefaultChannelPipeline{(frameDecoder = 
org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder = 
org.apache.cassandra.net.FrameEncoderCrc), (cqlProcessor = 
org.apache.cassandra.transport.CQLMessageHandler), (exceptionHandler 
= 
org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)}
INFO  [ScheduledTasks:1] 2021-09-15 15:13:21,976 
MessagingMetrics.java:206 - COUNTER_MUTATION_RSP messages were 
dropped in last 5000 ms: 0 internal and 1 cross node. Mean internal 
dropped latency: 0 ms and Mean cross-node dropped latency: 44285 ms


So - yes, nodes are dropping mutations.  I did find a node where one 
of the drives was pegged.  Fixed that - but it's still happening.  
This happened after adding a relatively large node (.44) to the cluster:


nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns (effective) Host 
ID   Rack
UN  172.16.100.251  526.35 GiB  200 35.1% 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.252  537.14 GiB  200 34.8% 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  548.82 GiB  200 34.6% 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   561.85 GiB  200 35.0% 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   547.86 GiB  200 34.2% 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  11.52 GiB   4   0.7% 
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  560.63 GiB  200 35.0% 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.44   432.76 GiB  200 34.7% 
b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  172.16.100.37   331.31 GiB  120 20.5% 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  501.62 GiB  200 35.3% 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1


At this point I'm not sure what's going on.  Some repairs have failed 
over the past few days.


-Joe

On 9/14/2021 7:23 PM, Erick Ramirez wrote:
The obvious conclusion is to say that the nodes can't keep up so it 
would be interesting to know how often you're issuing the counter 
updates. Also, how are the commit log disks performing on the nodes? 
If you have monitoring in place, check the IO stats/metrics. And 
finally, review the logs on the nodes to see if they are indeed 
dropping mutations. Cheers!



Re: COUNTER timeout

2021-09-15 Thread Joe Obernberger

Thank you Erick - looking through all the logs on the nodes I found this:

INFO  [CompactionExecutor:17551] 2021-09-15 15:13:20,524 
CompactionTask.java:245 - Compacted 
(fb0cdca0-1658-11ec-9098-dd70c3a3487a) 4 sstables to 
[/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96619-big,] 
to level=0.  9.762MiB to 9.672MiB (~99% of original) in 3,873ms.  Read 
Throughput = 2.520MiB/s, Write Throughput = 2.497MiB/s, Row Throughput = 
~125,729/s.  255,171 total partitions merged to 251,458.  Partition 
merge counts were {1:247758, 2:3687, 3:13, }
INFO  [NonPeriodicTasks:1] 2021-09-15 15:13:20,524 SSTable.java:111 - 
Deleting sstable: 
/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96618-big
INFO  [NonPeriodicTasks:1] 2021-09-15 15:13:20,525 SSTable.java:111 - 
Deleting sstable: 
/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96575-big
INFO  [NonPeriodicTasks:1] 2021-09-15 15:13:20,526 SSTable.java:111 - 
Deleting sstable: 
/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96607-big
INFO  [NonPeriodicTasks:1] 2021-09-15 15:13:20,532 SSTable.java:111 - 
Deleting sstable: 
/data/7/cassandra/data/doc/fieldcounts-03b67080ada111ebade9fdc1d34336d3/nb-96554-big
DEBUG [epollEventLoopGroup-5-85] 2021-09-15 15:13:20,642 
InitialConnectionHandler.java:121 - Response to STARTUP sent, 
configuring pipeline for 5/v5
DEBUG [epollEventLoopGroup-5-85] 2021-09-15 15:13:20,643 
InitialConnectionHandler.java:153 - Configured pipeline: 
DefaultChannelPipeline{(frameDecoder = 
org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder = 
org.apache.cassandra.net.FrameEncoderCrc), (cqlProcessor = 
org.apache.cassandra.transport.CQLMessageHandler), (exceptionHandler = 
org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)}
INFO  [ScheduledTasks:1] 2021-09-15 15:13:21,976 
MessagingMetrics.java:206 - COUNTER_MUTATION_RSP messages were dropped 
in last 5000 ms: 0 internal and 1 cross node. Mean internal dropped 
latency: 0 ms and Mean cross-node dropped latency: 44285 ms


So - yes, nodes are dropping mutations.  I did find a node where one of 
the drives was pegged.  Fixed that - but it's still happening.  This 
happened after adding a relatively large node (.44) to the cluster:


nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns (effective)  Host 
ID   Rack
UN  172.16.100.251  526.35 GiB  200 35.1% 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.252  537.14 GiB  200 34.8% 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  548.82 GiB  200 34.6% 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   561.85 GiB  200 35.0% 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   547.86 GiB  200 34.2% 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  11.52 GiB   4   0.7% 
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  560.63 GiB  200 35.0% 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.44   432.76 GiB  200 34.7% 
b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  172.16.100.37   331.31 GiB  120 20.5% 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  501.62 GiB  200 35.3% 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1


At this point I'm not sure what's going on.  Some repairs have failed 
over the past few days.


-Joe

On 9/14/2021 7:23 PM, Erick Ramirez wrote:
The obvious conclusion is to say that the nodes can't keep up so it 
would be interesting to know how often you're issuing the counter 
updates. Also, how are the commit log disks performing on the nodes? 
If you have monitoring in place, check the IO stats/metrics. And 
finally, review the logs on the nodes to see if they are indeed 
dropping mutations. Cheers!


 

COUNTER timeout

2021-09-14 Thread Joe Obernberger

I'm getting a lot of the following errors during ingest of data:

com.datastax.oss.driver.api.core.servererrors.WriteTimeoutException: 
Cassandra timeout during COUNTER write query at consistency ONE (1 
replica were required but only 0 acknowledged the write)
    at 
com.datastax.oss.driver.api.core.servererrors.WriteTimeoutException.copy(WriteTimeoutException.java:96)
    at 
com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
    at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
    at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
    at 
com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
    at 
com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)


The CQL being executed is:
"update doc.seq set doccount=doccount+? where id=?"

Table is:

CREATE TABLE doc.seq (
    id text PRIMARY KEY,
    doccount counter
) WITH additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';
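
For context, the updates are issued through the DataStax Java driver, roughly 
like the sketch below (simplified; the class and method names are only for 
illustration):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.servererrors.WriteTimeoutException;

public class SeqCounter {
    private final CqlSession session;
    private final PreparedStatement increment;

    public SeqCounter(CqlSession session) {
        this.session = session;
        // Same statement as above, prepared once and reused.
        this.increment = session.prepare(
                "UPDATE doc.seq SET doccount = doccount + ? WHERE id = ?");
    }

    public void add(long delta, String id) {
        // Counter increments are not idempotent, so the statement is
        // explicitly marked non-idempotent and is never re-issued on a
        // timeout - a retried increment that actually landed the first
        // time would double-count.
        BoundStatement bound = increment.bind(delta, id).setIdempotent(false);
        try {
            session.execute(bound);
        } catch (WriteTimeoutException e) {
            // Logged and investigated (dropped mutations, disk pressure)
            // rather than blindly retried.
        }
    }
}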

Total rows in the doc.seq table is 356.  What could cause this timeout 
error?

Thank you!

-Joe



Re: Unable to Gossip

2021-09-10 Thread Joe Obernberger
Oh!  Excellent!  Doh!  That was it.  So when we add a new system, we use 
puppet to push things out...like NTP...well this is our first Rocky 
Linux install and guess what I didn't do?

Thank you Song.  The new machine is now joining the cluster.

nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns (effective)  Host 
ID   Rack
UN  172.16.100.251  490.67 GiB  200 38.7% 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.208  76.31 GiB   30  5.8% 
2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.252  504.13 GiB  200 38.6% 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  519.29 GiB  200 38.6% 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   526.47 GiB  200 38.6% 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   523.19 GiB  200 38.6% 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  11.42 GiB   4   0.8% 
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  526.61 GiB  200 38.7% 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
*UJ  172.16.100.44   179.98 KiB  200 ? 
b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1*
UN  172.16.100.37   315.89 GiB  120 23.2% 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  465.48 GiB  200 38.6% 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1


Cheers!

-Joe

On 9/10/2021 1:25 PM, Bowen Song wrote:


Hello Joe,


These logs indicate the clocks are out of sync (by over 4.2 hours) 
between the new node and the seed nodes:


INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were
dropped in last 5000 ms: 0 internal and 1 cross node. Mean
internal dropped latency: 0 ms and Mean cross-node dropped
latency: 15137813 ms
INFO  [ScheduledTasks:1] 2021-09-10 11:14:36,594
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were
dropped in last 5000 ms: 0 internal and 1 cross node. Mean
internal dropped latency: 0 ms and Mean cross-node dropped
latency: 15137813 ms
INFO  [ScheduledTasks:1] 2021-09-10 11:18:42,653
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were
dropped in last 5000 ms: 0 internal and 1 cross node. Mean
internal dropped latency: 0 ms and Mean cross-node dropped
latency: 15137813 ms
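
(15,137,813 ms is roughly 15,138 seconds, i.e. a little over 4.2 hours, 
which is where that estimate comes from.)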

Can you please check that the NTP client is running on all servers and 
the clocks are in sync?



Cheers,

Bowen



On 10/09/2021 16:18, Joe Obernberger wrote:


Good idea.
There are two seed nodes:
I see this on one (note 172.16.100.44 is the new node):

DEBUG [CompactionExecutor:1345] 2021-09-10 11:13:49,569 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully 
expired SSTables
INFO  [Messaging-EventLoop-3-10] 2021-09-10 11:14:22,810 
InboundConnectionInitiator.java:464 - 
/172.16.100.44:7000(/172.16.100.44:45970)->/172.16.100.253:7000-URGENT_MESSAGES-30a4fd82 
messaging connection established, version = 12, framing = LZ4, 
encryption = unencrypted
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were dropped 
in last 5000 ms: 0 internal and 1 cross node. Mean internal dropped 
latency: 0 ms and Mean cross-node dropped latency: 15137813 ms
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:65 
- Pool Name Active   Pending  Completed   Blocked  All Time Blocked
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:69 
- ReadStage 0 0    4729810 0 0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:69 
- CompactionExecutor 0 0 384171 
0 0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:69 
- MutationStage 0 0   14540487 0 0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 
- MemtableReclaimMemory 0 0    316 
0 0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 
- PendingRangeCalculator 0 0 11 
0 0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 
- GossipStage 0 0    1126031 0 0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 
- SecondaryIndexManagement 0 0  0 
0 0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 
- HintsDispatcher 0 0 15 0 0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 
- Native-Transport-Requests 0 0   13286230 
0 0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 
- RequestResponseStage 0 0   15724485 
0   

Re: Unable to Gossip

2021-09-10 Thread Joe Obernberger
ory 0,0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:36,599 StatusLogger.java:119 - 
system_distributed.view_build_status 0,0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:36,599 StatusLogger.java:119 - 
reaper_db.node_operations 0,0

.
.
.

and this on the other:

DEBUG [epollEventLoopGroup-5-77] 2021-09-10 11:18:35,174 
InitialConnectionHandler.java:153 - Configured pipeline: 
DefaultChannelPipeline{(frameDecoder = 
org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder = 
org.apache.cassandra.net.FrameEncoderCrc), (cqlProcessor = 
org.apache.cassandra.transport.CQLMessageHandler), (exceptionHandler = 
org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)}
WARN  [epollEventLoopGroup-5-77] 2021-09-10 11:18:35,180 
ExceptionHandlers.java:104 - Unknown exception in client networking
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection reset by peer
DEBUG [epollEventLoopGroup-5-78] 2021-09-10 11:18:36,275 
InitialConnectionHandler.java:121 - Response to STARTUP sent, 
configuring pipeline for 5/v5
DEBUG [epollEventLoopGroup-5-78] 2021-09-10 11:18:36,276 
InitialConnectionHandler.java:153 - Configured pipeline: 
DefaultChannelPipeline{(frameDecoder = 
org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder = 
org.apache.cassandra.net.FrameEncoderCrc), (cqlProcessor = 
org.apache.cassandra.transport.CQLMessageHandler), (exceptionHandler = 
org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)}
WARN  [epollEventLoopGroup-5-78] 2021-09-10 11:18:36,278 
ExceptionHandlers.java:104 - Unknown exception in client networking
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection reset by peer
INFO  [Messaging-EventLoop-3-39] 2021-09-10 11:18:41,326 
InboundConnectionInitiator.java:464 - 
/172.16.100.44:7000(/172.16.100.44:44368)->/172.16.100.37:7000-URGENT_MESSAGES-8aa82849 
messaging connection established, version = 12, framing = LZ4, 
encryption = unencrypted
INFO  [ScheduledTasks:1] 2021-09-10 11:18:42,653 
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were dropped in 
last 5000 ms: 0 internal and 1 cross node. Mean internal dropped 
latency: 0 ms and Mean cross-node dropped latency: 15137813 ms

.
.
.

-Joe


On 9/10/2021 11:06 AM, Sam Tunnicliffe wrote:
Is there anything in the logs of the other nodes, particularly those 
in the seeds list of the one which won't start? If this is a problem 
with the peers being unable to respond, as the issue fixed in 4.0.1 
(CASSANDRA-16877) was, you may see some indication there.


Sam

On 10 Sep 2021, at 15:56, Joe Obernberger 
 wrote:


Thank you Jeff - yes, this is on the latest 4.0.1

nodetool version
ReleaseVersion: 4.0.1
nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns (effective)  Host ID Rack
UN  172.16.100.251  488.38 GiB  200 38.7% 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.208  76.02 GiB   30 5.8% 
2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.252  501.88 GiB  200 38.6% 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  517.27 GiB  200 38.6% 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   524.45 GiB  200 38.6% 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   521.05 GiB  200 38.6% 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  11.39 GiB   4 0.8% 
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  524.46 GiB  200 38.7% 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   314.67 GiB  120 23.2% 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  464.23 GiB  200 38.6% 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1


yum list installed | grep cass
cassandra.noarch 4.0.1-1   @cassandra

-Joe

On 9/10/2021 10:54 AM, Jeff Jirsa wrote:
Is this on 4.0.0 ? 4.0.1 fixes an issue where the gossip result is 
too large for the urgent message queue, causing this stack trace, 
and was released 3 days ago. I've never seen it on a 10 node cluster 
before, but I'd be trying that.


On Fri, Sep 10, 2021 at 7:50 AM Joe Obernberger 
 wrote:


I have a 10 node cluster and am trying to add another node. The new
node is running Rocky Linux and I'm getting the unable to gossip
with
any peers error.  Firewall and SELinux are off.  I can ping all the
other nodes OK.  I've checked everything I can think of
(/etc/hosts,
listen_address, broadcast etc..).  It all looks correct to me.
Any ideas?  Could it be an incompatibility with Rocky?

DEBUG [main] 2021-09-10 06:45:24,846
YamlConfigurationLoader.java:112 -
Loading settings from
file:/etc/cassandra/default.conf/cassandra.yaml
INFO  [Messaging-EventLoop-3-6] 2021-09-10 06:45:24,921
OutboundConnection.java:1150 -
/172.16.100.44:7000(/172.16.100.44:45934)->/172.16.100.

Re: Unable to Gossip

2021-09-10 Thread Joe Obernberger

Thank you Jeff - yes, this is on the latest 4.0.1

nodetool version
ReleaseVersion: 4.0.1
nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load    Tokens  Owns (effective)  Host 
ID   Rack
UN  172.16.100.251  488.38 GiB  200 38.7% 
660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.208  76.02 GiB   30  5.8% 
2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.252  501.88 GiB  200 38.6% 
e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  517.27 GiB  200 38.6% 
49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   524.45 GiB  200 38.6% 
d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   521.05 GiB  200 38.6% 
93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  11.39 GiB   4   0.8% 
a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  524.46 GiB  200 38.7% 
4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   314.67 GiB  120 23.2% 
08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  464.23 GiB  200 38.6% 
b74b6e65-af63-486a-b07f-9e304ec30a39  rack1


yum list installed | grep cass
cassandra.noarch 4.0.1-1   @cassandra

-Joe

On 9/10/2021 10:54 AM, Jeff Jirsa wrote:
Is this on 4.0.0 ? 4.0.1 fixes an issue where the gossip result is too 
large for the urgent message queue, causing this stack trace, and was 
released 3 days ago. I've never seen it on a 10 node cluster before, 
but I'd be trying that.


On Fri, Sep 10, 2021 at 7:50 AM Joe Obernberger 
 wrote:


I have a 10 node cluster and am trying to add another node.  The new
node is running Rocky Linux and I'm getting the unable to gossip with
any peers error.  Firewall and SELinux are off.  I can ping all the
other nodes OK.  I've checked everything I can think of (/etc/hosts,
listen_address, broadcast etc..).  It all looks correct to me.
Any ideas?  Could it be an incompatibility with Rocky?

DEBUG [main] 2021-09-10 06:45:24,846
YamlConfigurationLoader.java:112 -
Loading settings from file:/etc/cassandra/default.conf/cassandra.yaml
INFO  [Messaging-EventLoop-3-6] 2021-09-10 06:45:24,921
OutboundConnection.java:1150 -
/172.16.100.44:7000(/172.16.100.44:45934)->/172.16.100.253:7000-URGENT_MESSAGES-90efbb9e
successfully connected, version = 12, framing = LZ4, encryption =
unencrypted
INFO  [Messaging-EventLoop-3-3] 2021-09-10 06:45:24,930
OutboundConnection.java:1150 -
/172.16.100.44:7000(/172.16.100.44:44320)->/172.16.100.37:7000-URGENT_MESSAGES-eae47864
successfully connected, version = 12, framing = LZ4, encryption =
unencrypted
INFO  [ScheduledTasks:1] 2021-09-10 06:45:27,648
TokenMetadata.java:525
- Updating topology for all endpoints that have changed
DEBUG [OptionalTasks:1] 2021-09-10 06:45:54,644
SizeEstimatesRecorder.java:65 - Node is not part of the ring; not
recording size estimates
ERROR [main] 2021-09-10 06:46:25,891 CassandraDaemon.java:909 -
Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
 at
org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1805)
 at

org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:648)
 at

org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
 at

org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
 at

org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
 at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
 at

org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
 at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
DEBUG [StorageServiceShutdownHook] 2021-09-10 06:46:25,896
StorageService.java:1621 - DRAINING: starting drain process
INFO  [StorageServiceShutdownHook] 2021-09-10 06:46:25,898
HintsService.java:220 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2021-09-10 06:46:25,899
Gossiper.java:1993 - No local state, state is in silent shutdown, or
node hasn't joined, not announcing shutdown

Thank you!

-Joe



Unable to Gossip

2021-09-10 Thread Joe Obernberger
I have a 10 node cluster and am trying to add another node.  The new 
node is running Rocky Linux and I'm getting the unable to gossip with 
any peers error.  Firewall and SELinux are off.  I can ping all the 
other nodes OK.  I've checked everything I can think of (/etc/hosts, 
listen_address, broadcast etc..).  It all looks correct to me.

Any ideas?  Could it be an incompatibility with Rocky?

DEBUG [main] 2021-09-10 06:45:24,846 YamlConfigurationLoader.java:112 - 
Loading settings from file:/etc/cassandra/default.conf/cassandra.yaml
INFO  [Messaging-EventLoop-3-6] 2021-09-10 06:45:24,921 
OutboundConnection.java:1150 - 
/172.16.100.44:7000(/172.16.100.44:45934)->/172.16.100.253:7000-URGENT_MESSAGES-90efbb9e 
successfully connected, version = 12, framing = LZ4, encryption = 
unencrypted
INFO  [Messaging-EventLoop-3-3] 2021-09-10 06:45:24,930 
OutboundConnection.java:1150 - 
/172.16.100.44:7000(/172.16.100.44:44320)->/172.16.100.37:7000-URGENT_MESSAGES-eae47864 
successfully connected, version = 12, framing = LZ4, encryption = 
unencrypted
INFO  [ScheduledTasks:1] 2021-09-10 06:45:27,648 TokenMetadata.java:525 
- Updating topology for all endpoints that have changed
DEBUG [OptionalTasks:1] 2021-09-10 06:45:54,644 
SizeEstimatesRecorder.java:65 - Node is not part of the ring; not 
recording size estimates
ERROR [main] 2021-09-10 06:46:25,891 CassandraDaemon.java:909 - 
Exception encountered during startup

java.lang.RuntimeException: Unable to gossip with any peers
    at 
org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1805)
    at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:648)
    at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
DEBUG [StorageServiceShutdownHook] 2021-09-10 06:46:25,896 
StorageService.java:1621 - DRAINING: starting drain process
INFO  [StorageServiceShutdownHook] 2021-09-10 06:46:25,898 
HintsService.java:220 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2021-09-10 06:46:25,899 
Gossiper.java:1993 - No local state, state is in silent shutdown, or 
node hasn't joined, not announcing shutdown


Thank you!

-Joe



Re: New Servers - Cassandra 4

2021-08-02 Thread Joe Obernberger
Thank you Max.  That is a solid choice.  You can even configure each 
blade with two 15TByte SSDs (which may not be wise), but that would yield 
~430TBytes of SSD across 14 nodes in 4u space for around $150k.


-Joe

On 8/2/2021 4:29 PM, Max C. wrote:
Have you considered a blade chassis?  Then you can get most of the 
redundancy of having lots of small nodes in few(er) rack units.


SuperMicro has a chassis that can accommodate 14 servers in 4U:

https://www.supermicro.com/en/products/superblade/enclosure#4U

- Max

On Aug 2, 2021, at 12:05 pm, Joe Obernberger 
 wrote:


Thank you Jeff.  Consider that if rack space is at a premium, what 
would make the most sense?


-Joe

On 8/2/2021 2:46 PM, Jeff Jirsa wrote:
IF you bought a server with that topology, you would definitely want 
to run lots of instances, perhaps 24, to effectively utilize that 
disk space.


You'd also need 24 IPs, and you'd need a NIC that could send/receive 
24x the normal bandwidth. And the cost of rebuilding such a node 
would be 24x higher than normal (so consider how many of those you'd 
have in a cluster, and how often they'd fail).




On Mon, Aug 2, 2021 at 11:06 AM Joe Obernberger 
 wrote:


We have a large amount of data to be stored in Cassandra, and if
we were
to purchase new hardware in limited space, what would make the
most sense?
Dell has machines with 24, 8TByte drives in a 2u configuration.
Given
Cassandra's limitations (?) to large nodes, would it make sense
to run
24 copies of Cassandra on that one node (one per drive)?
Thank you!

-Joe








Re: New Servers - Cassandra 4

2021-08-02 Thread Joe Obernberger
Thank you Jeff.  Consider that if rack space is at a premium, what would 
make the most sense?


-Joe

On 8/2/2021 2:46 PM, Jeff Jirsa wrote:
IF you bought a server with that topology, you would definitely want 
to run lots of instances, perhaps 24, to effectively utilize that disk 
space.


You'd also need 24 IPs, and you'd need a NIC that could send/receive 
24x the normal bandwidth. And the cost of rebuilding such a node 
would be 24x higher than normal (so consider how many of those you'd 
have in a cluster, and how often they'd fail).




On Mon, Aug 2, 2021 at 11:06 AM Joe Obernberger 
 wrote:


We have a large amount of data to be stored in Cassandra, and if
we were
to purchase new hardware in limited space, what would make the
most sense?
Dell has machines with 24, 8TByte drives in a 2u configuration. Given
Cassandra's limitations (?) to large nodes, would it make sense to
run
24 copies of Cassandra on that one node (one per drive)?
Thank you!

-Joe



New Servers - Cassandra 4

2021-08-02 Thread Joe Obernberger
We have a large amount of data to be stored in Cassandra, and if we were 
to purchase new hardware in limited space, what would make the most sense?
Dell has machines with 24, 8TByte drives in a 2u configuration. Given 
Cassandra's limitations (?) to large nodes, would it make sense to run 
24 copies of Cassandra on that one node (one per drive)?

Thank you!

-Joe



Re: [RELEASE] Apache Cassandra 4.0.0 released

2021-07-26 Thread Joe Obernberger

Whoo hoo!  Looking forward to trying it out!

-Joe

On 7/26/2021 4:03 PM, Brandon Williams wrote:

The Cassandra team is pleased to announce the release of Apache
Cassandra version 4.0.0.

Apache Cassandra is a fully distributed database. It is the right
choice when you need scalability and high availability without
compromising performance.

http://cassandra.apache.org/

Downloads of source and binary distributions are available in our
download section:

http://cassandra.apache.org/download/

This version is the initial release in the 4.0 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
were to encounter any problem.

Enjoy!

[1]: CHANGES.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.0.0
[2]: NEWS.txt 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.0.0
[3]: https://issues.apache.org/jira/browse/CASSANDRA



Re: [RELEASE] Apache Cassandra 4.0-rc2 released

2021-06-30 Thread Joe Obernberger

Downloading now!  Thank you!  yum update cassandra!

-Joe

On 6/30/2021 3:55 PM, Patrick McFadin wrote:
Congrats to everyone that worked on this iteration. If you haven't 
looked at the CHANGES.txt there were some great catches in RC1. Just 
like it should happen!


On Wed, Jun 30, 2021 at 12:29 PM Mick Semb Wever  wrote:


The Cassandra team is pleased to announce the release of Apache
Cassandra version 4.0-rc2.

Apache Cassandra is a fully distributed database. It is the right
choice when you need scalability and high availability without
compromising performance.
http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our
download section:
http://cassandra.apache.org/download/

This version is a release candidate[1] on the 4.0 series. As
always, please pay attention to the release notes[2] and let us
know[3] if you were to encounter any problem.

Please note, the bintray location is now replaced with the ASF's
JFrog Artifactory location:
https://apache.jfrog.io/artifactory/cassandra/

Enjoy!

[1]: CHANGES.txt

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.0-rc2
[2]: NEWS.txt

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.0-rc2
[3]: https://issues.apache.org/jira/browse/CASSANDRA


 

Re: Which open source or free tool do you use to monitor cassandra clusters?

2021-06-16 Thread Joe Obernberger
I've been using Grafana+Prometheus and the 
jmx_prometheus_javaagent-0.15.0.jar agent on the cassandra cluster.

Then use CassandraReaper for scheduled repairs.

Used this guide:

https://www.cloudwalker.io/2020/05/17/monitoring-cassandra-with-prometheus/

-Joe

On 6/16/2021 11:21 AM, Surbhi Gupta wrote:

Hi,

Which open source or free tool do you use to monitor cassandra 
clusters which have similar features like Opscenter?


Thanks
Surbhi


 

Re: multiple clients making schema changes at once

2021-06-03 Thread Joe Obernberger
How does this work?  I have a program that runs a series of alter table 
statements, and then does inserts.  In some cases, the insert happens 
immediately after the alter table statement and the insert fails because 
the schema (apparently) has not had time to propagate.  I get an 
Undefined column name error.


The alter statements run single threaded, but the inserts run in 
multiple threads.  The alter statement is run in a synchronized block 
(Java).  Should I put an artificial delay after the alter statement?
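
One alternative to a fixed delay would be to poll the driver for schema 
agreement after each alter statement. A rough sketch, assuming the existing 
CqlSession is named "session" (the table and column in the usage comment are 
just placeholders):

import com.datastax.oss.driver.api.core.CqlSession;

public class SchemaHelper {
    // Execute a DDL statement, then poll until the driver reports that
    // all nodes it knows about agree on the schema version (or give up
    // after ~10 seconds), before issuing any inserts that depend on it.
    public static void alterAndWaitForAgreement(CqlSession session, String ddl)
            throws InterruptedException {
        session.execute(ddl);
        long deadlineMillis = System.currentTimeMillis() + 10_000;
        while (!session.checkSchemaAgreement()) {
            if (System.currentTimeMillis() > deadlineMillis) {
                throw new IllegalStateException("No schema agreement after: " + ddl);
            }
            Thread.sleep(200);  // small pause between checks
        }
    }
}

// e.g.: SchemaHelper.alterAndWaitForAgreement(session,
//           "ALTER TABLE doc.sometable ADD newcol text");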


-Joe

On 6/1/2021 2:59 PM, Max C. wrote:
We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python 
client library for ZooKeeper.


- Max

Yes, this is quite annoying. How did you implement that "external 
lock"? I also thought of building an external service dedicated to 
that. Cassandra client apps would send create instructions to that 
service, which would receive them and do the creates one by one, and 
the client app would wait for its response before starting to insert.


Best,

Sébastien.

On Tue, Jun 1, 2021 at 05:21, Max C.  wrote:


In our case we have a shared dev cluster with (for example) a key
space for each developer, a key space for each CI runner, etc.  
As part of initializing our test suite we set up the schema to
match the code that is about to be tested. This can mean
multiple CI runners each adding/dropping tables at the same time
but for different key spaces.

Our experience is even though the schema changes do not conflict,
we still run into schema mismatch problems.   Our solution to
this was to have a lock (external to Cassandra) that ensures only
a single schema change operation is being issued at a time.

People assume schema changes in Cassandra work the same way as
MySQL or multiple users editing files on disk — i.e. as long as
you’re not editing the same file (or same MySQL table), then
there’s no problem. *_This is NOT the case._*  Cassandra
schema changes are more like “git push”ing a commit to the
same branch — i.e. at most one change can be outstanding at a
time (across all tables, all key spaces)…otherwise you will run
into trouble.

Hope that helps.  Best of luck.

- Max

Hello,

I have a more general question about that; I cannot find a
clear answer.

In my use case I have many tables (around 10k new tables
created per month) and they are created from many clients,
only dynamically, with several clients creating the same
tables simultaneously.

What is the recommended way of creating tables dynamically?
If I am doing "if not exists" queries + wait for schema
agreement before and after each create statement, will it
work correctly for Cassandra?

Sébastien.





 

Re: Datastax error - failed to allocate direct memory

2021-05-24 Thread Joe Obernberger
Please disregard - this appears to be a netty issue, not a 
datastax/cassandra issue.  My apologies!


-joe

On 5/24/2021 11:05 AM, Joe Obernberger wrote:
I'm getting the following error using 4.0RC1.  I've increased direct 
memory to 1g with:  -XX:MaxDirectMemorySize=1024m
The error comes from an execute statement on a static 
PreparedStatement.  It runs fine for a while, and then dies.

Any ideas?

2021-05-24 11:03:10,342 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] 
(executor-thread-32) HTTP Request to /index failed, error id: 
1657b377-70c4-42d3-8f85-b25b3f6a538e-1: 
org.jboss.resteasy.spi.UnhandledException: 
com.datastax.oss.driver.api.core.connection.ClosedConnectionException: 
Unexpected error on channel
    at 
org.jboss.resteasy.core.ExceptionHandler.handleApplicationException(ExceptionHandler.java:106)
    at 
org.jboss.resteasy.core.ExceptionHandler.handleException(ExceptionHandler.java:372)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.writeException(SynchronousDispatcher.java:218)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:519)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.lambda$invoke$4(SynchronousDispatcher.java:261)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.lambda$preprocess$0(SynchronousDispatcher.java:161)
    at 
org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:364)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.preprocess(SynchronousDispatcher.java:164)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:247)
    at 
io.quarkus.resteasy.runtime.standalone.RequestDispatcher.service(RequestDispatcher.java:73)
    at 
io.quarkus.resteasy.runtime.standalone.VertxRequestHandler.dispatch(VertxRequestHandler.java:138)
    at 
io.quarkus.resteasy.runtime.standalone.VertxRequestHandler.access$000(VertxRequestHandler.java:41)
    at 
io.quarkus.resteasy.runtime.standalone.VertxRequestHandler$1.run(VertxRequestHandler.java:93)
    at 
org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2415)
    at 
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1452)
    at 
org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at 
org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)

    at java.lang.Thread.run(Thread.java:748)
    at org.jboss.threads.JBossThread.run(JBossThread.java:501)
Caused by: 
com.datastax.oss.driver.api.core.connection.ClosedConnectionException: 
Unexpected error on channel
    at 
com.datastax.oss.driver.api.core.connection.ClosedConnectionException.copy(ClosedConnectionException.java:51)
    at 
com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
    at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
    at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
    at 
com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
    at 
com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
    at 
com.ngc.helios.heliosindexerservice.IndexService.index(IndexService.java:44)
    at 
com.ngc.helios.heliosindexerservice.IndexService_ClientProxy.index(IndexService_ClientProxy.zig:157)
    at 
com.ngc.helios.heliosindexerservice.IndexResource.index(IndexResource.java:26)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown 
Source)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:170)
    at 
org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:130)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(ResourceMethodInvoker.java:643)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(ResourceMethodInvoker.java:507)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$2(ResourceMethodInvoker.java:457)
    at 
org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:364)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:459)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:419)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:393)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:68

Datastax error - failed to allocate direct memory

2021-05-24 Thread Joe Obernberger
I'm getting the following error using 4.0RC1.  I've increased direct 
memory to 1g with:  -XX:MaxDirectMemorySize=1024m
The error comes from an execute statement on a static 
PreparedStatement.  It runs fine for a while, and then dies.

Any ideas?

2021-05-24 11:03:10,342 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] 
(executor-thread-32) HTTP Request to /index failed, error id: 
1657b377-70c4-42d3-8f85-b25b3f6a538e-1: 
org.jboss.resteasy.spi.UnhandledException: 
com.datastax.oss.driver.api.core.connection.ClosedConnectionException: 
Unexpected error on channel
    at 
org.jboss.resteasy.core.ExceptionHandler.handleApplicationException(ExceptionHandler.java:106)
    at 
org.jboss.resteasy.core.ExceptionHandler.handleException(ExceptionHandler.java:372)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.writeException(SynchronousDispatcher.java:218)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:519)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.lambda$invoke$4(SynchronousDispatcher.java:261)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.lambda$preprocess$0(SynchronousDispatcher.java:161)
    at 
org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:364)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.preprocess(SynchronousDispatcher.java:164)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:247)
    at 
io.quarkus.resteasy.runtime.standalone.RequestDispatcher.service(RequestDispatcher.java:73)
    at 
io.quarkus.resteasy.runtime.standalone.VertxRequestHandler.dispatch(VertxRequestHandler.java:138)
    at 
io.quarkus.resteasy.runtime.standalone.VertxRequestHandler.access$000(VertxRequestHandler.java:41)
    at 
io.quarkus.resteasy.runtime.standalone.VertxRequestHandler$1.run(VertxRequestHandler.java:93)
    at 
org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2415)
    at 
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1452)
    at 
org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
    at 
org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)

    at java.lang.Thread.run(Thread.java:748)
    at org.jboss.threads.JBossThread.run(JBossThread.java:501)
Caused by: 
com.datastax.oss.driver.api.core.connection.ClosedConnectionException: 
Unexpected error on channel
    at 
com.datastax.oss.driver.api.core.connection.ClosedConnectionException.copy(ClosedConnectionException.java:51)
    at 
com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
    at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
    at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
    at 
com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
    at 
com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
    at 
com.ngc.helios.heliosindexerservice.IndexService.index(IndexService.java:44)
    at 
com.ngc.helios.heliosindexerservice.IndexService_ClientProxy.index(IndexService_ClientProxy.zig:157)
    at 
com.ngc.helios.heliosindexerservice.IndexResource.index(IndexResource.java:26)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown 
Source)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:170)
    at 
org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:130)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(ResourceMethodInvoker.java:643)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(ResourceMethodInvoker.java:507)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$2(ResourceMethodInvoker.java:457)
    at 
org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:364)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:459)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:419)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:393)
    at 
org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:68)
    at 
org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:492)

    ... 15 more
Caused by: 

Re: RC1 - joining cluster

2021-05-12 Thread Joe Obernberger
  at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

        at java.base/java.lang.Thread.run(Thread.java:829)
DEBUG [Stream-Deserializer-/172.16.100.248:7000-4810ecd6] 2021-05-12 
15:14:41,824 StreamSession.java:638 - [Stream 
#0fb4f950-b356-11eb-85f3-15cea6735fa9] Socket closed after session 
completed with state COMPLETE


At this point, I plan on setting auto_bootstrap to false?

-Joe

On 5/10/2021 8:17 PM, Kane Wilson wrote:
Well, that sounds like a dangerous sequence of events, but should have 
worked in the end regardless. Probably next time give it a bit more 
time and keep an eye on netstats and compactionstats.



raft.so <https://raft.so> - Cassandra consulting, support, and 
managed services



On Mon, May 10, 2021 at 10:23 PM Joe Obernberger 
 wrote:


Hi - I waited 3 hours.  It was syncing up data; I could see
network traffic, but then it stopped.  I didn't check netstats,
but I did check compactionstats and there were no pending tasks. 
I then set auto_bootstrap to false on both new machines and they
joined.  Then ran a repair.

-Joe

On 5/9/2021 7:12 PM, Kane Wilson wrote:

How long are you waiting for the node to join? Have you checked
nodetool netstats and compactionstats to see if all
streams/compactions are complete?

raft.so <https://raft.so> - Cassandra consulting, support, and
managed services


On Sat, May 8, 2021 at 11:23 AM Joe Obernberger
 wrote:

Whoops - had it in the wrong datacenter.  Same issue - new
node is
stuck in UJ, but I can start/stop OK with systemctl.

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  helene.querymasters.com    423.92 MiB  30      18.6%             2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UJ  fortuna.querymasters.com   1.75 GiB    200     ?                 49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  charon.querymasters.com    2.22 GiB    200     98.5%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com      2.21 GiB    200     98.5%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  hercules.querymasters.com  58.65 MiB   4       2.6%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  chaos.querymasters.com     1.82 GiB    120     81.8%             08a19658-40be-4e55-8709-812b3d4ac750  rack1

I am able to restart the server (fortuna - after about 3
hours), but I
then get this:

ERROR [Stream-Deserializer-/172.16.100.253:7000-493728e3]
2021-05-07
21:17:35,805 StreamingInboundHandler.java:205 - [Stream channel:
493728e3] stream operation from /172.16.100.253:7000 failed
java.lang.IllegalStateException: unknown stream session:
27c00760-af9b-11eb-b7ee-5d6a136b5405 - 0
        at

org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:45)
        at

org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:38)
        at

org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53)
        at

org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:172)
        at

io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
ERROR [Stream-Deserializer-/172.16.100.253:7000-e313e37d]
2021-05-07
21:17:36,208 StreamSession.java:882 - [Stream
#27c00760-af9b-11eb-b7ee-5d6a136b5405] Remote peer
/172.16.100.253:7000
failed stream session.
INFO  [Stream-De

Re: Counter errors - RC1

2021-05-11 Thread Joe Obernberger

One of the nodes was swapping in this case; fixed that - problem solved.
Yes - the machines are varying sizes and I wanted to test to see how 
well a cluster would work in such a configuration.


-Joe

On 5/10/2021 8:14 PM, Kane Wilson wrote:
Seems like some of your nodes are overloaded. Is it intentional that 
some of your nodes have varying numbers of tokens?


It seems like some of your nodes are overloaded, potentially at least 
#RF of them. If nodes are heavily overloaded GC tuning generally won't 
help much, you're best off starting by reducing load or increasing 
capacity.


raft.so <https://raft.so> - Cassandra consulting, support, and 
managed services



On Tue, May 11, 2021 at 7:44 AM Joe Obernberger 
 wrote:


Hi all - I'm getting the following error on RC1:

WARN  [Messaging-EventLoop-3-23] 2021-05-10 17:29:12,431
NoSpamLogger.java:95 -
/172.16.100.39:7000->/172.16.100.248:7000-URGENT_MESSAGES-e8d21588
dropping message of type FAILURE_RSP whose timeout expired before
reaching the network
ERROR [CounterMutationStage-62] 2021-05-10 17:29:12,431
AbstractLocalAwareExecutorService.java:166 - Uncaught exception on
thread Thread[CounterMutationStage-62,5,main]
java.lang.RuntimeException:
org.apache.cassandra.exceptions.WriteTimeoutException: Operation
timed
out - received only 0 responses.
    at

org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2278)
    at

java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at

org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
    at

org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
    at
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
    at

io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.cassandra.exceptions.WriteTimeoutException:
Operation timed out - received only 0 responses.
    at

org.apache.cassandra.db.CounterMutation.grabCounterLocks(CounterMutation.java:162)
    at

org.apache.cassandra.db.CounterMutation.applyCounterMutation(CounterMutation.java:131)
    at

org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:1678)
    at

org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2274)
    ... 6 common frames omitted

This happens under load.

I'm also seeing a lot of these messages:

WARN  [GossipTasks:1] 2021-05-10 17:30:20,969
FailureDetector.java:319
- Not marking nodes down due to local pause of 5785753812ns >
50ns
DEBUG [GossipTasks:1] 2021-05-10 17:30:20,969
FailureDetector.java:325 -
Still not marking nodes down due to local pause
DEBUG [GossipTasks:1] 2021-05-10 17:30:20,969
FailureDetector.java:325 -
Still not marking nodes down due to local pause
DEBUG [GossipTasks:1] 2021-05-10 17:30:20,969
FailureDetector.java:325 -
Still not marking nodes down due to local pause

The other messages are slow queries like:
SELECT mediatype, origvalue FROM doc.origdoc WHERE uuid =
DS_5_2021-05-08T06-53-41.442Z_Hi0ywdNE LIMIT 1>, time 1370 msec -
slow
timeout 500 msec

I've tried switching the G1 garbage collector (java 11), and that did
reduce these times (was seeing over 5000msec).  The above select
statement is on a table where uuid is the primary key.

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.100.208  9.16 GiB   30      9.3%              2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.249  60.69 GiB  200     62.9%             49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   61.16 GiB  200     62.9%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   61.07 GiB  200     63.0%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  1.24 GiB   4       1.3%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  60.35 GiB  200     62.9%             4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   37.18 GiB  120     37.7%             08a19658-40be-4e55-8709-812b3d4ac750  rack1

nodetool tablestats doc.origdoc
Total number of tables: 74

Keyspace : doc
    Read Count: 37511
    Read Latency: 33.929465116899046 ms

Counter errors - RC1

2021-05-10 Thread Joe Obernberger

Hi all - I'm getting the following error on RC1:

WARN  [Messaging-EventLoop-3-23] 2021-05-10 17:29:12,431 
NoSpamLogger.java:95 - 
/172.16.100.39:7000->/172.16.100.248:7000-URGENT_MESSAGES-e8d21588 
dropping message of type FAILURE_RSP whose timeout expired before 
reaching the network
ERROR [CounterMutationStage-62] 2021-05-10 17:29:12,431 
AbstractLocalAwareExecutorService.java:166 - Uncaught exception on 
thread Thread[CounterMutationStage-62,5,main]
java.lang.RuntimeException: 
org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed 
out - received only 0 responses.
        at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2278)
        at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
        at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
        at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: 
Operation timed out - received only 0 responses.
        at 
org.apache.cassandra.db.CounterMutation.grabCounterLocks(CounterMutation.java:162)
        at 
org.apache.cassandra.db.CounterMutation.applyCounterMutation(CounterMutation.java:131)
        at 
org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:1678)
        at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2274)

        ... 6 common frames omitted

This happens under load.

I'm also seeing a lot of these messages:

WARN  [GossipTasks:1] 2021-05-10 17:30:20,969 FailureDetector.java:319 
- Not marking nodes down due to local pause of 5785753812ns > 50ns
DEBUG [GossipTasks:1] 2021-05-10 17:30:20,969 FailureDetector.java:325 - 
Still not marking nodes down due to local pause
DEBUG [GossipTasks:1] 2021-05-10 17:30:20,969 FailureDetector.java:325 - 
Still not marking nodes down due to local pause
DEBUG [GossipTasks:1] 2021-05-10 17:30:20,969 FailureDetector.java:325 - 
Still not marking nodes down due to local pause


The other messages are slow queries like:
SELECT mediatype, origvalue FROM doc.origdoc WHERE uuid = 
DS_5_2021-05-08T06-53-41.442Z_Hi0ywdNE LIMIT 1>, time 1370 msec - slow 
timeout 500 msec


I've tried switching the G1 garbage collector (java 11), and that did 
reduce these times (was seeing over 5000msec).  The above select 
statement is on a table where uuid is the primary key.


Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns 
(effective)  Host 
ID                               Rack
UN  172.16.100.208  9.16 GiB   30      
9.3%             2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.249  60.69 GiB  200     
62.9%            49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   61.16 GiB  200     
62.9%            d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   61.07 GiB  200     
63.0%            93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  1.24 GiB   4       
1.3%             a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  60.35 GiB  200     
62.9%            4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   37.18 GiB  120     
37.7%            08a19658-40be-4e55-8709-812b3d4ac750  rack1


nodetool tablestats doc.origdoc
Total number of tables: 74

Keyspace : doc
        Read Count: 37511
        Read Latency: 33.929465116899046 ms
        Write Count: 4604965
        Write Latency: 0.20405303102195133 ms
        Pending Flushes: 0
                Table: origdoc
                SSTable count: 85
                Old SSTable count: 0
                Space used (live): 54635707180
                Space used (total): 54635707180
                Space used by snapshots (total): 0
                Off heap memory used (total): 258773554
                SSTable Compression Ratio: 
0.33099344385825985

                Number of partitions (estimate): 114982637
                Memtable cell count: 0
                Memtable data size: 0
       

Re: RC1 - joining cluster

2021-05-10 Thread Joe Obernberger
Hi - I waited 3 hours.  It was syncing up data; I could see network 
traffic, but then it stopped.  I didn't check netstats, but I did check 
compactionstats and there were no pending tasks. I then set 
auto_bootstrap to false on both new machines and they joined.  Then ran 
a repair.


-Joe

On 5/9/2021 7:12 PM, Kane Wilson wrote:
How long are you waiting for the node to join? Have you checked 
nodetool netstats and compactionstats to see if all 
streams/compactions are complete?


raft.so <https://raft.so> - Cassandra consulting, support, and 
managed services



On Sat, May 8, 2021 at 11:23 AM Joe Obernberger 
 wrote:


Whoops - had it in the wrong datacenter.  Same issue - new node is
stuck in UJ, but I can start/stop OK with systemctl.

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  helene.querymasters.com    423.92 MiB  30      18.6%             2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UJ  fortuna.querymasters.com   1.75 GiB    200     ?                 49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  charon.querymasters.com    2.22 GiB    200     98.5%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com      2.21 GiB    200     98.5%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  hercules.querymasters.com  58.65 MiB   4       2.6%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  chaos.querymasters.com     1.82 GiB    120     81.8%             08a19658-40be-4e55-8709-812b3d4ac750  rack1

I am able to restart the server (fortuna - after about 3 hours),
but I
then get this:

ERROR [Stream-Deserializer-/172.16.100.253:7000-493728e3] 2021-05-07
21:17:35,805 StreamingInboundHandler.java:205 - [Stream channel:
493728e3] stream operation from /172.16.100.253:7000 failed
java.lang.IllegalStateException: unknown stream session:
27c00760-af9b-11eb-b7ee-5d6a136b5405 - 0
        at

org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:45)
        at

org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:38)
        at

org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53)
        at

org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:172)
        at

io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
ERROR [Stream-Deserializer-/172.16.100.253:7000-e313e37d] 2021-05-07
21:17:36,208 StreamSession.java:882 - [Stream
#27c00760-af9b-11eb-b7ee-5d6a136b5405] Remote peer /172.16.100.253:7000 failed stream session.
INFO  [Stream-Deserializer-/172.16.100.253:7000-e313e37d] 2021-05-07
21:17:36,209 StreamResultFuture.java:192 - [Stream
#27c00760-af9b-11eb-b7ee-5d6a136b5405] Session with /172.16.100.253:7000 is complete
INFO  [Stream-Deserializer-/172.16.100.253:7000-e313e37d] 2021-05-07
21:17:36,209 StreamSession.java:359 - [Stream
#27c00760-af9b-11eb-b7ee-5d6a136b5405] Starting streaming to /172.16.100.37:7000
INFO  [Stream-Deserializer-/172.16.100.253:7000-e313e37d] 2021-05-07
21:17:36,214 StreamCoordinator.java:263 - [Stream
#27c00760-af9b-11eb-b7ee-5d6a136b5405, ID#0] Beginning stream session with /172.16.100.37:7000
INFO  [Stream-Deserializer-/172.16.100.36:7000-9d343b7e] 2021-05-07
21:17:37,808 StreamResultFuture.java:178 - [Stream
#27c00760-af9b-11eb-b7ee-5d6a136b5405 ID#0] Prepare completed.
Receiving
0 files(0.000KiB), sending 0 files(0.000KiB)
INFO  [Stream-Deserializer-/172.16.100.39:7000-1c5eddba] 2021-05-07
21:17:37,809 StreamResultFuture.java:178 - [Stream
#27c00760-af9b-11eb-b7ee-5d6a136b5405 ID#0] Prepare completed.
Receiving
0 files(0.000KiB), sending 0 files(0.000KiB)
INFO  [Strea

Re: RC1 - joining cluster

2021-05-07 Thread Joe Obernberger
:552)
        at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:533)
        at 
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1766)
        at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1054)
        at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1015)
        at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
        at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
        at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
        at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
        at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)

Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at 
org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
        at 
com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
        at 
com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at 
com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
        at 
com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
        at 
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
        at 
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:220)
        at 
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:196)
        at 
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:506)
        at 
org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:837)
        at 
org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:596)
        at 
org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:189)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

        at java.base/java.lang.Thread.run(Thread.java:829)
WARN  [main] 2021-05-07 21:17:41,843 StorageService.java:1090 - Some 
data streaming failed. Use nodetool to check bootstrap state and resume. 
For more, see `nodetool help bootstrap`. IN_PROGRESS
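
For reference, the bootstrap state can usually be inspected and resumed with nodetool rather than wiping the node and re-joining, roughly:

nodetool netstats           # shows active/failed streaming sessions on the joining node
nodetool bootstrap resume   # retries the failed streams

(These are just the generic commands the warning points at, not output captured in this log.)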


-Joe

On 5/7/2021 5:37 PM, Joe Obernberger wrote:
When I try to halt the joining node with systemctl stop cassandra, it 
hangs.  I don't see it doing any network, disk, or CPU activity using 
tools like iotop, atop, and top.


I ended up kill -9'ing the process.  I tried the same join on a 
different machine, and the same issue occurs.  It hangs in UJ.  I 
deleted all data on the new node (not much there cuz it's new!), and 
tried again.  Same issue.


In other news, java 11 is working.  :)

-Joe


On 5/7/2021 5:07 PM, Joe Obernberger wrote:
Have an existing 5 node RC1 cluster and trying to join two more nodes 
to it.

The new node is stuck in the UJ status:

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.100.208  410.12 MiB  30      18.6%             2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.36   2.15 GiB    200     98.5%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   2.14 GiB    200     98.5%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  56.97 MiB   4       2.6%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.37   1.77 GiB    120     81.8%             08a19658-40be-4e55-8709-812b3d4ac750  rack1


Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
UJ  172.16.100.248  1.31 MiB    200     ?                 054109ad-3a5e-4680-b4ad-f9c08089238c  rack1


What can I check?

-Joe



Re: RC1 - joining cluster

2021-05-07 Thread Joe Obernberger
When I try to halt the joining node with systemctl stop cassandra, it 
hangs.  I don't see it doing any network, disk, or CPU activity using 
tools like iotop, atop, and top.


I ended up kill -9'ing the process.  I tried the same join on a 
different machine, and the same issue occurs.  It hangs in UJ.  I 
deleted all data on the new node (not much there cuz it's new!), and 
tried again.  Same issue.


In other news, java 11 is working.  :)

-Joe


On 5/7/2021 5:07 PM, Joe Obernberger wrote:
Have an existing 5 node RC1 cluster and trying to join two more nodes 
to it.

The new node is stuck in the UJ status:

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.100.208  410.12 MiB  30      18.6%             2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.36   2.15 GiB    200     98.5%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   2.14 GiB    200     98.5%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  56.97 MiB   4       2.6%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.37   1.77 GiB    120     81.8%             08a19658-40be-4e55-8709-812b3d4ac750  rack1


Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
UJ  172.16.100.248  1.31 MiB    200     ?                 054109ad-3a5e-4680-b4ad-f9c08089238c  rack1


What can I check?

-Joe



RC1 - joining cluster

2021-05-07 Thread Joe Obernberger

Have an existing 5 node RC1 cluster and trying to join two more nodes to it.
The new node is stuck in the UJ status:

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns 
(effective)  Host 
ID                               Rack
UN  172.16.100.208  410.12 MiB  30      
18.6%            2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.36   2.15 GiB    200     
98.5%            d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   2.14 GiB    200     
98.5%            93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  56.97 MiB   4       
2.6%             a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.37   1.77 GiB    120     
81.8%            08a19658-40be-4e55-8709-812b3d4ac750  rack1


Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns 
(effective)  Host 
ID                               Rack
UJ  172.16.100.248  1.31 MiB    200     
?                054109ad-3a5e-4680-b4ad-f9c08089238c  
rack1


What can I check?

-Joe



Re: 4.0 best feature/fix?

2021-05-07 Thread Joe Obernberger

My bad.  It's V4, not v4.  :)

Works fine with V4.  Spews errors on V5.
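
For anyone else hitting the "Unknown protocol version name" error quoted below: the protocol version name is case-sensitive in the driver config, so the advanced.protocol block in the application.conf further down needs the uppercase form, roughly:

datastax-java-driver {
  advanced.protocol {
    version = V4
  }
}

(Only the version line changes; the rest of the config stays as posted.)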

-Joe

On 5/7/2021 12:16 PM, Joe Obernberger wrote:


So I'm confused.
I get this on startup from the client:

2021-05-07 15:27:48,119 WARN  [com.dat.oss.dri.int.cor.poo.ChannelPool] (s1-admin-1) [s1|hercules/172.16.100.253:9042] Fatal error while initializing pool, forcing the node down: com.datastax.oss.driver.api.core.UnsupportedProtocolVersionException: [hercules/172.16.100.253:9042] *Host does not support protocol version V5*
        at com.datastax.oss.driver.api.core.UnsupportedProtocolVersionException.forSingleAttempt(UnsupportedProtocolVersionException.java:46)
        at 
com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.onResponse(ProtocolInitHandler.java:335)
        at 
com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.onResponse(ChannelHandlerRequest.java:94)
        at 
com.datastax.oss.driver.internal.core.channel.InFlightHandler.channelRead(InFlightHandler.java:257)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at 
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
        at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
        at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

        at java.base/java.lang.Thread.run(Thread.java:834)

If I specify in the application.conf to use v4, I get:

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at 
io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:48)
        at 
io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:25)
Caused by: java.lang.IllegalArgumentException: Unknown protocol 
version name: v4


Complete application.conf:

datastax-java-driver {
  basic.request.timeout = 60 seconds
  basic.request.consistency = LOCAL_QUORUM
  basic.contact-points = ["hercules:9042", "chaos:9042"]
  basic.load-balancing-policy {
        local-datacenter = datacenter1
  }
  advanced.protocol {
    version = v4
  }
}

-Joe


On 5/7/2021 12:05 PM, Sam Tunnicliffe wrote:
That's a driver error using protocol V5, which is the default from 
4.0-rc1 but only recently added to the drivers. Can you try 
specifying protocol V4 with all the same parameters? Also, if it's at 
all possible (which it may not be, given the divergence between 
driver versions 3 & 4), could you try with protocol V5 and driver 
version 3.11.0?


Thanks,
Sam


On 7 May 2021, at 16:12, Joe Obernberger 
 w

Re: 4.0 best feature/fix?

2021-05-07 Thread Joe Obernberger

So I'm confused.
I get this on startup from the client:

2021-05-07 15:27:48,119 WARN  [com.dat.oss.dri.int.cor.poo.ChannelPool] (s1-admin-1) [s1|hercules/172.16.100.253:9042] Fatal error while initializing pool, forcing the node down: com.datastax.oss.driver.api.core.UnsupportedProtocolVersionException: [hercules/172.16.100.253:9042] *Host does not support protocol version V5*
        at com.datastax.oss.driver.api.core.UnsupportedProtocolVersionException.forSingleAttempt(UnsupportedProtocolVersionException.java:46)
        at 
com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.onResponse(ProtocolInitHandler.java:335)
        at 
com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.onResponse(ChannelHandlerRequest.java:94)
        at 
com.datastax.oss.driver.internal.core.channel.InFlightHandler.channelRead(InFlightHandler.java:257)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at 
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
        at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
        at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

        at java.base/java.lang.Thread.run(Thread.java:834)

If I specify in the application.conf to use v4, I get:

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at 
io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:48)
        at 
io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:25)
Caused by: java.lang.IllegalArgumentException: Unknown protocol version 
name: v4


Complete application.conf:

datastax-java-driver {
  basic.request.timeout = 60 seconds
  basic.request.consistency = LOCAL_QUORUM
  basic.contact-points = ["hercules:9042", "chaos:9042"]
  basic.load-balancing-policy {
        local-datacenter = datacenter1
  }
  advanced.protocol {
    version = v4
  }
}

-Joe


On 5/7/2021 12:05 PM, Sam Tunnicliffe wrote:
That's a driver error using protocol V5, which is the default from 
4.0-rc1 but only recently added to the drivers. Can you try specifying 
protocol V4 with all the same parameters? Also, if it's at all 
possible (which it may not be, given the divergence between driver 
versions 3 & 4), could you try with protocol V5 and driver version 
3.11.0?


Thanks,
Sam


On 7 May 2021, at 16:12, Joe Obernberger 
 wrote:


I can retry Java 11.

I am seeing this error a lot - still debugging, but I'll throw it out 
there - using 4.11.1 driv

Re: 4.0 best feature/fix?

2021-05-07 Thread Joe Obernberger
: java.lang.NullPointerException
                at 
com.datastax.oss.protocol.internal.PrimitiveSizes.sizeOfShortBytes(PrimitiveSizes.java:59)
                at 
com.datastax.oss.protocol.internal.request.Execute$Codec.encodedSize(Execute.java:78)
                at 
com.datastax.oss.protocol.internal.FrameCodec.encodedBodySize(FrameCodec.java:272)
                at 
com.datastax.oss.protocol.internal.SegmentBuilder.addFrame(SegmentBuilder.java:75)
                at 
com.datastax.oss.driver.internal.core.protocol.FrameToSegmentEncoder.write(FrameToSegmentEncoder.java:56)
                at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
                at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
                at 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
                at 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
                at 
io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:304)
                at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
                at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
                at 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
                at 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
                at 
com.datastax.oss.driver.internal.core.channel.InFlightHandler.write(InFlightHandler.java:151)
                at 
com.datastax.oss.driver.internal.core.channel.InFlightHandler.write(InFlightHandler.java:108)
                at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
                at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
                at 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
                at 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
                at 
io.netty.channel.DefaultChannelPipeline.write(DefaultChannelPipeline.java:1015)
                at 
io.netty.channel.AbstractChannel.write(AbstractChannel.java:289)
                at 
com.datastax.oss.driver.internal.core.channel.DefaultWriteCoalescer$Flusher.runOnEventLoop(DefaultWriteCoalescer.java:100)
                at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
                at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
                at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
                at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
                at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
                at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
                at 
java.base/java.lang.Thread.run(Thread.java:834)



On 5/7/2021 10:00 AM, Jeff Jirsa wrote:
Cassandra 4.0 should work fine with java 11, including zgc (though zgc 
in jdk11 isn't meant to be production ready).


The things I care most about:
- Much faster streaming, which you care about if you're not using 
EBS/Disaggregated storage
- Virtual tables that make observability much more consistent (less 
JMX, more CQL)

- Incremental repair finally actually works (correctly)
- There's a bunch of new defensive rate limiters and hot-tunable 
properties in the database that people will enjoy once they need to 
use them

- JDK11




On Fri, May 7, 2021 at 6:05 AM Joe Obernberger 
 wrote:


Hi Sean - I'm using RC1 now in a research environment on bare
metal.  The biggest drawback of Cassandra for me is that
Cassandra has issues working with modern large servers - a server
with >32TBytes of SSD seems to be a non-starter.

I tried running Cassandra with java 11, and that doesn't appear to
work.

-Joe

On 5/7/2021 8:47 AM, Durity, Sean R wrote:


There is not enough 4.0 chatter here. What feature or fix of the
4.0 release is most important for your use case(s)/environment?
What is working well so far? What needs more w

Re: 4.0 best feature/fix?

2021-05-07 Thread Joe Obernberger
Hi Sean - I'm using RC1 now in a research environment on bare metal.  
The biggest drawback of Cassandra for me is that Cassandra has issues 
working with modern large servers - a server with >32TBytes of SSD seems 
to be a non-starter.


I tried running Cassandra with java 11, and that doesn't appear to work.

-Joe

On 5/7/2021 8:47 AM, Durity, Sean R wrote:


There is not enough 4.0 chatter here. What feature or fix of the 4.0 
release is most important for your use case(s)/environment? What is 
working well so far? What needs more work? Is there anything that 
needs more explanation?





Sean Durity

Staff Systems Engineer – Cassandra

#cassandra - for the latest news and updates







 

Re: RC1 - Counters

2021-05-05 Thread Joe Obernberger
You are correct.  Thank you!  One of the machines in the cluster is a 
Centos 8 box that doesn't come with NTP - they only included 
chronyd...which doesn't seem to work with our existing NTP server.  
Ugh,  I finally gave up and just set the clock manually and it now works.

Cheers!

-Joe

On 5/5/2021 10:26 AM, Bowen Song wrote:
This sounds like the clock on your Cassandra servers are not in sync. 
Can you please ensure all Cassandra servers have their clock synced 
(usually via NTP) and retry this?



On 05/05/2021 14:42, Joe Obernberger wrote:

Want to add - I am seeing this in the log:
INFO  [ScheduledTasks:1] 2021-05-05 09:36:05,022 
MessagingMetrics.java:206 - COUNTER_MUTATION_RSP messages were 
dropped in last 5000 ms: 0 internal and 1 cross node. Mean internal 
dropped latency: 0 ms and Mean cross-node dropped latency: 21356 ms


-joe

On 5/5/2021 9:35 AM, Joe Obernberger wrote:

I'm seeing some odd behavior with RC1 and counters - from cqlsh:

cqlsh> select * from doc.seq;

 id   | doccount
--+--
   DS |    1
 DS_1 |  844

(2 rows)
cqlsh> update doc.seq set doccount=doccount+1 where id='DS_1';
OperationTimedOut: errors={'172.16.100.208:9042': 'Client request 
timeout. See Session.execute[_async](timeout)'}, 
last_host=172.16.100.208:9042


Any ideas what to check?  nodetool status -r shows everything up, 
and I don't see errors in the logs.


-Joe





Re: RC1 - Counters

2021-05-05 Thread Joe Obernberger

Want to add - I am seeing this in the log:
INFO  [ScheduledTasks:1] 2021-05-05 09:36:05,022 
MessagingMetrics.java:206 - COUNTER_MUTATION_RSP messages were dropped 
in last 5000 ms: 0 internal and 1 cross node. Mean internal dropped 
latency: 0 ms and Mean cross-node dropped latency: 21356 ms


-joe

On 5/5/2021 9:35 AM, Joe Obernberger wrote:

I'm seeing some odd behavior with RC1 and counters - from cqlsh:

cqlsh> select * from doc.seq;

 id   | doccount
--+--
   DS |    1
 DS_1 |  844

(2 rows)
cqlsh> update doc.seq set doccount=doccount+1 where id='DS_1';
OperationTimedOut: errors={'172.16.100.208:9042': 'Client request 
timeout. See Session.execute[_async](timeout)'}, 
last_host=172.16.100.208:9042


Any ideas what to check?  nodetool status -r shows everything up, and 
I don't see errors in the logs.


-Joe



RC1 - Counters

2021-05-05 Thread Joe Obernberger

I'm seeing some odd behavior with RC1 and counters - from cqlsh:

cqlsh> select * from doc.seq;

 id   | doccount
--+--
   DS |    1
 DS_1 |  844

(2 rows)
cqlsh> update doc.seq set doccount=doccount+1 where id='DS_1';
OperationTimedOut: errors={'172.16.100.208:9042': 'Client request 
timeout. See Session.execute[_async](timeout)'}, 
last_host=172.16.100.208:9042


Any ideas what to check?  nodetool status -r shows everything up, and I 
don't see errors in the logs.


-Joe



Re: [RELEASE] Apache Cassandra 4.0-rc1 released

2021-04-25 Thread Joe Obernberger

Can't wait to try it!

-joe

On 4/25/2021 11:44 AM, Patrick McFadin wrote:
This is pretty exciting and a huge milestone for the project. 
Congratulations to all the contributors who worked hard at making this 
the release it needed to be and honoring the database that powers the 
world.


Patrick

On Sun, Apr 25, 2021 at 4:10 AM Mick Semb Wever  wrote:

The Cassandra team is pleased to announce the release of Apache
Cassandra version 4.0-rc1.

Apache Cassandra is a fully distributed database. It is the right
choice when you need scalability and high availability without
compromising performance.

  http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our
download section:

  http://cassandra.apache.org/download/

This version is a release candidate[1] on the 4.0 series. As always,
please pay attention to the release notes[2] and Let us know[3] if you
were to encounter any problem.

Debian users shall note, as the docs are not yet updated, the bintray
location is now replaced with the ASF's JFrog Artifactory location:
  https://apache.jfrog.io/artifactory/cassandra/

Enjoy!

[1]: CHANGES.txt

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.0-rc1
[2]: NEWS.txt

https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.0-rc1
[3]: https://issues.apache.org/jira/browse/CASSANDRA


 

Re: Query timed out after PT1M

2021-04-13 Thread Joe Obernberger
Interestingly, I just tried creating two CqlSession objects and when I 
use both instead of a single CqlSession for all queries, the 'No Node 
available to execute query' no longer happens.  In other words, if I 
use a different CqlSession for updating the doc.seq table, it works.  
If that session is shared with other queries, I get the errors.


-Joe

On 4/13/2021 12:35 PM, Bowen Song wrote:


The error message is clear, it was a DriverTimeoutException, and it 
was because the query timed out after one minute.


Note: "PT1M" means a period of one minute, see https://en.wikipedia.org/wiki/ISO_8601#Durations
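
(That duration typically comes from the driver's request timeout; in the 4.x java driver's application.conf it is the basic.request.timeout setting, e.g. a purely illustrative snippet:

datastax-java-driver {
  basic.request.timeout = 60 seconds
}

A 60-second setting is exactly what shows up as PT1M when it expires.)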


If you need help from us to find out why did it happen, you will need 
to share a bit more information with us, such as the CQL query and the 
table definition.



On 13/04/2021 16:53, Joe Obernberger wrote:

I'm getting this error:
com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed 
out after PT1M


but I can't find any documentation on this message.  Anyone know 
what this means?  I'm updating a counter value and then doing a 
select from the table.  The table that I'm selecting from is very 
small <100 rows.


Thank you!

-Joe





Re: Query timed out after PT1M

2021-04-13 Thread Joe Obernberger

Thank you Bowen - I wasn't familiar with PT1M.
I'm doing the following:

update doc.seq set doccount=doccount+? where id=?
Which runs OK.
Immediately following the update, I do:
select doccount from doc.seq where id=?
It is the above statement that is throwing the error under heavy load.

The select also frequently fails with a "No node was available to 
execute the query".  I wait 50mSec and retry and that typically 
works.  Sometimes it will retry as many as 15 times before getting a 
response, but this PT1M error is new.
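
The retry is nothing fancy - a simplified sketch of it below (class and method names here are made up, not the real code):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DriverTimeoutException;
import com.datastax.oss.driver.api.core.NoNodeAvailableException;
import com.datastax.oss.driver.api.core.cql.Row;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public final class SeqReader {
    // Retry the doc.seq read when the driver reports "No node was available"
    // or times out, sleeping 50 ms between attempts (up to 15 tries).
    static long readDocCount(CqlSession session, String id) throws InterruptedException {
        SimpleStatement select =
                SimpleStatement.newInstance("select doccount from doc.seq where id = ?", id);
        for (int attempt = 1; attempt <= 15; attempt++) {
            try {
                Row row = session.execute(select).one();
                return row == null ? 0L : row.getLong("doccount");
            } catch (NoNodeAvailableException | DriverTimeoutException e) {
                Thread.sleep(50);  // brief back-off, then retry
            }
        }
        throw new IllegalStateException("doc.seq read failed after 15 attempts for id=" + id);
    }
}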


Running: nodetool cfstats doc.seq results in:

Total number of tables: 80

Keyspace : doc
    Read Count: 57965255
    Read Latency: 0.3294544486347899 ms
    Write Count: 384658145
    Write Latency: 0.1954830251859089 ms
    Pending Flushes: 0
    Table: seq
    SSTable count: 9
    Space used (live): 48344
    Space used (total): 48344
    Space used by snapshots (total): 0
    Off heap memory used (total): 376
    SSTable Compression Ratio: 0.6227272727272727
    Number of partitions (estimate): 35
    Memtable cell count: 6517
    Memtable data size: 264
    Memtable off heap memory used: 0
    Memtable switch count: 154
    Local read count: 12900131
    Local read latency: NaN ms
    Local write count: 15981389
    Local write latency: NaN ms
    Pending flushes: 0
    Percent repaired: 10.69
    Bloom filter false positives: 0
    Bloom filter false ratio: 0.0
    Bloom filter space used: 168
    Bloom filter off heap memory used: 96
    Index summary off heap memory used: 168
    Compression metadata off heap memory 
used: 112

    Compacted partition minimum bytes: 125
    Compacted partition maximum bytes: 149
    Compacted partition mean bytes: 149
    Average live cells per slice (last five 
minutes): NaN
    Maximum live cells per slice (last five 
minutes): 0
    Average tombstones per slice (last five 
minutes): NaN
    Maximum tombstones per slice (last five 
minutes): 0

    Dropped Mutations: 0

-Joe

On 4/13/2021 12:35 PM, Bowen Song wrote:


The error message is clear, it was a DriverTimeoutException, and it 
was because the query timed out after one minute.


Note: "PT1M" means a period of one minute, see https://en.wikipedia.org/wiki/ISO_8601#Durations


If you need help from us to find out why did it happen, you will need 
to share a bit more information with us, such as the CQL query and the 
table definition.



On 13/04/2021 16:53, Joe Obernberger wrote:

I'm getting this error:
com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed 
out after PT1M


but I can't find any documentation on this message.  Anyone know 
what this means?  I'm updating a counter value and then doing a 
select from the table.  The table that I'm selecting from is very 
small <100 rows.


Thank you!

-Joe





Query timed out after PT1M

2021-04-13 Thread Joe Obernberger

I'm getting this error:
com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out 
after PT1M


but I can't find any documentation on this message.  Anyone know what 
this means?  I'm updating a counter value and then doing a select from 
the table.  The table that I'm selecting from is very small <100 rows.


Thank you!

-Joe




Re: Huge single-node DCs (?)

2021-04-09 Thread Joe Obernberger
We run a ~1PByte HBase cluster on top of Hadoop/HDFS that works pretty 
well.  I would love to be able to use Cassandra instead on a system 
like that.  HBase queries / scans are not the easiest to deal with, 
but, as with Cassandra, if you know the primary key, you can get to your 
data fast, even in trillions of rows. Cassandra offers some 
capabilities that HBase doesn't that I would like to leverage, but yeah 
- how can you use Cassandra with modern equipment in a bare metal 
environment?  Kubernetes could make sense as long as you're able to 
maintain data locality with however your storage is configured.
Even all SSDs - you can get a system with 24, 2 TByte SSDs, which is too 
large for 1 instance of Cassandra.  Does 4.x address any of this?


Ebay uses Cassandra and claims to have 80+ petabytes.  What do they do?

-Joe

On 4/8/2021 6:35 PM, Elliott Sims wrote:
I'm not sure I'd suggest building a single DIY Backblaze pod.  The 
SATA port multipliers are a pain both from a supply chain and systems 
management perspective.  Can be worth it when you're amortizing that 
across a lot of servers and can exert some leverage over wholesale 
suppliers, but less so for a one-off.  There's a lot more 
whitebox/OEM/etc options for high-density storage servers these days 
from Seagate, Dell, HP, Supermicro, etc that are worth a look.



I'd agree with this (both examples) sounding like a poor fit for 
Cassandra.  Seems like you could always just spin up a bunch of 
Cassandra VMs in the ESX cluster instead of one big one, but something 
like MySQL or PostgreSQL might suit your needs better.  Or even some 
sort of flatfile archive with something like Parquet if it's more 
being kept "just in case" with no need for quick random access.


For the 10PB example, it may be time to look at something like Hadoop, 
or maybe Ceph.


On Thu, Apr 8, 2021 at 10:39 AM Bowen Song  wrote:

This is off-topic. But if your goal is to maximise storage density
and also ensuring data durability and availability, this is what
you should be looking at:

  * hardware: https://www.backblaze.com/blog/open-source-data-storage-server/
  * architecture and software: https://www.backblaze.com/blog/vault-cloud-storage-architecture/


    On 08/04/2021 17:50, Joe Obernberger wrote:

I am also curious on this question.  Say your use case is to
store 10PBytes of data in a new server room / data-center with
new equipment, what makes the most sense?  If your database is
primarily write with little read, I think you'd want to maximize
disk space per rack space.  So you may opt for a 2u server with
24 3.5" disks at 16TBytes each for a node with 384TBytes of disk
- so ~27 servers for 10PBytes.

Cassandra doesn't seem to be the good choice for that
configuration; the rule of thumb that I'm hearing is ~2Tbytes per
node, in which case we'd need over 5000 servers.  This seems
really unreasonable.

-Joe

On 4/8/2021 9:56 AM, Lapo Luchini wrote:

Hi, one project I wrote is using Cassandra to back the huge
amount of data it needs (data is written only once and read very
rarely, but needs to be accessible for years, so the storage
needs become huge in time and I chose Cassandra mainly for its
horizontal scalability regarding disk size) and a client of mine
needs to install that on his hosts.

Problem is, while I usually use a cluster of 6 "smallish" nodes
(which can grow in time), he only has big ESX servers with huge
disk space (which is already RAID-6 redundant) but wouldn't have
the possibility to have 3+ nodes per DC.

This is out of my usual experience with Cassandra and, as far as
I read around, out of most use-cases found on the website or
this mailing list, so the question is:
does it make sense to use Cassandra with a big (let's talk 6TB
today, up to 20TB in a few years) single-node DataCenter, and
another single-node DataCenter (to act as disaster recovery)?

Thanks in advance for any suggestion or comment!




Re: Huge single-node DCs (?)

2021-04-08 Thread Joe Obernberger
I am also curious on this question.  Say your use case is to store 
10PBytes of data in a new server room / data-center with new equipment, 
what makes the most sense?  If your database is primarily write with 
little read, I think you'd want to maximize disk space per rack space.  
So you may opt for a 2u server with 24 3.5" disks at 16TBytes each for a 
node with 384TBytes of disk - so ~27 servers for 10PBytes.


Cassandra doesn't seem to be the good choice for that configuration; the 
rule of thumb that I'm hearing is ~2Tbytes per node, in which case we'd 
need over 5000 servers.  This seems really unreasonable.


-Joe

On 4/8/2021 9:56 AM, Lapo Luchini wrote:
Hi, one project I wrote is using Cassandra to back the huge amount of 
data it needs (data is written only once and read very rarely, but 
needs to be accessible for years, so the storage needs become huge in 
time and I chose Cassandra mainly for its horizontal scalability 
regarding disk size) and a client of mine needs to install that on his 
hosts.


Problem is, while I usually use a cluster of 6 "smallish" nodes (which 
can grow in time), he only has big ESX servers with huge disk space 
(which is already RAID-6 redundant) but wouldn't have the possibility 
to have 3+ nodes per DC.


This is out of my usual experience with Cassandra and, as far as I 
read around, out of most use-cases found on the website or this 
mailing list, so the question is:
does it make sense to use Cassandra with a big (let's talk 6TB today, 
up to 20TB in a few years) single-node DataCenter, and another 
single-node DataCenter (to act as disaster recovery)?


Thanks in advance for any suggestion or comment!



Re: No node was available to execute query error

2021-03-17 Thread Joe Obernberger

Thank you for this.
What about using a UUID for every row as the partition key and then a 
secondary index for your time buckets instead of being part of the 
partition key?

Example - say your buckets are 2021-03-15, 2021-03-16 etc...  Your table:

create table whatever (uuid text, time_bucket text, primary key (uuid));
create index bucket_idx on whatever (time_bucket);

If I understand secondary indexes, they are OK to use as long as what 
your are indexing on doesn't have a huge number of distinct values.  
Even after 10 years, your secondary index only has 3650 distinct values.

Bad idea?

-Joe

On 3/16/2021 9:59 AM, Durity, Sean R wrote:


Sometimes time bucketing can be used to create manageable partition 
sizes. How much data is attached to a day, week, or minute? Could you 
use a partition and clustering key like: ((source, time_bucket), 
timestamp)?
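
For illustration, a table shaped around that key might look roughly like this (the non-key column names are placeholders):

CREATE TABLE my_table (
    source      text,
    time_bucket text,        -- e.g. '2021-03-15', one bucket per day
    timestamp   timestamp,
    column1     text,
    column2     text,
    PRIMARY KEY ((source, time_bucket), timestamp)
);

Each (source, time_bucket) pair is one partition, so partition size is bounded by how much data lands in a single bucket.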



Then your application logic can iterate through time buckets to pull 
out the data in scalable chunks:


Select column1, column2 from my_table where source = 'PRIME SOURCE' and time_bucket = '2021-03-15';


Select column1, column2 from my_table where source = 'PRIME SOURCE' and time_bucket = '2021-03-16'


...

Also, there are implementations of Spark that will create the proper, 
single partition queries for large data sets. DataStax Analytics is 
one example (spark runs on each node).


Sean Durity – Staff Systems Engineer, Cassandra

*From:* Bowen Song 
*Sent:* Monday, March 15, 2021 5:27 PM
*To:* user@cassandra.apache.org
*Subject:* [EXTERNAL] Re: No node was available to execute query error


There are different approaches, depending on the application's logic. 
Roughly speaking, there's two distinct scenarios:


 1. Your application knows all the partition keys of the required data
in advance, either by reading them from another data source (e.g.:
another Cassandra table, other database, a file, or an API), or
can reconstruct the partition keys from other known information
(e.g.: sequential numbers, date time in a known range, etc.).
 2. Your application needs all (or nearly all) rows from a given
table, so you can use range requests to read everything out from
that table.

However, before you choose the second option and create a table for 
each "source" value, I must warn you that creating hundreds of tables 
in Cassandra is a bad idea.


Ask yourself a question, what is really required to 'do something'? Do 
you really need *all* data each time? Is it possible to make 'do 
something' incremental, so you'll only need *some* data each time?



On 15/03/2021 19:33, Joe Obernberger wrote:

Thank you.
What is the best way to iterate over a very large number of rows
in Cassandra?  I know the datastax driver lets java do blocks of
n records, but is that the best way?

-joe

On 3/15/2021 1:42 PM, Bowen Song wrote:

I personally try to avoid using secondary indexes, especially
in large clusters.

SI is not scalable, because a SI query doesn't have the
partition key information, Cassandra must send it to nearly
all nodes in a DC to get the answer. Thus, the more nodes you
have in a cluster, the slower and more expensive to run a SI
query. Creating a SI on a table also can indirectly create
large partitions in the index tables.


On 15/03/2021 17:27, Joe Obernberger wrote:

Great stuff - thank you.  I've spent the morning here
redesigning with smaller partitions.

If I have a large number of unique IDs that I want to
regularly 'do something' with, would it make sense to have
a table where a UUID is the partition key, and create a
secondary index on a field (call it source) that I want to
select from where the number of UUIDs per source might be
very large (billions).
So - select * from table where source=?
The number of unique source values is small - maybe 1000
Whereas each source may have billions of UUIDs.

-Joe


On 3/15/2021 11:18 AM, Bowen Song wrote:

To be clear, this

CREATE TABLE ... PRIMARY KEY (k1, k2);

is the same as:

CREATE TABLE ... PRIMARY KEY ((k1), k2);

but they are NOT the same as:

CREATE TABLE ... PRIMARY KEY ((k1, k2));

The first two statements creates a table with a
partition key k1 and a clustering key k2. The 3rd
statement creates a composite partition key from k1
and k2, therefore k1 and k2 are the partition keys for
this table.


Your example "create table xyz (uuid text, source text,
primary key (source, uuid));" uses the same syntax as
   

Re: No node was available to execute query error

2021-03-15 Thread Joe Obernberger

Thank you.
What is the best way to iterate over a very large number of rows in 
Cassandra?  I know the datastax driver lets java do blocks of n
records, but is that the best way?
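
What I mean by blocks of n records is the driver's automatic paging - a rough sketch below (the statement, table and class names are just placeholders):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public final class TableScan {
    // Driver-side paging: the driver fetches pageSize rows at a time and
    // transparently requests the next page as the iteration advances.
    static long scan(CqlSession session, int pageSize) {
        SimpleStatement stmt =
                SimpleStatement.newInstance("SELECT uuid, source FROM doc.some_table")
                        .setPageSize(pageSize);
        long count = 0;
        for (Row row : session.execute(stmt)) {  // iteration drives background page fetches
            count++;                             // process(row) would go here
        }
        return count;
    }
}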


-joe

On 3/15/2021 1:42 PM, Bowen Song wrote:


I personally try to avoid using secondary indexes, especially in large 
clusters.


SI is not scalable, because a SI query doesn't have the partition key 
information, Cassandra must send it to nearly all nodes in a DC to get 
the answer. Thus, the more nodes you have in a cluster, the slower and 
more expensive to run a SI query. Creating a SI on a table also can 
indirectly create large partitions in the index tables.



On 15/03/2021 17:27, Joe Obernberger wrote:


Great stuff - thank you.  I've spent the morning here redesigning 
with smaller partitions.


If I have a large number of unique IDs that I want to regularly 'do 
something' with, would it make sense to have a table where a UUID is 
the partition key, and create a secondary index on a field (call it 
source) that I want to select from where the number of UUIDs per 
source might be very large (billions).

So - select * from table where source=?
The number of unique source values is small - maybe 1000
Whereas each source may have billions of UUIDs.

-Joe


On 3/15/2021 11:18 AM, Bowen Song wrote:


To be clear, this

CREATE TABLE ... PRIMARY KEY (k1, k2);

is the same as:

CREATE TABLE ... PRIMARY KEY ((k1), k2);

but they are NOT the same as:

CREATE TABLE ... PRIMARY KEY ((k1, k2));

The first two statements creates a table with a partition key k1 and 
a clustering key k2. The 3rd statement creates a composite partition 
key from k1 and k2, therefore k1 and k2 are the partition keys for 
this table.



Your example "create table xyz (uuid text, source text, primary key
(source, uuid));" uses the same syntax as the first statement, which 
creates the table xyz with a partition key source, and a clustering 
key uuid (which, BTW, is a non-reserved keyword).



A partition in Cassandra is solely determined by the partition 
key(s), and the clustering key(s) have nothing to do with it. The 
size of a compacted partition is determined by the number of rows in 
the partition and the size of each row. If the table doesn't have a 
clustering key, each partition will have at most one row. The row 
size is the serialized size of all data in that row, including 
tombstones.



You can reduce the partition size for a table by either reducing the 
serialized data size or adding more columns to the (composite) 
partition keys. But please be aware, you will have to provide ALL 
partition key values when you read from or write to this table 
(other than range, SI or MV queries), therefore you will need to 
consider the queries before designing the table schema. For 
scalability, you will need predictable partition size that does not 
grow over time, or have an actionable plan to re-partition the table 
when the partition size exceeds a certain threshold. Picking the 
threshold is more of an art than science, generally speaking it 
should stay below a few hundred MBs, and often no more than 100 MB.



On 15/03/2021 14:36, Joe Obernberger wrote:


Thank you Bowen - I'm redesigning the tables now.  When you give 
Cassandra two parts to the primary key like


create table xyz (uuid text, source text, primary key (source, uuid));
How is the second part of the primary key used to determine 
partition size?


-Joe

On 3/12/2021 5:27 PM, Bowen Song wrote:


The partition size min/avg/max of 8409008/15096925/25109160 bytes 
looks fine for the table fieldcounts, but the number of partitions 
is a bit worrying. Only 3 partitions? Are you expecting the 
partition size (instead of number of partitions) to grow in the 
future? That can lead to a lots of headaches.


Forget about the fieldcounts table for now, the doc table looks 
really bad. It has min/avg/max partition size of 
24602/7052951452/63771372175 bytes, the partition sizes are 
severely unevenly distributed, and the over 60GB partition is way 
too big.


You really need to redesign your table schemas, and avoid creating 
large or uneven partitions.



On 12/03/2021 18:52, Joe Obernberger wrote:


Thank you very much for helping me out on this!  The table 
fieldcounts is currently pretty small - 6.4 million rows.


cfstats are:

Total number of tables: 81

Keyspace : doc
        Read Count: 3713134
        Read Latency: 0.2664131157130338 ms
        Write Count: 47513045
        Write Latency: 1.0725477948634947 ms
        Pending Flushes: 0
                Table: fieldcounts
                SSTable count: 3
                Space used (live): 16010248
                Space used (total): 16010248
                Space used by snapshots (total): 0
                Off heap memory used (total): 4947

Re: No node was available to execute query error

2021-03-15 Thread Joe Obernberger
Great stuff - thank you.  I've spent the morning here redesigning with 
smaller partitions.


If I have a large number of unique IDs that I want to regularly 'do 
something' with, would it make sense to have a table where a UUID is the 
partition key, and create a secondary index on a field (call it source) 
that I want to select from where the number of UUIDs per source might be 
very large (billions).

So - select * from table where source=?
The number of unique source values is small - maybe 1000
Whereas each source may have billions of UUIDs.

-Joe


On 3/15/2021 11:18 AM, Bowen Song wrote:


To be clear, this

CREATE TABLE ... PRIMARY KEY (k1, k2);

is the same as:

CREATE TABLE ... PRIMARY KEY ((k1), k2);

but they are NOT the same as:

CREATE TABLE ... PRIMARY KEY ((k1, k2));

The first two statements creates a table with a partition key k1 and a 
clustering key k2. The 3rd statement creates a composite partition key 
from k1 and k2, therefore k1 and k2 are the partition keys for this table.



Your example "create table xyz (uuid text, source text, primary key
(source, uuid));" uses the same syntax as the first statement, which 
creates the table xyz with a partition key source, and a clustering 
key uuid (which, BTW, is a non-reserved keyword).



A partition in Cassandra is solely determined by the partition key(s), 
and the clustering key(s) have nothing to do with it. The size of a 
compacted partition is determined by the number of rows in the 
partition and the size of each row. If the table doesn't have a 
clustering key, each partition will have at most one row. The row size 
is the serialized size of all data in that row, including tombstones.



You can reduce the partition size for a table by either reducing the 
serialized data size or adding more columns to the (composite) 
partition keys. But please be aware, you will have to provide ALL 
partition key values when you read from or write to this table (other 
than range, SI or MV queries), therefore you will need to consider the 
queries before designing the table schema. For scalability, you will 
need predictable partition size that does not grow over time, or have 
an actionable plan to re-partition the table when the partition size 
exceeds a certain threshold. Picking the threshold is more of an art 
than science, generally speaking it should stay below a few hundred 
MBs, and often no more than 100 MB.



On 15/03/2021 14:36, Joe Obernberger wrote:


Thank you Bowen - I'm redesigning the tables now.  When you give 
Cassandra two parts to the primary key like


create table xyz (uuid text, source text, primary key (source, uuid));
How is the second part of the primary key used to determine partition 
size?


-Joe

On 3/12/2021 5:27 PM, Bowen Song wrote:


The partition size min/avg/max of 8409008/15096925/25109160 bytes 
looks fine for the table fieldcounts, but the number of partitions 
is a bit worrying. Only 3 partitions? Are you expecting the 
partition size (instead of number of partitions) to grow in the 
future? That can lead to a lots of headaches.


Forget about the fieldcounts table for now, the doc table looks 
really bad. It has min/avg/max partition size of 
24602/7052951452/63771372175 bytes, the partition sizes are severely 
unevenly distributed, and the over 60GB partition is way too big.


You really need to redesign your table schemas, and avoid creating 
large or uneven partitions.



On 12/03/2021 18:52, Joe Obernberger wrote:


Thank you very much for helping me out on this!  The table 
fieldcounts is currently pretty small - 6.4 million rows.


cfstats are:

Total number of tables: 81

Keyspace : doc
        Read Count: 3713134
        Read Latency: 0.2664131157130338 ms
        Write Count: 47513045
        Write Latency: 1.0725477948634947 ms
        Pending Flushes: 0
                Table: fieldcounts
                SSTable count: 3
                Space used (live): 16010248
                Space used (total): 16010248
                Space used by snapshots (total): 0
                Off heap memory used (total): 4947
                SSTable Compression Ratio: 
0.3994304032360534

                Number of partitions (estimate): 3
                Memtable cell count: 0
                Memtable data size: 0
                Memtable off heap memory used: 0
                Memtable switch count: 0
                Local read count: 379
                Local read latency: NaN ms
                Local write count: 0
                Local write latency: NaN ms
                Pending flushes: 0
                Percent repaired: 100.0
                Bl

Re: No node was available to execute query error

2021-03-15 Thread Joe Obernberger
Thank you Bowen - I'm redesigning the tables now.  When you give 
Cassandra two parts to the primary key like


create table xyz (uuid text, source text, primary key (source, uuid));
How is the second part of the primary key used to determine partition size?

-Joe

On 3/12/2021 5:27 PM, Bowen Song wrote:


The partition size min/avg/max of 8409008/15096925/25109160 bytes 
looks fine for the table fieldcounts, but the number of partitions is 
a bit worrying. Only 3 partitions? Are you expecting the partition 
size (instead of number of partitions) to grow in the future? That can 
lead to a lots of headaches.


Forget about the fieldcounts table for now, the doc table looks really 
bad. It has min/avg/max partition size of 24602/7052951452/63771372175 
bytes, the partition sizes are severely unevenly distributed, and the 
over 60GB partition is way too big.


You really need to redesign your table schemas, and avoid creating 
large or uneven partitions.



On 12/03/2021 18:52, Joe Obernberger wrote:


Thank you very much for helping me out on this!  The table 
fieldcounts is currently pretty small - 6.4 million rows.


cfstats are:

Total number of tables: 81

Keyspace : doc
        Read Count: 3713134
        Read Latency: 0.2664131157130338 ms
        Write Count: 47513045
        Write Latency: 1.0725477948634947 ms
        Pending Flushes: 0
                Table: fieldcounts
                SSTable count: 3
                Space used (live): 16010248
                Space used (total): 16010248
                Space used by snapshots (total): 0
                Off heap memory used (total): 4947
                SSTable Compression Ratio: 
0.3994304032360534

                Number of partitions (estimate): 3
                Memtable cell count: 0
                Memtable data size: 0
                Memtable off heap memory used: 0
                Memtable switch count: 0
                Local read count: 379
                Local read latency: NaN ms
                Local write count: 0
                Local write latency: NaN ms
                Pending flushes: 0
                Percent repaired: 100.0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.0
                Bloom filter space used: 48
                Bloom filter off heap memory used: 24
                Index summary off heap memory used: 51
                Compression metadata off heap memory 
used: 4872

                Compacted partition minimum bytes: 8409008
                Compacted partition maximum bytes: 
25109160

                Compacted partition mean bytes: 15096925
                Average live cells per slice (last 
five minutes): NaN
                Maximum live cells per slice (last 
five minutes): 0
                Average tombstones per slice (last 
five minutes): NaN
                Maximum tombstones per slice (last 
five minutes): 0

                Dropped Mutations: 0

Commitlog is on a separate spindle on the 7 node cluster. All disks 
are SATA (spinning rust as they say!).  This is an R platform, but 
I will switch to NetworkTopologyStrategy.  I'm using Prometheus and 
Grafana to monitor Cassandra and the CPU load is typically 100 to 
200% on most of the nodes.  Disk IO is typically pretty low.


Performance - in general Async is about 10x faster.
ExecuteAsync:
35mSec for 364 rows.
8120mSec for 205001 rows.
14788mSec for 345001 rows.
4117mSec for 86400 rows.

23,330 rows per second on average

Execute:
232mSec for 364 rows.
584869mSec for 1263283 rows
46290mSec for 86400 rows

2,160 rows per second on average

Curious - our largest table (doc) has the following stats - is it not 
partitioned well?


Total number of tables: 81

Keyspace : doc
        Read Count: 3713134
        Read Latency: 0.2664131157130338 ms
        Write Count: 47513045
        Write Latency: 1.0725477948634947 ms
        Pending Flushes: 0
                Table: doc
                SSTable count: 26
                Space used (live): 57124641753
                Space used (total): 57124641753
                Space used by snapshots (total): 
113012646218

                Off heap memory used (total): 27331913
                SSTable Compression Ratio: 
0.2531585373184219

                Number

Re: No node was available to execute query error

2021-03-12 Thread Joe Obernberger
  Compacted partition minimum bytes: 24602
                Compacted partition maximum bytes: 
63771372175

                Compacted partition mean bytes: 7052951452
                Average live cells per slice (last five 
minutes): NaN
                Maximum live cells per slice (last five 
minutes): 0
                Average tombstones per slice (last five 
minutes): NaN
                Maximum tombstones per slice (last five 
minutes): 0

                Dropped Mutations: 0

Thank again!

-Joe

On 3/12/2021 11:01 AM, Bowen Song wrote:


The fact that sleep-then-retry works is just another indicator that this is 
likely a GC-pause-related issue. I'd recommend you check your Cassandra 
servers' GC logs first.


Do you know what's the maximum partition size for the doc.fieldcounts 
table? (Try the "nodetool cfstats doc.fieldcounts" command) I suspect 
this table has large partitions, which usually leads to GC issues.


As for your failed executeAsync() insert issue, do you know how many 
concurrent on-the-fly queries you have? The Cassandra driver has 
limitations on this, and new executeAsync() calls will fail when the 
limit is reached.
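
(A minimal sketch of one way to bound in-flight executeAsync() calls with a 
semaphore, assuming a prepared insert statement; the class name and the 
MAX_IN_FLIGHT value are made up for illustration and should be set below the 
driver's configured request limits:)

import java.util.concurrent.Semaphore;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

public class ThrottledInserts {
    private static final int MAX_IN_FLIGHT = 512;     // hypothetical tuning knob
    private final Semaphore permits = new Semaphore(MAX_IN_FLIGHT);
    private final CqlSession session;
    private final PreparedStatement insert;

    public ThrottledInserts(CqlSession session, PreparedStatement insert) {
        this.session = session;
        this.insert = insert;
    }

    public void write(String source, String fieldName, long count) throws InterruptedException {
        permits.acquire();                             // blocks once MAX_IN_FLIGHT requests are pending
        BoundStatement bound = insert.bind(source, fieldName, count);
        session.executeAsync(bound)
               .whenComplete((rs, err) -> {
                   permits.release();                  // free a slot on success or failure
                   if (err != null) {
                       err.printStackTrace();          // real code would log and/or retry
                   }
               });
    }
}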


I'm also a bit concerned about your "significantly" slower inserts. 
Inserts (excluding "INSERT IF NOT EXISTS") should be very fast in 
Cassandra. How slow are they? Are they always slow like that, or 
usually fast but some are much slower than others? What does the CPU 
usage & disk IO look like on the Cassandra server? Do you have 
commitlog on the same disk as the data? Is it a spinning disk, SATA 
SSD or NVMe?


BTW, you really shouldn't use SimpleStrategy for production environments.
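
(A sketch of the switch, assuming the single data centre is named 
datacenter1 - check nodetool status for the real name - followed by a repair 
so existing data matches the new placement:)

-- a sketch; replace datacenter1 with the name shown by nodetool status
ALTER KEYSPACE doc WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3
};

-- afterwards, on each node, run a full repair, e.g.:
-- nodetool repair --full doc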


On 12/03/2021 15:18, Joe Obernberger wrote:


The queries that are failing are:

select fieldvalue, count from doc.ordered_fieldcounts where source=? 
and fieldname=? limit 10


Created with:
CREATE TABLE doc.ordered_fieldcounts (
    source text,
    fieldname text,
    count bigint,
    fieldvalue text,
    PRIMARY KEY ((source, fieldname), count, fieldvalue)
) WITH CLUSTERING ORDER BY (count DESC, fieldvalue ASC)

and:

select fieldvalue, count from doc.fieldcounts where source=? and 
fieldname=?


Created with:
CREATE TABLE doc.fieldcounts (
    source text,
    fieldname text,
    fieldvalue text,
    count bigint,
    PRIMARY KEY (source, fieldname, fieldvalue)
)

This really seems like a driver issue.  I put retry logic around the 
calls and now those queries work.  Basically if it throws an 
exception, I Thread.sleep(500) and then retry.  This seems to be a 
continuing theme with Cassandra in general.  Is this common practice?
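
(For reference, roughly what a bounded version of that retry wrapper might 
look like - a sketch, not the code used here, and the attempt limit and 
backoff are arbitrary; as noted elsewhere in the thread, fixing the 
underlying GC/partition problems is the better cure than retrying:)

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DriverException;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Statement;

public final class Retries {
    // Retry a synchronous execute() a few times with a linear backoff.
    public static ResultSet executeWithRetry(CqlSession session, Statement<?> stmt)
            throws InterruptedException {
        int attempts = 0;
        while (true) {
            try {
                return session.execute(stmt);
            } catch (DriverException e) {
                if (++attempts >= 5) {               // give up after a handful of tries
                    throw e;
                }
                Thread.sleep(500L * attempts);       // back off a little more each time
            }
        }
    }
}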


After doing this retry logic, an insert statement started failing 
with an illegal state exception when I retried it (which makes 
sense).  This insert was using 
session.executeAsync(boundStatement).  I changed that to just 
execute (instead of async) and now I get no errors, no retries 
anywhere.  The insert is *significantly* slower when running execute 
vs executeAsync.  When using executeAsync:


com.datastax.oss.driver.api.core.NoNodeAvailableException: No node 
was available to execute the query
        at 
com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
        at 
com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at 
com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.maybeMoveToNextPage(MultiPageResultSet.java:99)
        at 
com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:91)
        at 
com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:79)
        at 
com.datastax.oss.driver.internal.core.util.CountingIterator.tryToComputeNext(CountingIterator.java:91)
        at 
com.datastax.oss.driver.internal.core.util.CountingIterator.hasNext(CountingIterator.java:86)
        at 
com.ngc.helios.fieldanalyzer.FTAProcess.handleOrderedFieldCounts(FTAProcess.java:684)
        at 
com.ngc.helios.fieldanalyzer.FTAProcess.storeResults(FTAProcess.java:214)
        at 
com.ngc.helios.fieldanalyzer.FTAProcess.startProcess(FTAProcess.java:190)

        at com.ngc.helios.fieldanalyzer.Main.main(Main.java:20)

The interesting part here is that the line that is now failing (line 
684 in FTAProcess) is:


if (itRs.hasNext())

where itRs is an iterator over a select query from another 
table.  I'm iterating over a result set from a select and inserting 
those results via executeAsync.


-Joe

On 3/12/2021 9:07 AM, Bowen Song wrote:


Millions of rows in a single query? That sounds like a bad idea to me. 
Your "NoNodeAvailableException" could be caused by stop-the-wo

Re: No node was available to execute query error

2021-03-12 Thread Joe Obernberger
One question on the 'millions of rows in a single query'.  How would you 
process that many rows?  At some point, I'd like to be able to process 
10-100 billion rows.  Isn't that something that can be done with 
Cassandra?  I'm coming from HBase where we'd run map reduce jobs.
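
(For what it's worth, a minimal sketch of how the 4.x Java driver pages 
through a big result set - it fetches one page at a time as the iterator 
advances, so the client never holds millions of rows in memory; the bind 
values and page size below are placeholders. For truly bulk, 10-100 billion 
row jobs, the usual analogue of an HBase map reduce job is an external 
engine such as Spark reading from Cassandra, rather than one huge driver 
query.)

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class PagedScan {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            SimpleStatement stmt = SimpleStatement.newInstance(
                    "SELECT fieldvalue, count FROM doc.fieldcounts WHERE source = ? AND fieldname = ?",
                    "someSource", "someField")           // placeholder bind values
                .setPageSize(5000);                      // rows per network round trip
            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {                         // later pages are fetched lazily
                process(row.getString("fieldvalue"), row.getLong("count"));
            }
        }
    }

    private static void process(String fieldValue, long count) {
        // application logic goes here
    }
}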

Thank you.

-Joe

On 3/12/2021 9:07 AM, Bowen Song wrote:


Millions of rows in a single query? That sounds like a bad idea to me. 
Your "NoNodeAvailableException" could be caused by stop-the-world GC 
pauses, and the GC pauses are likely caused by the query itself.


On 12/03/2021 13:39, Joe Obernberger wrote:


Thank you Paul and Erick.  The keyspace is defined like this:
CREATE KEYSPACE doc WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '3'}  AND durable_writes = true;


Would that cause this?

The program that is having the problem selects data, calculates 
stuff, and inserts.  It works with smaller selects, but when the 
number of rows is in the millions, I start to get this error.  Since 
it works with smaller sets, I don't believe it to be a network 
error.  All the nodes are definitely up as other processes are 
working OK, it's just this one program that fails.


The full stack trace:

Error: com.datastax.oss.driver.api.core.NoNodeAvailableException: No 
node was available to execute the query
com.datastax.oss.driver.api.core.NoNodeAvailableException: No node 
was available to execute the query
        at 
com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
        at 
com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
        at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
        at 
com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
        at 
com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
        at 
com.abc..fieldanalyzer.FTAProcess.udpateCassandraFTAMetrics(FTAProcess.java:275)
        at 
com.abc..fieldanalyzer.FTAProcess.storeResults(FTAProcess.java:216)
        at 
com.abc..fieldanalyzer.FTAProcess.startProcess(FTAProcess.java:199)

        at com.abc..fieldanalyzer.Main.main(Main.java:20)

FTAProcess line 275 is:

ResultSet rs = session.execute(getFieldCounts.bind().setString(0, 
rb.getSource()).setString(1, rb.getFieldName()));


-Joe

On 3/12/2021 8:30 AM, Paul Chandler wrote:

Hi Joe

This could also be caused by the replication factor of the keyspace: 
if you have NetworkTopologyStrategy and it doesn't list a 
replication factor for the datacenter datacenter1, then you will get 
this error message too.


Paul

On 12 Mar 2021, at 13:07, Erick Ramirez <erick.rami...@datastax.com> wrote:


Does it get returned by the driver every single time? The 
NoNodeAvailableException gets thrown when (1) all nodes are down, or 
(2) all the contact points are invalid from the driver's perspective.


Is it possible there's no route/connectivity from your app 
server(s) to the 172.16.x.x network? If you post the full error 
message + full stacktrace, it might provide clues. Cheers!




Re: No node was available to execute query error

2021-03-12 Thread Joe Obernberger

The queries that are failing are:

select fieldvalue, count from doc.ordered_fieldcounts where source=? and 
fieldname=? limit 10


Created with:
CREATE TABLE doc.ordered_fieldcounts (
    source text,
    fieldname text,
    count bigint,
    fieldvalue text,
    PRIMARY KEY ((source, fieldname), count, fieldvalue)
) WITH CLUSTERING ORDER BY (count DESC, fieldvalue ASC)

and:

select fieldvalue, count from doc.fieldcounts where source=? and fieldname=?

Created with:
CREATE TABLE doc.fieldcounts (
    source text,
    fieldname text,
    fieldvalue text,
    count bigint,
    PRIMARY KEY (source, fieldname, fieldvalue)
)

This really seems like a driver issue.  I put retry logic around the 
calls and now those queries work.  Basically if it throws an exception, 
I Thread.sleep(500) and then retry.  This seems to be a continuing 
theme with Cassandra in general.  Is this common practice?


After doing this retry logic, an insert statement started failing with 
an illegal state exception when I retried it (which makes sense).  This 
insert was using session.executeAsync(boundStatement).  I changed that 
to just execute (instead of async) and now I get no errors, no retries 
anywhere.  The insert is *significantly* slower when running execute vs 
executeAsync.  When using executeAsync:


com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was 
available to execute the query
        at 
com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
        at 
com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at 
com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.maybeMoveToNextPage(MultiPageResultSet.java:99)
        at 
com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:91)
        at 
com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:79)
        at 
com.datastax.oss.driver.internal.core.util.CountingIterator.tryToComputeNext(CountingIterator.java:91)
        at 
com.datastax.oss.driver.internal.core.util.CountingIterator.hasNext(CountingIterator.java:86)
        at 
com.ngc.helios.fieldanalyzer.FTAProcess.handleOrderedFieldCounts(FTAProcess.java:684)
        at 
com.ngc.helios.fieldanalyzer.FTAProcess.storeResults(FTAProcess.java:214)
        at 
com.ngc.helios.fieldanalyzer.FTAProcess.startProcess(FTAProcess.java:190)

        at com.ngc.helios.fieldanalyzer.Main.main(Main.java:20)

The interesting part here is that the line that is now failing (line 684 
in FTAProcess) is:


if (itRs.hasNext())

where itRs is an iterator over a select query from another table.  
I'm iterating over a result set from a select and inserting those 
results via executeAsync.


-Joe

On 3/12/2021 9:07 AM, Bowen Song wrote:


Millions of rows in a single query? That sounds like a bad idea to me. 
Your "NoNodeAvailableException" could be caused by stop-the-world GC 
pauses, and the GC pauses are likely caused by the query itself.


On 12/03/2021 13:39, Joe Obernberger wrote:


Thank you Paul and Erick.  The keyspace is defined like this:
CREATE KEYSPACE doc WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '3'}  AND durable_writes = true;


Would that cause this?

The program that is having the problem selects data, calculates 
stuff, and inserts.  It works with smaller selects, but when the 
number of rows is in the millions, I start to get this error.  Since 
it works with smaller sets, I don't believe it to be a network 
error.  All the nodes are definitely up as other processes are 
working OK, it's just this one program that fails.


The full stack trace:

Error: com.datastax.oss.driver.api.core.NoNodeAvailableException: No 
node was available to execute the query
com.datastax.oss.driver.api.core.NoNodeAvailableException: No node 
was available to execute the query
        at 
com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
        at 
com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
        at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
        at 
com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
        at 
com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
        at 
com.abc..fieldanalyzer.FTAProcess.udpateCassandraFTAMetrics(FTAProcess.java:275)

Re: No node was available to execute query error

2021-03-12 Thread Joe Obernberger

Thank you Paul and Erick.  The keyspace is defined like this:
CREATE KEYSPACE doc WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '3'}  AND durable_writes = true;


Would that cause this?

The program that is having the problem selects data, calculates stuff, 
and inserts.  It works with smaller selects, but when the number of 
rows is in the millions, I start to get this error. Since it works with 
smaller sets, I don't believe it to be a network error.  All the nodes 
are definitely up as other processes are working OK, it's just this one 
program that fails.


The full stack trace:

Error: com.datastax.oss.driver.api.core.NoNodeAvailableException: No 
node was available to execute the query
com.datastax.oss.driver.api.core.NoNodeAvailableException: No node was 
available to execute the query
        at 
com.datastax.oss.driver.api.core.NoNodeAvailableException.copy(NoNodeAvailableException.java:40)
        at 
com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
        at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
        at 
com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
        at 
com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
        at 
com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
        at 
com.abc..fieldanalyzer.FTAProcess.udpateCassandraFTAMetrics(FTAProcess.java:275)
        at 
com.abc..fieldanalyzer.FTAProcess.storeResults(FTAProcess.java:216)
        at 
com.abc..fieldanalyzer.FTAProcess.startProcess(FTAProcess.java:199)

        at com.abc..fieldanalyzer.Main.main(Main.java:20)

FTAProcess line 275 is:

ResultSet rs = session.execute(getFieldCounts.bind().setString(0, 
rb.getSource()).setString(1, rb.getFieldName()));


-Joe

On 3/12/2021 8:30 AM, Paul Chandler wrote:

Hi Joe

This could also be caused by the replication factor of the keyspace: 
if you have NetworkTopologyStrategy and it doesn't list a 
replication factor for the datacenter datacenter1, then you will get 
this error message too.


Paul

On 12 Mar 2021, at 13:07, Erick Ramirez wrote:


Does it get returned by the driver every single time? The 
NoNodeAvailableException gets thrown when (1) all nodes are down, or 
(2) all the contact points are invalid from the driver's perspective.


Is it possible there's no route/connectivity from your app server(s) 
to the 172.16.x.x network? If you post the full error message + full 
stacktrace, it might provide clues. Cheers!



 
