Re: Error during select query - Found other issues with cluster too

2017-12-20 Thread Nicolas Guyomar
Hi,

Running a manual compaction is usually not the right thing to do, as you will
end up with a few huge SSTables that won't be compacted again for a while.
You should first try to find out why compactions were not happening on your
cluster, because 14k SSTables (I assume you are talking about this
particular table, using STCS?) is a lot!

As Adama said, you really need to check the ulimit for the Cassandra process,
because it might be the reason why compactions are failing. Those
errors should all be logged in system.log.
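
As a rough illustration of what to look for in system.log, here is a minimal Python sketch. It assumes the log lines follow the "LEVEL [thread] timestamp message" layout seen in the traces posted in this thread; the function name and the keyword list are made up for the example and are not part of any Cassandra tooling.

```python
import re

# Assumed layout, based on the log lines quoted in this thread:
#   ERROR [STREAM-IN-/10.10.52.22:7000] 2017-12-20 01:17:17,691 Some.java:142 - message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(?P<msg>.*)$"
)


def interesting_lines(lines, keywords=("compaction", "too many open files")):
    """Return ERROR-level lines plus any line mentioning one of the keywords."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        is_error = bool(match) and match.group("level") == "ERROR"
        has_keyword = any(k in line.lower() for k in keywords)
        if is_error or has_keyword:
            hits.append(line.rstrip("\n"))
    return hits


SAMPLE = [
    "INFO  [Native-Transport-Requests-2] 2017-12-19 17:06:02,332 Message.java:619 - Unexpected exception during request",
    "ERROR [STREAM-IN-/10.10.52.22:7000] 2017-12-20 01:17:17,691 JVMStabilityInspector.java:142 - JVM state determined to be unstable.",
    "java.io.FileNotFoundException: mc-19095-big-Filter.db (Too many open files)",
]

for hit in interesting_lines(SAMPLE):
    print(hit)
```

To scan a real file you could pass an open file object instead of SAMPLE, since the function only iterates over lines.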

Glad you found the bad script. You already have a model bucketed by
month+year, so it's up to you to decide whether this is fine-grained enough for
your use case, or whether you need to refine it further (day/month/year,
maybe) so that you end up with more partitions, but smaller ones that are less
harmful for your JVM.
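
To make the bucketing idea concrete, here is a small Python sketch of how the bucket value could be derived on the client side before writing. It assumes the month+year bucket is stored as a YYYYMM number; the thread does not say how the bucket column is actually encoded, so the encoding and both function names are assumptions.

```python
from datetime import datetime, timezone


def month_bucket(ts: datetime) -> int:
    """Month+year bucket, e.g. 201712 (assumed YYYYMM encoding)."""
    return ts.year * 100 + ts.month


def day_bucket(ts: datetime) -> int:
    """Finer day/month/year bucket, e.g. 20171220 (assumed YYYYMMDD encoding)."""
    return ts.year * 10000 + ts.month * 100 + ts.day


ts = datetime(2017, 12, 20, 12, 19, tzinfo=timezone.utc)
print(month_bucket(ts))  # 201712
print(day_bucket(ts))    # 20171220
```

With the daily bucket, a burst of writes on a single day lands in that day's partition only, instead of inflating the partition for the whole month. The trade-off is that a query covering a month has to read up to 31 partitions.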

It would also help other readers of this ML if you described your
cluster (C* version, number of nodes, memory per node, things like that).


Re: Error during select query - Found other issues with cluster too

2017-12-20 Thread Dipan Shah
Hello Nicolas,


Here's our data model:


CREATE TABLE hhahistory.history (
    tablename text,
    columnname text,
    tablekey bigint,
    updateddate timestamp,
    dateyearpart bigint,
    historyid bigint,
    appname text,
    audittype text,
    createddate timestamp,
    dbsession uuid,
    firstname text,
    historybatch uuid,
    historycassandraid uuid,
    hostname text,
    isvlm boolean,
    lastname text,
    loginname text,
    newvalue text,
    notes text,
    oldvalue text,
    reason text,
    updatedby text,
    updatedutcdate timestamp,
    dbname text,
    PRIMARY KEY ((tablename, columnname, dateyearpart), tablekey, updateddate, historyid)
);


We are using this to store audit data from our primary SQL Server DB. Our 
partition key consists of the original table name, the column name and a 
month+year combination (dateyearpart).


I just realized that a script had managed to sneak in more than 100 million 
rows on the same day, so that might be the reason for all this data going into 
the same partition. I'll see if I can do something about this.


Thanks,

Dipan Shah




Re: Error during select query - Found other issues with cluster too

2017-12-20 Thread Dipan Shah
Hello Adama,


I noticed this as well and found over 14k files in the data folder.


I am not sure if this is the ideal solution, but I ran a manual compaction 
there and the number of files came down to 200.


I had the same issue on another node, so I am running a compaction there too. 
I will update the thread afterwards on whether that solved my problem.
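
For anyone wanting to track the file count before and after compaction, here is a minimal Python sketch. It assumes the 3.x on-disk naming seen in the trace (for example mc-19095-big-Filter.db), where each SSTable is made of several component files and has exactly one "-Data.db" component. The function names are made up for this example.

```python
from pathlib import Path


def count_sstables(table_dir: str) -> int:
    """Count SSTables by counting their Data components (*-Data.db)."""
    return sum(1 for _ in Path(table_dir).glob("*-Data.db"))


def count_component_files(table_dir: str) -> int:
    """Count every regular file directly inside the table directory."""
    return sum(1 for p in Path(table_dir).iterdir() if p.is_file())
```

Calling both functions on a table's data directory shows the difference between the number of files and the number of SSTables: since each SSTable consists of several component files, 14k files corresponds to fewer SSTables than that.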


Thanks,

Dipan Shah




RE: Error during select query - Found other issues with cluster too

2017-12-20 Thread adama.diabate
Hi Dipan,

Your node failure trace says:
java.io.FileNotFoundException: 
/home/install/cassandra-3.11.0/data/data/hhahistory/history-065e0c90d9be11e7afbcdfeb48785ac5/mc-19095-big-Filter.db
 (Too many open files)
You are probably exceeding the maximum number of open files set at OS level for 
the Cassandra login.

On Linux boxes you can get the number of file handles currently opened by 
Cassandra and compare it to the maximum set at OS level.
Could you please run the following as the Cassandra login:
$ ps -efa | grep -i Cassandra   # to get the Cassandra process id; let's say 1234
$ lsof -n -p 1234               # change 1234 to your actual Cassandra process id
$ ulimit -Hn ; ulimit -Sn
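
The same comparison can be scripted. The sketch below is only an illustration and assumes a Linux box with /proc mounted; the function names are made up for this example. Note that `resource.getrlimit` reports the limits of the process running the script (much like `ulimit` reports those of the current shell), which may differ from the limits of an already running Cassandra process.

```python
import os
import resource


def nofile_limits():
    """Soft and hard open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid):
    """Number of file descriptors a process has open, via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage_ratio(open_fds, soft_limit):
    """Fraction of the soft limit in use; an unlimited limit counts as 0.0."""
    if soft_limit == resource.RLIM_INFINITY or soft_limit <= 0:
        return 0.0
    return open_fds / soft_limit


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
    if os.path.isdir("/proc/self/fd"):
        used = open_fd_count(os.getpid())
        print("open descriptors of this script:", used)
        print("usage ratio:", fd_usage_ratio(used, soft))
```

To inspect Cassandra itself, pass its process id to open_fd_count while running as the same user or as root. A ratio close to 1.0 would be consistent with the "Too many open files" error in the trace.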

What is your OS, and which version?
Thanks,
Adama



Re: Error during select query - Found other issues with cluster too

2017-12-20 Thread Nicolas Guyomar
Hi Dipan,

This looks like a really unbalanced data model: you have some very wide
rows (partitions)!

Can you share your model and explain a bit what you are storing in this
table? Your partition key might not be appropriate.


Re: Error during select query - Found other issues with cluster too

2017-12-20 Thread Dipan Shah
Hello Kurt,


I think I might have found the problem:


Can you please look at the nodetool tablehistograms output for the table and 
see if that seems to be the problem? I think the max partition size and cell 
count are too high:


Percentile  SSTables  Write Latency (micros)  Read Latency (micros)  Partition Size (bytes)  Cell Count
50.00%      0.00      0.00                    0.00                   29521                   2299
75.00%      0.00      0.00                    0.00                   379022                  29521
95.00%      0.00      0.00                    0.00                   5839588                 454826
98.00%      0.00      0.00                    0.00                   30130992                2346799
99.00%      0.00      0.00                    0.00                   89970660                7007506
Min         0.00      0.00                    0.00                   150                     0
Max         0.00      0.00                    0.00                   53142810146             1996099046
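
To put these numbers in perspective, here is a small Python sketch that parses the rows above and flags oversized partitions. The 100 MB threshold is a commonly cited rule of thumb used here as an assumption, not a hard limit, and the function names are made up for this example.

```python
# Rows copied from the histogram above (header line left out).
ROWS = """\
50.00% 0.00 0.00 0.00 29521 2299
75.00% 0.00 0.00 0.00 379022 29521
95.00% 0.00 0.00 0.00 5839588 454826
98.00% 0.00 0.00 0.00 30130992 2346799
99.00% 0.00 0.00 0.00 89970660 7007506
Min 0.00 0.00 0.00 150 0
Max 0.00 0.00 0.00 53142810146 1996099046
"""

# Assumed rule-of-thumb ceiling for a healthy partition (100 MiB).
THRESHOLD_BYTES = 100 * 1024 * 1024


def parse_rows(text):
    """Map each row label to its partition size (bytes) and cell count."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip anything that is not a six-column data row
        rows[parts[0]] = {
            "partition_bytes": int(parts[4]),
            "cell_count": int(parts[5]),
        }
    return rows


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


def oversized(rows, threshold=THRESHOLD_BYTES):
    """Labels of rows whose partition size exceeds the threshold."""
    return [label for label, r in rows.items() if r["partition_bytes"] > threshold]


rows = parse_rows(ROWS)
print(oversized(rows))                                    # ['Max']
print(f"{to_gib(rows['Max']['partition_bytes']):.1f}")    # 49.5
```

So the 99th percentile is still under the assumed threshold, while the largest partition is roughly 49.5 GiB with close to two billion cells.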



Thanks,

Dipan Shah




Re: Error during select query - Found other issues with cluster too

2017-12-19 Thread Dipan Shah
Hello Kurt,


We are using version 3.11.0 and I think this might be part of a bigger problem. 
I can see that nodes in my cluster are failing unexpectedly, and repair 
commands are failing as well.


Repair command failure error:


INFO  [Native-Transport-Requests-2] 2017-12-19 17:06:02,332 Message.java:619 - Unexpected exception during request; channel = [id: 0xacc9a54a, L:/10.10.52.17:9042 ! R:/10.10.55.229:58712]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
    at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
INFO  [Native-Transport-Requests-2] 2017-12-19 17:06:11,056 Message.java:619 - Unexpected exception during request; channel = [id: 0xeebf628d, L:/10.10.52.17:9042 ! R:/10.10.55.229:58130]
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer


Node failure error:


ERROR [STREAM-IN-/10.10.52.22:7000] 2017-12-20 01:17:17,691 JVMStabilityInspector.java:142 - JVM state determined to be unstable.  Exiting forcefully due to:
java.io.FileNotFoundException: /home/install/cassandra-3.11.0/data/data/hhahistory/history-065e0c90d9be11e7afbcdfeb48785ac5/mc-19095-big-Filter.db (Too many open files)
    at java.io.FileOutputStream.open0(Native Method) ~[na:1.8.0_131]
    at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[na:1.8.0_131]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213) ~[na:1.8.0_131]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:101) ~[na:1.8.0_131]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.flushBf(BigTableWriter.java:486) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.doPrepare(BigTableWriter.java:516) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter$TransactionalProxy.doPrepare(BigTableWriter.java:364) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:184) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.SSTableWriter.finish(SSTableWriter.java:264) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.finish(SimpleSSTableMultiWriter.java:59) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.finish(RangeAwareSSTableWriter.java:129) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.streaming.StreamReceiveTask.received(StreamReceiveTask.java:110) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:656) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:523) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:317) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]




Thanks,

Dipan Shah



From: kurt greaves 
Sent: Wednesday, December 20, 2017 2:23 AM
To: User
Subject: Re: Error during select query

Can you send through the full stack trace as reported in the Cassandra logs? 
Also, what version are you running?

On 19 Dec. 2017 9:23 pm, "Dipan Shah" wrote:

Hello,


I am getting an error message when I run a select query from one particular 
node. The error is "ServerError: java.lang.IllegalStateException: Unable to 
compute ceiling for max when histogram overflowed".


Has anyone faced this error earlier? I tried to search for this but did not get 
anything that matches my scenario.


Please note, I do not get this error when I run the same query from any other 
node. And I'm connecting to the node using cqlsh.


Thanks,

Dipan Shah