[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15461087#comment-15461087 ]

Mattias W commented on CASSANDRA-11528:
----------------------------------------

A select count(*) should not take more space than what is needed for the primary key; it is my other columns that are fat. Is there a standard Cassandra test database, generated by scripts, which I can use for reproducing?

> Server Crash when select returns more than a few hundred rows
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-11528
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11528
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>       Environment: windows 7, 8 GB machine
>            Reporter: Mattias W
>             Fix For: 3.x
>
>       Attachments: datastax_ddc_server-stdout.2016-04-07.log
>
>
> While implementing a dump procedure that did a "select * from" on one table
> at a time, I instantly killed the server. A simple
> {noformat}select count(*) from {noformat}
> also kills it. For a while, I thought the size of the blobs was the cause.
> I also tried having only a unique id as the partition key, because I was
> afraid a single partition had become too big, but that didn't change anything.
> It happens every time, both from Java/Clojure and from DevCenter.
> I looked at the logs at C:\Program Files\DataStax-DDC\logs, but the crash is
> so quick that nothing is recorded there.
> There is a Java out-of-memory error in the logs, but it is not from the time
> of the crash.
> It only happens for one table; it has only 15000 entries, but blobs and
> byte[] are stored there, with sizes between 100 kB and 4 MB. The total size
> of that table is about 6.5 GB on disk.
> I made a workaround by doing many small selects instead, each fetching only
> 100 rows.
> Is there a setting I can set to make the system log more eagerly, in order
> to at least get a stack trace or similar that might help you?
> It is prunsrv that dies. Restarting the NT service makes Cassandra run
> again.
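For the reproduction question: {{cassandra-stress}}, which ships with Cassandra, can generate a table like the one described by script. Below is a minimal sketch of a stress user profile; the keyspace, table, and column names are hypothetical, and the blob sizes mirror the 100 kB - 4 MB range from the report, so treat it as a starting point rather than a verified repro.

{noformat}
# blobtest.yaml - cassandra-stress user profile (sketch)
keyspace: stress_ks
keyspace_definition: |
  CREATE KEYSPACE stress_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
table: blobtest
table_definition: |
  CREATE TABLE blobtest (
    id uuid PRIMARY KEY,
    payload blob
  );
columnspec:
  - name: payload
    size: uniform(100000..4000000)  # 100 kB - 4 MB per row, as in the report
insert:
  partitions: fixed(1)              # one partition per generated row
{noformat}

Running something like {{cassandra-stress user profile=blobtest.yaml ops(insert=1) n=15000}} should then produce a table of roughly the reported shape (about 15000 rows, a few GB on disk), against which {{select count(*)}} can be tried.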
[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15461009#comment-15461009 ]

Benjamin Lerer commented on CASSANDRA-11528:
--------------------------------------------

A {{count(*)}} does not use more memory than a {{SELECT *}}, as the query is paged internally. In fact, in your version it should use even less memory. If you could provide us with a test case that reproduces the problem, or a heap dump, it would help us investigate it.
[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15460611#comment-15460611 ]

Mattias W commented on CASSANDRA-11528:
----------------------------------------

I thought the answer was that {{count(*)}} is a very expensive operation memory-wise, since that would explain the behaviour. I have now stopped using count(*). Maybe this is more of a FAQ issue, even if I really do not like that a careless client can crash the server just by issuing an expensive operation.
[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274015#comment-15274015 ]

Benjamin Lerer commented on CASSANDRA-11528:
--------------------------------------------

Sorry, I read your summary too fast. I thought that you only had the problem with count queries on 3.3.
From your log I see in several places that the JVM has created a heap dump: {{Dumping heap to java_pid56752.hprof}}. If you could provide us with one of those dumps, it would be useful for seeing where the problem comes from.
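If the automatic dump is missing (see the note on CASSANDRA-9861 below), a dump can also be captured manually with standard JDK tooling while the query runs. A sketch, assuming a default JDK install; {{<pid>}} stands for the Cassandra process id:

{noformat}
# Take a heap dump of the running Cassandra process while reproducing
# the query; "live" restricts the dump to reachable objects.
jmap -dump:live,format=b,file=cassandra-heap.hprof <pid>
{noformat}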
[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268471#comment-15268471 ]

Benjamin Lerer commented on CASSANDRA-11528:
--------------------------------------------

[~matthiasw] The amount of heap used by a select statement depends on the amount of data that you store in your rows. Count queries paginate your rows internally based on the page size that you use (the LIMIT has no effect on aggregation queries). For the Java driver, the default page size is 5000 rows. As a consequence, if every one of your rows contains 4 MB, you will end up with around 19 GB in memory, unless your JVM exits before that.
{{COUNT(*)}} is an expensive operation, as it has to bring all the rows back to the coordinator in order to count them properly (some replicas might not have the latest data). I will try to see if it is possible to avoid that type of problem.
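To make the arithmetic above concrete: 5000 rows per page x 4 MB per row = 20 000 MB, i.e. roughly the 19 GB figure, for a single internal page. The page size is controlled by the client; below is a minimal sketch with the DataStax Java driver, with hypothetical contact point, keyspace, and table names. This shrinks each page but is a mitigation sketch, not a fix for the underlying OOM.

{noformat}
import com.datastax.driver.core.*;

public class CountWithSmallPages {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                                      .addContactPoint("127.0.0.1")
                                      .build();
             Session session = cluster.connect("my_ks")) {
            // The driver default is 5000 rows per page; with ~4 MB rows that
            // is ~20 GB per internal page. Request far smaller pages instead.
            Statement stmt = new SimpleStatement("SELECT count(*) FROM blobtest");
            stmt.setFetchSize(100);
            Row row = session.execute(stmt).one();
            System.out.println("count = " + row.getLong(0));
        }
    }
}
{noformat}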
[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268447#comment-15268447 ]

Benjamin Lerer commented on CASSANDRA-11528:
--------------------------------------------

{{-XX:+HeapDumpOnOutOfMemoryError}} does not work anymore; see CASSANDRA-9861.
[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262125#comment-15262125 ]

Alex Petrov commented on CASSANDRA-11528:
-----------------------------------------

Could you take a closer look at your heap dump? Start the node with {{-XX:+HeapDumpOnOutOfMemoryError}} and open the file after the node has crashed (with, for example, [jProfiler|https://www.ej-technologies.com/products/jprofiler/overview.html] or [VisualVM|https://visualvm.java.net/]).
[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233551#comment-15233551 ]

Mattias W commented on CASSANDRA-11528:
----------------------------------------

It is an out-of-memory error. Cassandra 3.4 on Ubuntu 14.04 behaves the same, and there, the last messages in the log are:

{noformat}
INFO  [SharedPool-Worker-3] 2016-04-09 15:32:34,915 ApproximateTime.java:44 - Scheduling approximate time-check task with a precision of 10 milliseconds
INFO  [Service Thread] 2016-04-09 15:34:50,366 GCInspector.java:284 - ConcurrentMarkSweep GC in 443ms. CMS Old Gen: 547965232 -> 268786192; Par Eden Space: 126017056 -> 0; Par Survivor Space: 3420928 -> 0
INFO  [Service Thread] 2016-04-09 15:34:50,379 StatusLogger.java:52 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
ERROR [SharedPool-Worker-2] 2016-04-09 15:34:50,409 JVMStabilityInspector.java:139 - JVM state determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_77]
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
	at org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:296) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.rows.Cell$Serializer.serialize(Cell.java:208) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:185) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:110) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:98) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:134) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:294) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:292) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1799) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2467) ~[apache-cassandra-3.4.jar:3.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_77]
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.4.jar:3.4]
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.4.jar:3.4]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
INFO  [Service Thread] 2016-04-09 15:34:50,412 StatusLogger.java:56 - MutationStage                     0         0            157         0                 0
INFO  [Service Thread] 2016-04-09 15:34:50,414 StatusLogger.java:56 - ViewMutationStage                 0
{noformat}
[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233542#comment-15233542 ]

Mattias W commented on CASSANDRA-11528:
----------------------------------------

This error also occurs with the same database contents on Cassandra 3.4 on Ubuntu 14.04.
[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows
[ https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231649#comment-15231649 ]

Mattias W commented on CASSANDRA-11528:
----------------------------------------

I also get strange behaviour on smaller and much more normal tables. For example,

{noformat}SELECT COUNT(*) FROM usr WHERE disabled = true LIMIT 100 ALLOW FILTERING;{noformat}

works fine from within DevCenter, but the following query, which hits many more rows, temporarily makes the server unavailable and reports "Unable to execute CQL script on 'connection1': Cassandra failure during read query at consistency ONE (1 responses were required but only 0 replica responded, 1 failed)". The failure is the same as above, except that the server doesn't die.

{noformat}SELECT COUNT(*) FROM usr WHERE disabled = null LIMIT 100 ALLOW FILTERING;{noformat}
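When reproducing from cqlsh instead of DevCenter, the client-side page size can be lowered before running the heavier query. A sketch reusing the statements above; whether the smaller pages actually avoid the read failure here is untested:

{noformat}
cqlsh> PAGING 100
cqlsh> SELECT COUNT(*) FROM usr WHERE disabled = null LIMIT 100 ALLOW FILTERING;
{noformat}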