[jira] [Commented] (CASSANDRA-13006) Disable automatic heap dumps on OOM error

2017-02-22 Thread anmols (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880056#comment-15880056
 ] 

anmols commented on CASSANDRA-13006:


I have suggested a patch for this issue; could you please let me know whether 
it is acceptable?
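
For reference, here is a minimal sketch of the approach (illustrative only, 
not the attached patch; the class and method names below are hypothetical): 
before generating a heap dump on OOM, consult the JVM's own 
HeapDumpOnOutOfMemoryError flag through the HotSpot diagnostic MXBean and skip 
the dump when the flag is disabled.

{code}
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

// Illustrative sketch only -- not the attached patch.
public final class HeapDumpCheck
{
    private HeapDumpCheck() {}

    public static boolean heapDumpOnOomEnabled()
    {
        HotSpotDiagnosticMXBean diagnostics =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // VMOption.getValue() returns the flag's current value as a string:
        // "true" when -XX:+HeapDumpOnOutOfMemoryError is in effect.
        return Boolean.parseBoolean(
            diagnostics.getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }
}
{code}

The OOM handler would then bail out early when {{heapDumpOnOomEnabled()}} 
returns false, leaving dump generation entirely under the control of the 
standard JVM flag.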

> Disable automatic heap dumps on OOM error
> -
>
> Key: CASSANDRA-13006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13006
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: anmols
>Assignee: Benjamin Lerer
>Priority: Minor
> Fix For: 3.0.9
>
> Attachments: 13006-3.0.9.txt
>
>
> With CASSANDRA-9861, a change was added to enable collecting heap dumps by 
> default if the process encountered an OOM error. These heap dumps are stored 
> in the Apache Cassandra home directory unless configured otherwise (see 
> [Cassandra Support 
> Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps]
>  for this feature).
>  
> The creation and storage of heap dumps aids debugging and investigative 
> workflows, but is not desirable in a production environment, where these 
> heap dumps may occupy a large amount of disk space and require manual 
> intervention to clean up. 
>  
> The JVM already provides options for managing heap dumps on out-of-memory 
> errors and for configuring where those dumps are written. The current 
> behavior conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. 
>  
> A patch is proposed here that makes the heap dump on OOM error honor the 
> HeapDumpOnOutOfMemoryError flag. Users who still want to generate heap 
> dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM 
> option.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13006) Disable automatic heap dumps on OOM error

2017-02-13 Thread anmols (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anmols updated CASSANDRA-13006:
---
Status: Patch Available  (was: Open)

> Disable automatic heap dumps on OOM error
> -
>
> Key: CASSANDRA-13006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13006
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: anmols
>Assignee: Benjamin Lerer
>Priority: Minor
> Fix For: 3.0.9
>
> Attachments: 13006-3.0.9.txt
>
>
> With CASSANDRA-9861, a change was added to enable collecting heap dumps by 
> default if the process encountered an OOM error. These heap dumps are stored 
> in the Apache Cassandra home directory unless configured otherwise (see 
> [Cassandra Support 
> Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps]
>  for this feature).
>  
> The creation and storage of heap dumps aids debugging and investigative 
> workflows, but is not desirable in a production environment, where these 
> heap dumps may occupy a large amount of disk space and require manual 
> intervention to clean up. 
>  
> The JVM already provides options for managing heap dumps on out-of-memory 
> errors and for configuring where those dumps are written. The current 
> behavior conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. 
>  
> A patch is proposed here that makes the heap dump on OOM error honor the 
> HeapDumpOnOutOfMemoryError flag. Users who still want to generate heap 
> dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM 
> option.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13006) Disable automatic heap dumps on OOM error

2016-12-07 Thread anmols (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730400#comment-15730400
 ] 

anmols commented on CASSANDRA-13006:


[~blerer] We encountered a condition in one of our clusters that resulted in 
repeated OOMs across many of the nodes. The cause of the OOMs is an issue 
(CASSANDRA-12796) that leads to heap exhaustion when rebuilding secondary 
indexes over wide partitions.



> Disable automatic heap dumps on OOM error
> -
>
> Key: CASSANDRA-13006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13006
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: anmols
>Assignee: Benjamin Lerer
>Priority: Minor
> Fix For: 3.0.9
>
> Attachments: 13006-3.0.9.txt
>
>
> With CASSANDRA-9861, a change was added to enable collecting heap dumps by 
> default if the process encountered an OOM error. These heap dumps are stored 
> in the Apache Cassandra home directory unless configured otherwise (see 
> [Cassandra Support 
> Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps]
>  for this feature).
>  
> The creation and storage of heap dumps aids debugging and investigative 
> workflows, but is not desirable in a production environment, where these 
> heap dumps may occupy a large amount of disk space and require manual 
> intervention to clean up. 
>  
> The JVM already provides options for managing heap dumps on out-of-memory 
> errors and for configuring where those dumps are written. The current 
> behavior conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. 
>  
> A patch is proposed here that makes the heap dump on OOM error honor the 
> HeapDumpOnOutOfMemoryError flag. Users who still want to generate heap 
> dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM 
> option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-13006) Disable automatic heap dumps on OOM error

2016-12-07 Thread anmols (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anmols updated CASSANDRA-13006:
---
Fix Version/s: 3.0.9
   Status: Patch Available  (was: Open)

Disable heap dumps if JVM option HeapDumpOnOutOfMemoryError is not set.
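
With this change, operators who still want automatic dumps opt back in through 
the standard JVM flags. A sketch of the relevant settings, assuming they are 
added to {{cassandra-env.sh}} as in a typical 3.0 install (the dump directory 
below is only an example):

{code}
# Opt back in to heap dumps on OOM, and point them at a roomy destination.
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra/heapdumps"
{code}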

> Disable automatic heap dumps on OOM error
> -
>
> Key: CASSANDRA-13006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13006
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: anmols
>Assignee: Benjamin Lerer
>Priority: Minor
> Fix For: 3.0.9
>
> Attachments: 13006-3.0.9.txt
>
>
> With CASSANDRA-9861, a change was added to enable collecting heap dumps by 
> default if the process encountered an OOM error. These heap dumps are stored 
> in the Apache Cassandra home directory unless configured otherwise (see 
> [Cassandra Support 
> Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps]
>  for this feature).
>  
> The creation and storage of heap dumps aids debugging and investigative 
> workflows, but is not desirable in a production environment, where these 
> heap dumps may occupy a large amount of disk space and require manual 
> intervention to clean up. 
>  
> The JVM already provides options for managing heap dumps on out-of-memory 
> errors and for configuring where those dumps are written. The current 
> behavior conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. 
>  
> A patch is proposed here that makes the heap dump on OOM error honor the 
> HeapDumpOnOutOfMemoryError flag. Users who still want to generate heap 
> dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM 
> option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-13006) Disable automatic heap dumps on OOM error

2016-12-07 Thread anmols (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anmols updated CASSANDRA-13006:
---
Attachment: 13006-3.0.9.txt

Patch for disabling heap dumps if the JVM option HeapDumpOnOutOfMemoryError is 
not set.

> Disable automatic heap dumps on OOM error
> -
>
> Key: CASSANDRA-13006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13006
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: anmols
>Assignee: Benjamin Lerer
>Priority: Minor
> Attachments: 13006-3.0.9.txt
>
>
> With CASSANDRA-9861, a change was added to enable collecting heap dumps by 
> default if the process encountered an OOM error. These heap dumps are stored 
> in the Apache Cassandra home directory unless configured otherwise (see 
> [Cassandra Support 
> Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps]
>  for this feature).
>  
> The creation and storage of heap dumps aids debugging and investigative 
> workflows, but is not desirable in a production environment, where these 
> heap dumps may occupy a large amount of disk space and require manual 
> intervention to clean up. 
>  
> The JVM already provides options for managing heap dumps on out-of-memory 
> errors and for configuring where those dumps are written. The current 
> behavior conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. 
>  
> A patch is proposed here that makes the heap dump on OOM error honor the 
> HeapDumpOnOutOfMemoryError flag. Users who still want to generate heap 
> dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM 
> option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-13006) Disable automatic heap dumps on OOM error

2016-12-06 Thread anmols (JIRA)
anmols created CASSANDRA-13006:
--

 Summary: Disable automatic heap dumps on OOM error
 Key: CASSANDRA-13006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13006
 Project: Cassandra
  Issue Type: Bug
  Components: Configuration
Reporter: anmols
Priority: Minor


With CASSANDRA-9861, a change was added to enable collecting heap dumps by 
default if the process encountered an OOM error. These heap dumps are stored in 
the Apache Cassandra home directory unless configured otherwise (see [Cassandra 
Support 
Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps]
 for this feature).
 
The creation and storage of heap dumps aids debugging and investigative 
workflows, but is not desirable in a production environment, where these heap 
dumps may occupy a large amount of disk space and require manual intervention 
to clean up. 
 
The JVM already provides options for managing heap dumps on out-of-memory 
errors and for configuring where those dumps are written. The current behavior 
conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. 
 
A patch is proposed here that makes the heap dump on OOM error honor the 
HeapDumpOnOutOfMemoryError flag. Users who still want to generate heap dumps 
on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM option.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-22 Thread anmols (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687809#comment-15687809
 ] 

anmols commented on CASSANDRA-12796:


It looks like the proposed solution is only partially adequate for 3.0.x; while 
it enables memtable _flushes_ to proceed while a partition is being indexed, 
the read ordering {{OpOrder.Group}} introduced in CASSANDRA-11905 continues to 
block memtable memory from being _reclaimed_.  In our environment, this 
ultimately blocks new memtables from being allocated while indexing is 
underway, which in turn blocks new mutations from finishing while they wait 
for new memtables to become available.  That ultimately also leads to heap 
exhaustion as blocked mutations accumulate.

So it looks like the granularity of the read {{OpOrder}} lock also needs to be 
reduced.  Following the logic of CASSANDRA-11905, before 3.0.x that was 
implicitly accomplished by using a pager to read the partition, which handled 
the read {{OpOrder}} correctly behind the scenes.  Is doing something like that 
an option now?
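
To make the blocking relationship concrete, here is a schematic sketch of the 
{{OpOrder}} contract described above (not the actual Cassandra code paths; the 
class is {{org.apache.cassandra.utils.concurrent.OpOrder}}):

{code}
import org.apache.cassandra.utils.concurrent.OpOrder;

// Schematic sketch, not actual Cassandra code paths: one long-lived group
// held across a whole wide partition keeps every barrier issued in the
// meantime from completing, which is what pins the memtable memory.
public class OpOrderSketch
{
    public static void main(String[] args)
    {
        OpOrder readOrder = new OpOrder();

        try (OpOrder.Group readGroup = readOrder.start())
        {
            // ... the entire wide partition is indexed under this one group ...

            // Meanwhile, memtable reclamation issues a barrier:
            OpOrder.Barrier barrier = readOrder.newBarrier();
            barrier.issue();
            // barrier.await() would block until readGroup closes, so nothing
            // guarded by the barrier can be reclaimed before the whole
            // partition has been indexed.
        }
        // Only after the group closes can the barrier complete.
    }
}
{code}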

> Heap exhaustion when rebuilding secondary index over a table with wide 
> partitions
> -
>
> Key: CASSANDRA-12796
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12796
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Milan Majercik
>Priority: Critical
>
> We have a table with rather wide partitions and a secondary index defined 
> over it. As soon as we tried to rebuild the index, we observed exhaustion of 
> the Java heap and an eventual OOM error. After a lengthy investigation we 
> managed to find the culprit, which appears to be the wrong granularity of 
> barrier issuance in method {{org.apache.cassandra.db.Keyspace.indexRow}}:
> {code}
> try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
> {
>     Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);
>     Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
>     while (pager.hasNext())
>     {
>         ColumnFamily cf = pager.next();
>         ColumnFamily cf2 = cf.cloneMeShallow();
>         for (Cell cell : cf)
>         {
>             if (cfs.indexManager.indexes(cell.name(), indexes))
>                 cf2.addColumn(cell);
>         }
>         cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
>     }
> }
> {code}
> Please note that the operation group spans an entire partition of the source 
> table, which poses a problem for tables with wide partitions: the flush 
> runnable ({{org.apache.cassandra.db.ColumnFamilyStore.Flush.run()}}) won't 
> proceed with flushing the secondary index memtable until all operations 
> started before the most recent barrier issuance have completed. In our 
> situation the flush runnable waits until the whole wide partition has been 
> indexed into the secondary index memtable before flushing it. This exhausts 
> the heap and eventually causes an OOM error.
> After we changed the granularity of barrier issuance in method 
> {{org.apache.cassandra.db.Keyspace.indexRow}} from a table partition to a 
> single query page (see 
> [https://github.com/mmajercik/cassandra/commit/7e10e5aa97f1de483c2a5faf867315ecbf65f3d6?diff=unified]),
>  the rebuild started to work without heap exhaustion. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-04 Thread anmols (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anmols updated CASSANDRA-12796:
---
Reproduced In: 3.0.8, 2.2.7, 2.1.9  (was: 2.2.7)
   Status: Awaiting Feedback  (was: Open)

> Heap exhaustion when rebuilding secondary index over a table with wide 
> partitions
> -
>
> Key: CASSANDRA-12796
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12796
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Milan Majercik
>Priority: Critical
>
> We have a table with rather wide partitions and a secondary index defined 
> over it. As soon as we tried to rebuild the index, we observed exhaustion of 
> the Java heap and an eventual OOM error. After a lengthy investigation we 
> managed to find the culprit, which appears to be the wrong granularity of 
> barrier issuance in method {{org.apache.cassandra.db.Keyspace.indexRow}}:
> {code}
> try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
> {
>     Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);
>     Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
>     while (pager.hasNext())
>     {
>         ColumnFamily cf = pager.next();
>         ColumnFamily cf2 = cf.cloneMeShallow();
>         for (Cell cell : cf)
>         {
>             if (cfs.indexManager.indexes(cell.name(), indexes))
>                 cf2.addColumn(cell);
>         }
>         cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
>     }
> }
> {code}
> Please note that the operation group spans an entire partition of the source 
> table, which poses a problem for tables with wide partitions: the flush 
> runnable ({{org.apache.cassandra.db.ColumnFamilyStore.Flush.run()}}) won't 
> proceed with flushing the secondary index memtable until all operations 
> started before the most recent barrier issuance have completed. In our 
> situation the flush runnable waits until the whole wide partition has been 
> indexed into the secondary index memtable before flushing it. This exhausts 
> the heap and eventually causes an OOM error.
> After we changed the granularity of barrier issuance in method 
> {{org.apache.cassandra.db.Keyspace.indexRow}} from a table partition to a 
> single query page (see 
> [https://github.com/mmajercik/cassandra/commit/7e10e5aa97f1de483c2a5faf867315ecbf65f3d6?diff=unified]),
>  the rebuild started to work without heap exhaustion. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-04 Thread anmols (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637535#comment-15637535
 ] 

anmols commented on CASSANDRA-12796:


I am able to reproduce this issue in Apache Cassandra 3.0.8 with a wide 
partition and a secondary index defined over it.

The code has changed significantly between the version reported here and 3.0.8; 
however, the characteristics of the failure are fairly similar: when a 
secondary index is rebuilt, a large number of pending memtable flush runnables 
builds up, and the node gets overwhelmed and crashes due to an OOM.

Adjusting the granularity at which the write barrier applies (taking a pass 
with the suggested patch's logic on the 3.0.8 code) does seem to alleviate the 
problem, and I no longer see the memtable flush runnables queue up; however, I 
am not sure whether there are other unintended consequences of tweaking this 
write barrier granularity that need to be considered.

I would like to know whether this patch can be brought into Cassandra 3.0.x, 
or whether there are other solutions for dealing with large partitions with 
secondary indexes.
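
For context, this is the shape of the change I tested, sketched against the 
2.x code quoted in the issue description (the 3.0.8 code differs, so treat 
this as an illustration of the granularity change rather than the actual 
patch):

{code}
// Each page of the wide partition gets its own short-lived write group, so
// a memtable flush waits for at most one page's worth of indexing instead
// of the whole partition.
Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);
Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
while (pager.hasNext())
{
    ColumnFamily cf = pager.next();
    ColumnFamily cf2 = cf.cloneMeShallow();
    for (Cell cell : cf)
    {
        if (cfs.indexManager.indexes(cell.name(), indexes))
            cf2.addColumn(cell);
    }
    // The group now covers a single page rather than the entire partition.
    try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
    {
        cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
    }
}
{code}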

> Heap exhaustion when rebuilding secondary index over a table with wide 
> partitions
> -
>
> Key: CASSANDRA-12796
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12796
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Milan Majercik
>Priority: Critical
>
> We have a table with rather wide partitions and a secondary index defined 
> over it. As soon as we tried to rebuild the index, we observed exhaustion of 
> the Java heap and an eventual OOM error. After a lengthy investigation we 
> managed to find the culprit, which appears to be the wrong granularity of 
> barrier issuance in method {{org.apache.cassandra.db.Keyspace.indexRow}}:
> {code}
> try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
> {
>     Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);
>     Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
>     while (pager.hasNext())
>     {
>         ColumnFamily cf = pager.next();
>         ColumnFamily cf2 = cf.cloneMeShallow();
>         for (Cell cell : cf)
>         {
>             if (cfs.indexManager.indexes(cell.name(), indexes))
>                 cf2.addColumn(cell);
>         }
>         cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
>     }
> }
> {code}
> Please note that the operation group spans an entire partition of the source 
> table, which poses a problem for tables with wide partitions: the flush 
> runnable ({{org.apache.cassandra.db.ColumnFamilyStore.Flush.run()}}) won't 
> proceed with flushing the secondary index memtable until all operations 
> started before the most recent barrier issuance have completed. In our 
> situation the flush runnable waits until the whole wide partition has been 
> indexed into the secondary index memtable before flushing it. This exhausts 
> the heap and eventually causes an OOM error.
> After we changed the granularity of barrier issuance in method 
> {{org.apache.cassandra.db.Keyspace.indexRow}} from a table partition to a 
> single query page (see 
> [https://github.com/mmajercik/cassandra/commit/7e10e5aa97f1de483c2a5faf867315ecbf65f3d6?diff=unified]),
>  the rebuild started to work without heap exhaustion. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)