[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-12-12 Thread Milan Majercik (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15741428#comment-15741428
 ] 

Milan Majercik commented on CASSANDRA-12796:


[~beobal], sorry for the delay in responding. I just tested your fix on our sample data 
and it works fine. Thank you for your help.

> Heap exhaustion when rebuilding secondary index over a table with wide partitions
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12796
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12796
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Milan Majercik
>            Priority: Critical
>
> We have a table with rather wide partitions and a secondary index defined over 
> it. As soon as we tried to rebuild the index, we observed exhaustion of the Java 
> heap and an eventual OOM error. After a lengthy investigation we found the 
> culprit, which appears to be the wrong granularity of barrier issuance in 
> method {{org.apache.cassandra.db.Keyspace.indexRow}}:
> {code}
> try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
> {
>     Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);
>     Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
>     while (pager.hasNext())
>     {
>         ColumnFamily cf = pager.next();
>         ColumnFamily cf2 = cf.cloneMeShallow();
>         for (Cell cell : cf)
>         {
>             if (cfs.indexManager.indexes(cell.name(), indexes))
>                 cf2.addColumn(cell);
>         }
>         cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
>     }
> }
> {code}
> Please note that the operation group's granularity is a whole partition of the 
> source table. This poses a problem for tables with wide partitions, because the 
> flush runnable ({{org.apache.cassandra.db.ColumnFamilyStore.Flush.run()}}) won't 
> proceed with flushing the secondary index memtable until all operations started 
> before the most recently issued barrier have completed. In our situation the 
> flush runnable waits until the whole wide partition has been indexed into the 
> secondary index memtable before flushing it. This causes exhaustion of the heap 
> and an eventual OOM error.
> After we changed the granularity of barrier issuance in method 
> {{org.apache.cassandra.db.Keyspace.indexRow}} from table partition to query page (see 
> [https://github.com/mmajercik/cassandra/commit/7e10e5aa97f1de483c2a5faf867315ecbf65f3d6?diff=unified]),
> the secondary index rebuild started to work without heap exhaustion.
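
The effect described in the quoted report can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Order` and `Group` below are hypothetical stand-ins and not Cassandra's `OpOrder` API, and "the flush may proceed" is reduced to "no previously started group is still open".

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Order and Group are hypothetical stand-ins, not Cassandra classes.
class BarrierGranularityDemo
{
    /** An operation group that stays open until it is closed. */
    static final class Group implements AutoCloseable
    {
        private boolean open = true;

        boolean isOpen() { return open; }

        @Override
        public void close() { open = false; }
    }

    /** Hands out groups and remembers every group it has started. */
    static final class Order
    {
        private final List<Group> started = new ArrayList<>();

        Group start()
        {
            Group group = new Group();
            started.add(group);
            return group;
        }

        /** Assumption: a flush may proceed only once every started group is closed. */
        boolean flushMayProceed()
        {
            return started.stream().noneMatch(Group::isOpen);
        }
    }

    /** Counts the page boundaries at which a flush could proceed while one partition is indexed. */
    static int flushOpportunities(int pages, boolean groupPerPage)
    {
        Order order = new Order();
        int opportunities = 0;
        if (groupPerPage)
        {
            for (int page = 0; page < pages; page++)
            {
                try (Group group = order.start())
                {
                    // index one page inside its own short-lived group
                }
                if (order.flushMayProceed())
                    opportunities++;
            }
        }
        else
        {
            try (Group group = order.start())
            {
                for (int page = 0; page < pages; page++)
                {
                    // index one page; the partition-wide group is still open
                    if (order.flushMayProceed())
                        opportunities++;
                }
            }
        }
        return opportunities;
    }

    public static void main(String[] args)
    {
        System.out.println("group per partition: " + flushOpportunities(1000, false));
        System.out.println("group per page:      " + flushOpportunities(1000, true));
    }
}
```

In this model a partition-wide group leaves no point at which a flush could proceed until the whole partition has been indexed, whereas one group per page leaves such a point after every page.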



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-12-08 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732258#comment-15732258
 ] 

Jeremiah Jordan commented on CASSANDRA-12796:
-

I think the new algorithm looks good. It has a much better chance of doing the 
right thing. A couple of nits on just that commit: I would put the table name in 
the TRACE log message, and you might add the new -D to the jvm.options file in 
3.11+.
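
As a hedged illustration of the two nits, the sketch below reads a boolean JVM system property that forces a default page size, and formats a trace message that names the table. The property name `example.force_default_page_size`, the class, the message layout, and the default of 10000 are all hypothetical choices made for illustration; none of them is taken from the patch.

```java
// Hypothetical sketch: the property name and message format are illustrative assumptions.
class PageSizeOverrideDemo
{
    static final String FORCE_DEFAULT_PROPERTY = "example.force_default_page_size";
    static final int DEFAULT_PAGE_SIZE = 10000; // assumed default

    /** Returns the default when the -D flag is set to true, otherwise the calculated value. */
    static int effectivePageSize(int calculatedPageSize)
    {
        // Boolean.getBoolean reads a JVM system property,
        // e.g. -Dexample.force_default_page_size=true
        if (Boolean.getBoolean(FORCE_DEFAULT_PROPERTY))
            return DEFAULT_PAGE_SIZE;
        return calculatedPageSize;
    }

    /** Builds a trace message that includes the keyspace and table names. */
    static String traceMessage(String keyspace, String table, int pageSize)
    {
        return String.format("Calculated page size %d for indexing %s.%s", pageSize, keyspace, table);
    }

    public static void main(String[] args)
    {
        int pageSize = effectivePageSize(512);
        System.out.println(traceMessage("ks", "wide_table", pageSize));
    }
}
```

A flag of this kind would appear in a JVM options file as a single `-D...=true` line, which makes the override discoverable without reading the source.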



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-12-08 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731908#comment-15731908
 ] 

Sam Tunnicliffe commented on CASSANDRA-12796:
-

If you could double check with your test data, that'd be awesome. I'm just 
waiting for the latest CI runs to finish, so if all looks good with that & your 
test data I'll commit asap.



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-12-08 Thread Milan Majercik (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731712#comment-15731712
 ] 

Milan Majercik commented on CASSANDRA-12796:


Excellent. This is the closest we can get to keeping the target page size in 
bytes. Do you want me to test it with our test data? If not, could it be merged 
into the primary repository?



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-12-08 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731594#comment-15731594
 ] 

Sam Tunnicliffe commented on CASSANDRA-12796:
-

Pushed another commit with an implementation of {{calculateIndexingPageSize}} 
which actually works. It isn't perfect of course, but when the average row size 
is smallish, it will tend to use the default page size of 1, which is no 
problem as each ordering will still only be held open for a matter of 
milliseconds. When rows are larger though, that default page size could cause 
problems as the orderings will be held open for much longer, so this will help 
to keep that window more consistently bounded. 

I've also made the page size an argument to {{indexPartition}} so we don't 
perform the redundant calculation for every partition. 




[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-12-07 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728610#comment-15728610
 ] 

Sam Tunnicliffe commented on CASSANDRA-12796:
-

Yep, the method was totally wrong. I'm not 100% comfortable with hardcoding the 
page size as suggested so I'm just running some quick experiments to figure out 
a reasonable X for "rows per X MB".



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-12-07 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728596#comment-15728596
 ] 

Jeremiah Jordan commented on CASSANDRA-12796:
-

I think you want (rows per partition / (mean partition size / 4 MB)) if you are 
guessing at "rows per 4 MB".
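
A worked version of this estimate, as a minimal sketch: the method name and the sample numbers are hypothetical, and only the formula comes from the comment above.

```java
// Sketch of the "rows per 4 MB" estimate; names and sample values are illustrative assumptions.
class RowsPerTargetDemo
{
    static final double TARGET_BYTES = 4 * 1024 * 1024; // 4 MB

    /** rows per partition / (mean partition size / 4 MB) */
    static double rowsPerTarget(double meanRowsPerPartition, double meanPartitionBytes)
    {
        return meanRowsPerPartition / (meanPartitionBytes / TARGET_BYTES);
    }

    public static void main(String[] args)
    {
        double sevenGiB = 7.0 * 1024 * 1024 * 1024;
        double rows = 7.0 * 1024 * 1024; // hypothetical: 1 KiB per row
        System.out.println(rowsPerTarget(rows, sevenGiB));
    }
}
```

The expression is algebraically the same as dividing 4 MB by the mean row size (mean partition size / rows per partition), so a 7 GB partition made of 1 KB rows comes out at 4096 rows per 4 MB.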



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-12-06 Thread Milan Majercik (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725358#comment-15725358
 ] 

Milan Majercik commented on CASSANDRA-12796:


[~beobal], the patch for branch {{3.0}} works fine; however, the page size for 
the single-partition pager appears to be calculated incorrectly. My table's average 
partition size is around {{7GB}} and yet the page size was calculated as {{1}}.

{code:java}
private int calculateIndexingPageSize()
{
    double averageRowSize = baseCfs.getMeanPartitionSize();
    if (averageRowSize <= 0)
        return DEFAULT_PAGE_SIZE;

    return (int) Math.max(1, Math.min(DEFAULT_PAGE_SIZE, 4 * 1024 * 1024 / averageRowSize));
}
{code}

This made the index rebuild extremely slow, because registering a read/write order 
group carries significant performance overhead; for this reason the page 
size should be reasonably large.
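
To see why the quoted method yields 1 here, the same arithmetic can be replayed on plain numbers. This is a sketch: `DEFAULT_PAGE_SIZE` is assumed to be 10000 (the 10k default mentioned elsewhere in this thread), and the mean partition size is passed in as a parameter instead of being read from `baseCfs`.

```java
// Replays the quoted calculation with plain numbers; DEFAULT_PAGE_SIZE = 10000 is an assumption.
class PageSizeBugDemo
{
    static final int DEFAULT_PAGE_SIZE = 10000;

    /** Same expression as the quoted method: 4 MB is divided by the mean PARTITION size. */
    static int calculateIndexingPageSize(double meanPartitionBytes)
    {
        if (meanPartitionBytes <= 0)
            return DEFAULT_PAGE_SIZE;

        return (int) Math.max(1, Math.min(DEFAULT_PAGE_SIZE, 4 * 1024 * 1024 / meanPartitionBytes));
    }

    public static void main(String[] args)
    {
        double sevenGiB = 7.0 * 1024 * 1024 * 1024;
        System.out.println(calculateIndexingPageSize(sevenGiB));
    }
}
```

Dividing 4 MB by a 7 GB mean partition size gives a value far below 1, so the `Math.max(1, ...)` clamp returns 1 no matter how small the individual rows are.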

I think there is no harm in setting the page size to {{DEFAULT_PAGE_SIZE}}, as the 
pager doesn't span different partitions when the partition is small 
([https://github.com/mmajercik/cassandra/commit/3fc016e73d3032f4d04584a45945141151a49213]).

[12796-3.0|https://github.com/mmajercik/cassandra/tree/12796-3.0]




[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-29 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705225#comment-15705225
 ] 

Sam Tunnicliffe commented on CASSANDRA-12796:
-

The CI looks reasonable: 3 dtest failures on the 3.0 branch, which all have 
corresponding failures upstream, plus a couple of failures on the original 2.2 
branch which have since been addressed by other tickets. 

The internal paging will aim to read rows in chunks of ~4 MB, and I've used the 
default CQL page size of 10k rows for a floor as it seems like as good a place 
to start as any. 

[~mmajercik], [~anmols] how does this latest 3.0 version look with your 
testing? 



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-24 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15694012#comment-15694012
 ] 

Sam Tunnicliffe commented on CASSANDRA-12796:
-

bq. So it looks like the granularity of the read OpOrder lock also needs to be 
reduced

Good point, and like you said it's possible to use a Pager here, as in pre 3.0 
versions. 

I've force-pushed new versions for 3.0/3.11/3.X/trunk (the 2.2 version is 
unchanged of course). There have been some issues with dtests in CI today, so 
I'll kick those off when the infra is stable again.

||branch||testall||dtest||
|[12796-2.2|https://github.com/beobal/cassandra/tree/12796-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-2.2-dtest]|
|[12796-3.0|https://github.com/beobal/cassandra/tree/12796-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.0-dtest]|
|[12796-3.11|https://github.com/beobal/cassandra/tree/12796-3.11]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.11-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.11-dtest]|
|[12796-3.X|https://github.com/beobal/cassandra/tree/12796-3.X]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.X-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.X-dtest]|
|[12796-trunk|https://github.com/beobal/cassandra/tree/12796-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-trunk-dtest]|




[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-23 Thread Milan Majercik (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15689291#comment-15689291
 ] 

Milan Majercik commented on CASSANDRA-12796:


I observed the same issue when I did some brief testing on the 3.0 branch.



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-22 Thread anmols (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687809#comment-15687809
 ] 

anmols commented on CASSANDRA-12796:


It looks like the proposed solution is only partially adequate for 3.0.x; while 
it enables memtable _flushes_ to proceed while a partition is being indexed, 
the read ordering {{OpGroup}} introduced in CASSANDRA-11905 continues to block 
memtable memory from being _reclaimed_.  In our environment, this ultimately 
blocks new memtables from being allocated while indexing is underway, which in 
turn blocks new mutations from finishing while they wait for new memtables to become 
available.  That ultimately also leads to heap exhaustion as blocked mutations 
accumulate.

So it looks like the granularity of the read {{OpOrder}} lock also needs to be 
reduced.  Following the logic of CASSANDRA-11905, before 3.0.x that was 
implicitly accomplished by using a pager to read the partition, which handled 
the read {{OpOrder}} correctly behind the scenes.  Is doing something like that 
an option now?
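
To make the reclamation argument concrete, here is a minimal sketch of the same idea applied to the read side. Everything in it is an assumption made for illustration: {{ReclaimDemo}}, {{ToyReadOrder}} and {{peakRetained}} are hypothetical, and "one unit of memory per page" is a deliberate simplification rather than how Cassandra accounts for memtable memory. The model only allows memory to be reclaimed when no read group is open.

```java
// Toy model only: NOT Cassandra's read OpOrder or memtable accounting.
class ReclaimDemo {

    /** Hypothetical read ordering: memory is reclaimable only when no group is open. */
    static final class ToyReadOrder {
        private int openGroups = 0;

        Group start() {
            openGroups++;
            return new Group();
        }

        boolean quiescent() { return openGroups == 0; }

        final class Group implements AutoCloseable {
            @Override
            public void close() { openGroups--; }
        }
    }

    /** Returns the peak number of memory units retained while indexing. */
    static int peakRetained(int pages, boolean groupPerPage) {
        ToyReadOrder order = new ToyReadOrder();
        int retained = 0;
        int peak = 0;
        // Per-partition variant: one read group spans the whole loop.
        ToyReadOrder.Group partitionGroup = groupPerPage ? null : order.start();
        for (int p = 0; p < pages; p++) {
            ToyReadOrder.Group pageGroup = groupPerPage ? order.start() : null;
            retained++; // assume each page leaves one unit waiting to be reclaimed
            peak = Math.max(peak, retained);
            if (pageGroup != null) {
                pageGroup.close();
            }
            if (order.quiescent()) {
                retained = 0; // nothing is reading, so memory can be reclaimed
            }
        }
        if (partitionGroup != null) {
            partitionGroup.close();
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("read group per partition: " + peakRetained(100, false));
        System.out.println("read group per page:      " + peakRetained(100, true));
    }
}
```

With a read group held for the whole partition, retained memory in this model grows with the number of pages; with a group per page it stays bounded by a single page.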



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-11 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657377#comment-15657377
 ] 

Sam Tunnicliffe commented on CASSANDRA-12796:
-

Forward ported the original patch to 3.0/3.X/trunk and submitted CI jobs:

||branch||testall||dtest||
|[12796-2.2|https://github.com/beobal/cassandra/tree/12796-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-2.2-dtest]|
|[12796-3.0|https://github.com/beobal/cassandra/tree/12796-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.0-dtest]|
|[12796-3.X|https://github.com/beobal/cassandra/tree/12796-3.X]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.X-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-3.X-dtest]|
|[12796-trunk|https://github.com/beobal/cassandra/tree/12796-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12796-trunk-dtest]|




[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-10 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653787#comment-15653787
 ] 

Sam Tunnicliffe commented on CASSANDRA-12796:
-

[~mmajercik] no worries, I can take care of porting your patch to 3.0. Leave it 
with me and I'll try to get to it in the next day or two.



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653748#comment-15653748
 ] 

ASF GitHub Bot commented on CASSANDRA-12796:


GitHub user mmajercik opened a pull request:

https://github.com/apache/cassandra/pull/83

12796 2.2

This is a proposed patch for CASSANDRA-12796

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mmajercik/cassandra 12796-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/83.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #83


commit de57fc5ddc3fffdd6a1eed2dee53a638d5053fab
Author: mmajercik 
Date:   2016-10-14T13:54:02Z

Changed operation group granularity to page rathen than partition when 
rebuilding secondary index

commit f5d4f1cfb8dbaf550bb1685408279cd6935d3cbf
Author: mmajercik 
Date:   2016-11-10T08:22:24Z

replaced tabs with spaces






[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653743#comment-15653743
 ] 

ASF GitHub Bot commented on CASSANDRA-12796:


Github user mmajercik closed the pull request at:

https://github.com/apache/cassandra/pull/82




[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653734#comment-15653734
 ] 

ASF GitHub Bot commented on CASSANDRA-12796:


GitHub user mmajercik opened a pull request:

https://github.com/apache/cassandra/pull/82

12796 2.2

This is a proposed fix for CASSANDRA-12796

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mmajercik/cassandra 12796-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/82.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #82


commit caaa9fc82cd51b7321da4dad0715ebc561d180dd
Author: Josh McKenzie 
Date:   2016-04-01T15:46:15Z

Merge branch 'cassandra-2.1' into cassandra-2.2

commit c662259fe9e1e25e10e58eb1146de80c53e69867
Author: Yuki Morishita 
Date:   2016-02-16T18:45:36Z

Use canonical path for directory in SSTable descriptor

patch by yukim; reviewed by Paulo Motta for CASSANDRA-10587

commit 0ac2072bb2cc6d8e069e07f5cbcdf2e83cdc5b5c
Author: Andrzej Ludwikowski 
Date:   2016-03-07T18:27:54Z

DatabaseDescriptor should log stacktrace in case of Eception during seed 
provider creation

patch by Andrzej Ludwikowski; reviewed by jasobrown for CASSANDRA-11312

commit ea9b42e7d7bf9003dd6ed911035d3a85a2d99bac
Author: Benjamin Lerer 
Date:   2016-04-02T15:55:04Z

Fix paging for COMPACT tables without clustering columns

patch by Benjamin Lerer; reviewed by Tyler Hobbs for CASSANDRA-11467

commit 2ed855592ab77399e061f03f73a943aefbd44eaf
Author: Benjamin Lerer 
Date:   2016-04-02T15:59:37Z

Merge branch cassandra-2.1 into cassandra-2.2

commit 1ff9df75c46edb02bf1f994e7ecc651e29b277fb
Author: Marcus Olsson 
Date:   2016-03-24T14:06:23Z

Remove duplicate logging of sending MerkleTree request

Patch by Marcus Olsson; reviewed by marcuse for CASSANDRA-11486

commit a33038be23e4114f5b6f0736887d35656b0aa40f
Author: Ryan Magnusson 
Date:   2016-04-04T12:09:54Z

IncomingStreamingConnection version check message wrong

patch by Ryan Magnusson reviewed by Robert Stupp for CASSANDRA-11462

commit 96c53e0a5e73046acb77e2ac2a3aa9d9ef64fc65
Author: Josh McKenzie 
Date:   2016-04-06T22:37:08Z

Fix launch with whitespace in path on Windows

Patch by jmckenzie; reviewed by pmotta for CASSANDRA-11515

commit 2dd244b439049baa1a9f175237acf802e1946d74
Author: Benjamin Lerer 
Date:   2016-04-11T07:41:44Z

Ninja: fix typo in CommitLog error message

commit e22faeb8c5463a34b630aff8e265aefbe950b58d
Author: Benjamin Lerer 
Date:   2016-04-11T07:43:56Z

Merge branch cassandra-2.1 into cassandra-2.2

commit 3557d2e05c8d1059562de2a91c1b33b4fcfcc6eb
Author: Paulo Motta 
Date:   2016-04-05T19:58:06Z

Make deprecated repair methods backward-compatible with previous 
notification service

patch by Paulo Motta; reviewed by Yuki Morishita for CASSANDRA-11430

commit c1b1d3bccf30a7ee1deb633d2bc2dfbd7b9c542f
Author: Stefania Alborghetti 
Date:   2016-04-08T03:52:17Z

Checking if an unlogged batch is local is inefficient

patch by Stefania Alborghetti; reviewed by Paulo Motta for CASSANDRA-11529

commit ab2b8a60c4b6d27081d632fefa0e19ee13816e2c
Author: Aleksey Yeschenko 
Date:   2016-04-11T18:14:41Z

Merge branch 'cassandra-2.1' into cassandra-2.2

commit 19b4b637ac79b5d53b9384bd95bed8e08b43f111
Author: Jacek Lewandowski 
Date:   2016-04-08T15:31:00Z

CqlConfigHelper no longer requires both a keystore and truststore to work.

patch by Jacek Lewandowski; reviewed by Jeremiah Jordan for CASSANDRA-11532

commit 69edeaa46b78bb168f7e9d0b1c991c07b90f41ca
Author: Alex Petrov 
Date:   2016-04-14T10:26:52Z

Allow only DISTINCT queries with partition keys restrictions

patch by Alex Petrov; reviewed by Benjamin Lerer for CASSANDRA-11339

commit 220e4f62db7fe14c4d6c0e499c52059f7ebc5a53
Author: T Jake Luciani 
Date:   2016-04-15T14:00:04Z

2.2.6 version bump

commit 5c5c5b44c6d952d4d6f8170fa4ef239060275b76
Author: T Jake Luciani 
Date:   2016-04-15T14:30:21Z

2.1.14 version bump

commit 37f63ecc5d3b36fc115fd7ae98e4fc1f4bc2d1d6
Author: T Jake Luciani 
Date:   2016-04-15T14:34:48Z

Merge branch 'cassandra-2.1' into cassandra-2.2

commit 77ab77328e7e263a8e93ff24ff9b5d4be33e7c27
Author: Artem Aliev 
Date:   

[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-10 Thread Milan Majercik (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653529#comment-15653529
 ] 

Milan Majercik commented on CASSANDRA-12796:


The patch for *2.2* can be found at 
[https://github.com/mmajercik/cassandra/tree/12796-2.2]

I made an attempt to adapt this patch to branch *3.0* but was overwhelmed 
by the sheer extent of the refactoring, which left virtually no stone unturned. I 
suppose it will take a while before I am able to issue a patch for *3.0*.



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-08 Thread Milan Majercik (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647229#comment-15647229
 ] 

Milan Majercik commented on CASSANDRA-12796:


I'll post the formal patches shortly



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-07 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644622#comment-15644622
 ] 

Sam Tunnicliffe commented on CASSANDRA-12796:
-

bq. I would like to know if this patch can be brought into Cassandra 3.0.x or 
are there other solutions to deal with large partitions with secondary indexes?

There aren't, I'm afraid, so if you could post your patches for 2.2 & 3.0 I'll 
make sure they get reviewed (I shouldn't think a separate patch for trunk will 
be necessary).
Thanks.



[jira] [Commented] (CASSANDRA-12796) Heap exhaustion when rebuilding secondary index over a table with wide partitions

2016-11-04 Thread anmols (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637535#comment-15637535
 ] 

anmols commented on CASSANDRA-12796:


I am able to reproduce this issue in Apache Cassandra 3.0.8 with a wide 
partition and a secondary index defined over it.

The code has changed significantly between the version reported here and 3.0.8; 
however, the characteristics of the failure are fairly similar: when a 
secondary index is rebuilt, a large number of pending memtable flush runnables 
builds up, and the node gets overwhelmed and crashes with an OOM.

Adjusting the granularity at which the write barrier applies (taking a pass with 
the suggested patch's logic on the 3.0.8 code) does seem to alleviate the 
problem, and I no longer see the memtable flush runnables queue up. However, I am 
not sure whether tweaking the write barrier granularity has other unintended 
consequences that need to be considered.

I would like to know if this patch can be brought into Cassandra 3.0.x or are 
there other solutions to deal with large partitions with secondary indexes?
