[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-09 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738081#comment-14738081
 ] 

Stefania commented on CASSANDRA-8630:
-

bq. Can we impose the limit we have for throttled readers to unthrottled 
readers, and special case throttled to always return something large (64Kb 
being most sensible since right now that's our largest possible cached size)?

It's done, same branch as the coverity fix. CI still pending. 

I used the same constant for the _unthrottled_ maximum size and the throttled 
default size (since they are both 64Kb); we can have two constants instead, if 
you prefer.
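
For readers of the archive, the sizing rule discussed above can be sketched roughly as follows. This is a hypothetical illustration, not the code on the branch: the class, method and constant names are assumptions, and only the 64Kb value and the "must be positive" check come from this thread.

```java
final class BufferSizeSketch
{
    // Hypothetical constant name; the thread only says both limits are 64Kb.
    static final int MAX_BUFFER_SIZE = 64 * 1024;

    /**
     * Throttled readers always get the large buffer, since they scan the whole
     * file sequentially; unthrottled readers keep their requested size, capped
     * at the same limit.
     */
    static int bufferSize(int requested, boolean throttled)
    {
        if (requested <= 0)
            throw new IllegalArgumentException("bufferSize must be positive");

        if (throttled)
            return MAX_BUFFER_SIZE;

        return Math.min(requested, MAX_BUFFER_SIZE);
    }

    public static void main(String[] args)
    {
        System.out.println(bufferSize(4096, false));        // small unthrottled request is kept
        System.out.println(bufferSize(1024 * 1024, false)); // large unthrottled request is capped
        System.out.println(bufferSize(4096, true));         // throttled always uses the maximum
    }
}
```

Using one constant for both cases keeps the two limits from drifting apart; two constants would only matter if the throttled default and the unthrottled cap were ever tuned separately.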

> Faster sequential IO (on compaction, streaming, etc)
> 
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>
> Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, 
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, 
> mmaped_uncomp_hotspot.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc.), a lot 
> of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because the default implementations of readShort, readLong, etc., as well 
> as their matching write* methods, are implemented with numerous byte-by-byte 
> read and write calls. 
> This makes a lot of syscalls as well.
> A quick microbenchmark shows that just reimplementing these methods in 
> either way gives an 8x speed increase.
> The attached patch implements the RandomAccessReader.read and 
> SequentialWriter.write methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method 
> list during tests.
> A stress test on my laptop shows that this patch makes compaction 25-30% 
> faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a CPU load graph from production; orange is niced CPU 
> load, i.e. compaction; yellow is user, i.e. tasks not related to compaction.)
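
The cost described in the issue can be illustrated with a small sketch. It is not taken from the attached patch: the class and method names are hypothetical, and an in-memory byte source stands in for an unbuffered file so that the number of single-byte reads can be counted.

```java
import java.nio.ByteBuffer;

final class ReadLongSketch
{
    /** Hypothetical byte source that counts single-byte reads. */
    static final class CountingSource
    {
        private final byte[] data;
        private int position = 0;
        int reads = 0;

        CountingSource(byte[] data) { this.data = data; }

        /** Returns the next byte as 0..255, or -1 at the end of the data. */
        int read()
        {
            reads++;
            return position < data.length ? data[position++] & 0xFF : -1;
        }
    }

    /** Assembles a big-endian long one byte at a time: eight read() calls per value. */
    static long readLongByteByByte(CountingSource source)
    {
        long value = 0;
        for (int i = 0; i < 8; i++)
        {
            int b = source.read();
            if (b < 0)
                throw new IllegalStateException("unexpected end of data");
            value = (value << 8) | b;
        }
        return value;
    }

    /** Reads the same value from an already-filled buffer with one bulk accessor. */
    static long readLongBuffered(ByteBuffer buffer)
    {
        return buffer.getLong(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args)
    {
        byte[] data = ByteBuffer.allocate(8).putLong(0x0102030405060708L).array();
        CountingSource source = new CountingSource(data);

        long slow = readLongByteByByte(source);
        long fast = readLongBuffered(ByteBuffer.wrap(data));

        System.out.println(slow == fast);  // prints "true"
        System.out.println(source.reads);  // prints "8"
    }
}
```

When each single-byte read reaches the operating system, the first variant costs eight calls per long; the buffered variant pays for one bulk fill of the buffer and then decodes values in memory.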



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737800#comment-14737800
 ] 

Benedict commented on CASSANDRA-8630:
-

So, looking at the code as I was about to commit, I think we should think 
again about buffer sizes. It looks like CASSANDRA-8894 had a quality I 
hadn't noticed, i.e. that it can allocate > 64Kb buffers for reading. I'm 
pretty sure this is a bad thing (even reading that much in one go is probably 
rarely a good idea; the goal was only to reduce this number where we could 
safely do so). Secondly, this patch introduces a bound on this only for 
throttled readers. My understanding was that for throttled readers we would 
_always_ use a 64Kb buffer (since this makes quite a lot of sense, given it's 
sequential access of the whole file). 

Can we impose the limit we have for throttled readers to _unthrottled_ readers, 
and special case throttled to always return something large (64Kb being most 
sensible since right now that's our largest possible cached size)? 



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737469#comment-14737469
 ] 

Benedict commented on CASSANDRA-8630:
-

Sure, sorry - this slipped out of my workflow due to being resolved still. I'll 
rebase and commit.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-07 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733944#comment-14733944
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

Since the work is already done I would say make the changes? 
FB.UR_UNINIT_READ_CALLED_FROM_SUPER_CONSTRUCTOR seems easy to justify and 
FB.NS_DANGEROUS_NON_SHORT_CIRCUIT is pretty low value, but simple enough that 
it's worth making coverity happy.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-07 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733463#comment-14733463
 ] 

Stefania commented on CASSANDRA-8630:
-

I don't care much about {{FB.NS_DANGEROUS_NON_SHORT_CIRCUIT}} either. As for 
{{FB.UR_UNINIT_READ_CALLED_FROM_SUPER_CONSTRUCTOR}}, even if it's not a problem 
at the moment, it may confuse somebody in the future and cause a problem. I 
don't mind either way though, so let's wait for [~aweisberg]'s input as well 
and then decide?
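
To illustrate the kind of future problem that {{FB.UR_UNINIT_READ_CALLED_FROM_SUPER_CONSTRUCTOR}} warns about, here is a minimal, hypothetical sketch. The classes below are invented for illustration and are not the Cassandra readers; they only show a superclass constructor calling a method whose override reads a subclass field that has not been assigned yet.

```java
final class SuperConstructorSketch
{
    static class Reader
    {
        final int bufferSize;

        Reader()
        {
            // Call from the superclass constructor to an overridable method.
            bufferSize = initialBufferSize();
        }

        int initialBufferSize()
        {
            return 4096;
        }
    }

    static class CompressedReader extends Reader
    {
        private final int chunkLength;

        CompressedReader(int chunkLength)
        {
            super();                        // runs Reader(), which calls the override below
            this.chunkLength = chunkLength; // assigned only after super() has returned
        }

        @Override
        int initialBufferSize()
        {
            // During super(), chunkLength still holds its default value of 0.
            return chunkLength;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(new CompressedReader(65536).bufferSize); // prints "0", not 65536
    }
}
```

The pattern is harmless as long as the override does not depend on subclass state, which is why it may not be a problem today but can become one after a later change.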

CI:
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-8630-3.0-coverity-dtest/
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-8630-3.0-coverity-testall/

A few failed dtests but the failures are also on the unpatched 3.0.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733448#comment-14733448
 ] 

Benedict commented on CASSANDRA-8630:
-

Neither of them is actually a bug. I don't mind if we "fix" them, but we don't 
_have_ to; only those that are genuine problems. Let me know if you'd still 
like to commit the patch.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-06 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733187#comment-14733187
 ] 

Stefania commented on CASSANDRA-8630:
-

The following Coverity defects were reported after committing this, working on 
a fix:

{code}
*** CID 1322973:  FindBugs: Correctness  (FB.UR_UNINIT_READ_CALLED_FROM_SUPER_CONSTRUCTOR)
/src/java/org/apache/cassandra/io/util/RandomAccessReader.java: 83 in org.apache.cassandra.io.compress.CompressedRandomAccessReader.initializeBuffer()()
77         this.bufferType = builder.bufferType;
78
79         if (builder.bufferSize <= 0)
80             throw new IllegalArgumentException("bufferSize must be positive");
81
82         if (builder.initializeBuffers)
>>> CID 1322973:  FindBugs: Correctness  (FB.UR_UNINIT_READ_CALLED_FROM_SUPER_CONSTRUCTOR)
>>> Call from superclass constructor here
83             initializeBuffer();
84     }
85
86     protected int getBufferSize(Builder builder)
87     {
88         if (builder.limiter == null)

** CID 1322970:  FindBugs: Dodgy code  (FB.NS_DANGEROUS_NON_SHORT_CIRCUIT)
/src/java/org/apache/cassandra/hints/ChecksummedDataInput.java: 140 in org.apache.cassandra.hints.ChecksummedDataInput.updateCrc()()

*** CID 1322970:  FindBugs: Dodgy code  (FB.NS_DANGEROUS_NON_SHORT_CIRCUIT)
/src/java/org/apache/cassandra/hints/ChecksummedDataInput.java: 140 in org.apache.cassandra.hints.ChecksummedDataInput.updateCrc()()
134         super.reBuffer();
135         crcPosition = buffer.position();
136     }
137
138     private void updateCrc()
139     {
>>> CID 1322970:  FindBugs: Dodgy code  (FB.NS_DANGEROUS_NON_SHORT_CIRCUIT)
>>> Potentially dangerous use of non-short-circuit logic.
140         if (crcPosition == buffer.position() | crcUpdateDisabled)
141             return;
142
143         assert crcPosition >= 0 && crcPosition < buffer.position();
144
145         ByteBuffer unprocessed = buffer.duplicate();

** CID 1322957:(FORWARD_NULL)
{code}
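
For context on the {{FB.NS_DANGEROUS_NON_SHORT_CIRCUIT}} report above: in Java, {{|}} on booleans always evaluates both operands, while {{||}} skips the right-hand side once the left-hand side is true. The sketch below is hypothetical (the helper names are invented) and only counts how many operands get evaluated in each case.

```java
final class ShortCircuitSketch
{
    static int evaluations = 0;

    /** Records that an operand was evaluated, then returns the given result. */
    static boolean check(boolean result)
    {
        evaluations++;
        return result;
    }

    /** Non-short-circuit OR: both operands are evaluated even though the first is true. */
    static int countWithBitwiseOr()
    {
        evaluations = 0;
        boolean ignored = check(true) | check(false);
        return evaluations;
    }

    /** Short-circuit OR: the second operand is skipped because the first is true. */
    static int countWithLogicalOr()
    {
        evaluations = 0;
        boolean ignored = check(true) || check(false);
        return evaluations;
    }

    public static void main(String[] args)
    {
        System.out.println(countWithBitwiseOr()); // prints "2"
        System.out.println(countWithLogicalOr()); // prints "1"
    }
}
```

In the flagged line both operands appear to be a cheap comparison and a boolean field read, so the two forms give the same result there; the difference only matters when the right-hand operand has side effects or can throw.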



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730685#comment-14730685
 ] 

Benedict commented on CASSANDRA-8630:
-

Thanks. Committed as ce63ccc842dc6e7129765391c611402eb02a3a23.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-03 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730241#comment-14730241
 ] 

Stefania commented on CASSANDRA-8630:
-

CI looks fine to me, on both Linux and Windows.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-02 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728305#comment-14728305
 ] 

Stefania commented on CASSANDRA-8630:
-

Looking at the first read operation, 8630 is top in the first two runs and 
second in the third run. Anyway, good enough for sure.

Windows utests are also OK, 2 failures same as unpatched branch. dtests still 
running.

Pushed one more rebased version to run CI one more time on Linux, a failing 
unit test cropped up but seems unrelated and it passed locally.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-02 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728053#comment-14728053
 ] 

Benedict commented on CASSANDRA-8630:
-

Well, I don't know if it's random chance or what, but now it's consistently 
better (across three runs), which is good enough for me.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-02 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727634#comment-14727634
 ] 

Philip Thompson commented on CASSANDRA-8630:


Windows CI:

Dtest http://cassci.datastax.com/view/win32/job/stef1927-8630_dtest_win32/
Utest http://cassci.datastax.com/view/win32/job/stef1927-8630_utest_win32/



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-02 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727483#comment-14727483
 ] 

Joshua McKenzie commented on CASSANDRA-8630:


Regarding the interrelationship of both: it's really not hard to get 
correct (from an NTFS perspective); it's just that if it's not on our radar (and 
we don't run tests on Windows), we can run into some surprises. My thought is 
that any time we change either a) file rename/deletion code or b) mmap code, we 
should run the tests on Windows just as a sanity check. Better to find out in 
advance than after a commit, and who knows what other little "gifts" the 
platform has for us in the future? :)



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-02 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727049#comment-14727049
 ] 

Stefania commented on CASSANDRA-8630:
-

Rebased all 3 branches to current 3.0 and launched 3 new tests, each running 
the versions in a different order:

3.0/8630/8630-old (as before): 
http://cstar.datastax.com/tests/id/bf0dd0be-5152-11e5-9608-42010af0688f
8630/8630-old/3.0: 
http://cstar.datastax.com/tests/id/245ca684-5153-11e5-bacd-42010af0688f
8630-old/3.0/8630: 
http://cstar.datastax.com/tests/id/5177a88a-5153-11e5-bacd-42010af0688f



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-02 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727031#comment-14727031
 ] 

Benedict commented on CASSANDRA-8630:
-

Perhaps we should ensure all three are rebased to the same point and run 
another set of comparisons? Perhaps a few sets, with the order (in which we run 
the patch version) randomised each time. The {{beforesuggestions}} patch 
currently outperforms in each of the runs, but there is too much variation to 
say it is actually superior.

We need to try and isolate the inconsistencies we're seeing in cstar. With 
variance of 15% or more, it's just about impossible to say anything definitive.
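
A minimal sketch of the two ideas above: shuffling the order in which the branches are run in each round, and measuring run-to-run variation as a coefficient of variation. The class and method names are hypothetical, the op rates in {{main}} are made-up placeholders rather than cstar results, and nothing here reflects how cstar itself schedules jobs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Cassandra or cstar.
final class RunVariance
{
    /** Returns a shuffled copy of the branch names, so each round can use a different order. */
    static List<String> randomisedOrder(List<String> branches, long seed)
    {
        List<String> order = new ArrayList<>(branches);
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    /** Sample coefficient of variation (standard deviation / mean) of op rates from repeated runs. */
    static double coefficientOfVariation(double[] opRates)
    {
        if (opRates.length < 2)
            throw new IllegalArgumentException("need at least two runs");

        double mean = Arrays.stream(opRates).average().getAsDouble();
        double sumOfSquares = 0;
        for (double rate : opRates)
            sumOfSquares += (rate - mean) * (rate - mean);

        double stddev = Math.sqrt(sumOfSquares / (opRates.length - 1));
        return stddev / mean;
    }

    public static void main(String[] args)
    {
        List<String> branches = Arrays.asList("3.0", "8630", "8630-old");
        for (long round = 0; round < 3; round++)
            System.out.println("round " + round + ": " + randomisedOrder(branches, round));

        // Placeholder op rates for one branch across three rounds (illustrative only).
        double[] rates = { 100_000, 115_000, 90_000 };
        System.out.printf("cv = %.3f%n", coefficientOfVariation(rates));
    }
}
```

If the coefficient of variation for a single branch across rounds is of the same size as the gap between branches, the gap cannot be attributed to the patch.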



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-01 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726651#comment-14726651
 ] 

Stefania commented on CASSANDRA-8630:
-

The latest cperf test has completed. These are the read results of the last two 
runs:

http://cstar.datastax.com/graph?stats=25b50e7c-5090-11e5-a17a-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=145.42&ymin=0&ymax=157588.2

http://cstar.datastax.com/graph?stats=ebe0-510f-11e5-a17a-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=140.91&ymin=0&ymax=155889.8

In the second run, performance is definitely the same as 3.0. Note that 8630 
before suggestions is not rebased. 




[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-01 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726545#comment-14726545
 ] 

Stefania commented on CASSANDRA-8630:
-

Fixed a small problem that was causing CI issues and now CI is stable again:

http://cassci.datastax.com/job/stef1927-8630-3.0-testall/lastBuild/testReport/
http://cassci.datastax.com/job/stef1927-8630-3.0-dtest/lastBuild/testReport/

Relaunched cperf test with this fix in, see 
[here|http://cstar.datastax.com/tests/id/ebe0-510f-11e5-a17a-42010af0688f]. 
The test I launched yesterday shows only a small difference in read ops, not 
sure if related.

My request on IRC to set up Windows CI must have gone unnoticed; 
[~philipthompson], would you be able to set up CI on Windows for this branch?



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-01 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725118#comment-14725118
 ] 

Stefania commented on CASSANDRA-8630:
-

Thanks. I've launched another [cperf 
test|http://cstar.datastax.com/tests/id/25b50e7c-5090-11e5-a17a-42010af0688f] 
with your latest changes. This should give us an indication of the variance in 
cstar.

I've also removed a couple of redundant utests in {{NIODataInputStreamTest}}, 
since the buffer can now be smaller than 9.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-09-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724898#comment-14724898
 ] 

Benedict commented on CASSANDRA-8630:
-

I must admit that I thought, from Ariel's comment 
[here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/util/NIODataInputStream.java#L182]
 that we did not actually use {{FBO.copy}} anymore, and that it did not work. I 
guess there was some other mistake happening there.

However, there's no functional distinction between the two methods, since they 
both operate on a target {{byte[]}} and the {{FastByteOperations.copy}} methods 
support an array as a target, so I've pushed a version with that changed.

It's not clear how much of the variance is cstar's current inconsistency. I'm 
reasonably certain that hotspot translates any byte-by-byte copy to a SIMD 
optimised one. However looking at the C2 compilation output, it appears that 
the {{FastByteOperations.copy}} call is fully inlined, whereas for some reason 
the {{ByteBuffer.get}} call is left as invokevirtual. This is odd, since this 
should at most be bimorphic, and I would expect it to be a main target for 
optimisation by the VM. However I cannot see anywhere in hotspot's intrinsic 
definition any of the {{ByteBuffer.get}} methods either (whereas copyMemory 
most certainly is), which would have explained this.

Given this, we're probably best retaining the {{FBO.copy}} version; however, we 
may as well port it over to {{read}}.
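
For readers following along, here is a minimal sketch of the two copy styles being compared when filling a target {{byte[]}} from a {{ByteBuffer}}: one {{get()}} call per byte versus a single bulk {{get(byte[], int, int)}}. It uses only JDK methods. Cassandra's {{FastByteOperations.copy}} is not reproduced, since its signature is not shown in this thread, and the class and method names below are hypothetical. The sketch only shows that the two forms are functionally equivalent; it makes no claim about which one the JIT compiles better.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration, not Cassandra code.
final class BufferCopySketch
{
    /** Copies length bytes from src into dst, one ByteBuffer.get() call per byte. */
    static void copyByteByByte(ByteBuffer src, byte[] dst, int offset, int length)
    {
        for (int i = 0; i < length; i++)
            dst[offset + i] = src.get();
    }

    /** Copies length bytes from src into dst with a single bulk get. */
    static void copyBulk(ByteBuffer src, byte[] dst, int offset, int length)
    {
        src.get(dst, offset, length);
    }

    public static void main(String[] args)
    {
        byte[] data = new byte[64];
        for (int i = 0; i < data.length; i++)
            data[i] = (byte) i;

        byte[] a = new byte[32];
        byte[] b = new byte[32];

        // Both variants advance the source buffer's position by the number of bytes copied.
        copyByteByByte(ByteBuffer.wrap(data), a, 0, a.length);
        copyBulk(ByteBuffer.wrap(data), b, 0, b.length);

        System.out.println("same contents: " + Arrays.equals(a, b));
    }
}
```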



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-31 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724871#comment-14724871
 ] 

Stefania commented on CASSANDRA-8630:
-

Here is a cperf result: 
http://cstar.datastax.com/tests/id/deb052ba-5068-11e5-8248-42010af0688f

I compared 3.0 with the latest 8630 and the version of 8630 before the 
suggestions (to rule out any impact from removing {{readBytes}}).

Do we have a performance regression or are they close enough?



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-31 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724615#comment-14724615
 ] 

Stefania commented on CASSANDRA-8630:
-

Thanks [~benedict] and [~JoshuaMcKenzie].

bq. Removed the readBytes method entirely

Are you sure? This method was introduced by CASSANDRA-1714, titled _zero-copy 
reads_. It copies BBs via {{FastByteOperations}} rather than reading into a 
buffer byte by byte. Bear in mind that {{ByteBufferUtil.read()}} is one of the 
most used read methods according to flight recorder. 

bq. Removed the (as commented in TODO) unnecessary 
BufferedSegmentedFile.createReader

Thanks, I had totally forgotten to remove this.

bq. MappedRegions.close now always calls channel.close, and returns any error 
thrown by it

Thanks for spotting it!

The remaining points are good. The new {{Throwables.perform}} API is really 
neat.

I've pushed a rebased version to Jenkins and I've asked on IRC to set up 
Windows jobs for this branch. I've also launched a [cperf 
job|http://cstar.datastax.com/tests/id/35dbcc0e-5050-11e5-b3f6-42010af0688f]. 
I'll post the results once they are available.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724092#comment-14724092
 ] 

Benedict commented on CASSANDRA-8630:
-

I've pushed a version with a few suggestions, and a bug fix, in decreasing 
order of note:

* {{MappedRegions.close}} now always calls {{channel.close}}, and returns any 
error thrown by it
* Removed the {{readBytes}} method entirely, since it doesn't appear to offer 
anything that {{readFully}} (and, transitively, {{ByteBufferUtil.read}}) doesn't 
(correct me if I'm wrong...?)
** Simplified {{readFully}} while we're here
* Moved some of the {{close}} methods to the new {{Throwables.perform}} API to 
make their behaviour more robust
* Replaced {{AssertionError}} with {{UnsupportedOperationException}} where 
suitable
* Removed {{IOException}} from the throws clause of close, so no need to 
introduce the new {{closeQuietly}} calls
* Removed the (as commented in TODO) unnecessary 
{{BufferedSegmentedFile.createReader}}
* {{ChecksummedDataInput}} extends 
{{RandomAccessReader.RandomAccessReaderWithOwnChannel}}
* Removed / standardised EOF behaviour for RAR and CRAR
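
The close behaviour described in the first and third bullets (always close every resource, and return rather than lose any error) can be sketched roughly as follows. This is an illustration under assumptions, not the actual {{Throwables.perform}} API, whose signature is not shown in this thread; {{CloseAllSketch}} and {{closeAll}} are hypothetical names.

```java
// Hypothetical illustration of "close everything, accumulate errors"; not Cassandra code.
final class CloseAllSketch
{
    /**
     * Closes every resource in order, even if an earlier close throws.
     * Returns the first failure with any later failures attached as suppressed,
     * or the incoming accumulate value (possibly null) if nothing failed.
     */
    static Throwable closeAll(Throwable accumulate, AutoCloseable... resources)
    {
        for (AutoCloseable resource : resources)
        {
            try
            {
                resource.close();
            }
            catch (Exception e)
            {
                if (accumulate == null)
                    accumulate = e;
                else if (accumulate != e)
                    accumulate.addSuppressed(e); // keep the later failure without masking the first
            }
        }
        return accumulate;
    }

    public static void main(String[] args)
    {
        AutoCloseable failingRegions = () -> { throw new java.io.IOException("unmap failed"); };
        AutoCloseable channel = () -> System.out.println("channel closed");

        // The channel is still closed even though the first close throws.
        Throwable error = closeAll(null, failingRegions, channel);
        System.out.println("returned error: " + error);
    }
}
```

Returning the accumulated error instead of throwing from inside the loop is what lets the caller decide whether to rethrow once every resource has been given a chance to close.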

Let me know what you think, and we can squash/commit.

bq.  how picky the platform is about timing on DirectBuffer unmapping etc.

[~JoshuaMcKenzie]: is this wrt file rename / deletion, or wrt closing the 
channel? My understanding was the former, in which case any such problem would 
indicate a severe bug with 2.2+ (unless snapshots are involved, but I don't think 
we even try to handle that), and shouldn't be affected by this patch. 
Absolutely no harm running the tests, of course.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-31 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723526#comment-14723526
 ] 

Joshua McKenzie commented on CASSANDRA-8630:


While I don't expect this to cause any problems on Windows from a cursory 
skimming of the patch, it might be worth running a utest/dtest combo job 
against the platform just to be sure given how picky the platform is about 
timing on DirectBuffer unmapping etc.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-31 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723274#comment-14723274
 ] 

Stefania commented on CASSANDRA-8630:
-

Thanks!



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723248#comment-14723248
 ] 

Benedict commented on CASSANDRA-8630:
-

Sure. Since it's a big patch, I'm just giving the final version a cursory 
once-over myself before doing so.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-30 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721790#comment-14721790
 ] 

Stefania commented on CASSANDRA-8630:
-

I rebase on 3.0 quite regularly; I rebased again today.

Perhaps you were looking at the old CI jobs? I renamed the branch when I moved 
from trunk to 3.0, but the old CI jobs (without -3.0) are still there. Here are 
the correct ones: 

http://cassci.datastax.com/job/stef1927-8630-3.0-testall/
http://cassci.datastax.com/job/stef1927-8630-3.0-dtest/

They seem in line with 3.0.

[~benedict] would you be happy to be the committer?


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720302#comment-14720302
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

The pig tests don't work everywhere, so I don't think that is caused by this 
change. So only TestMutation concerns me.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719961#comment-14719961
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

+1 on the code. I reviewed the test failures yesterday and I thought we were 
good, but now I am confused. 

The pig tests are failing more on your branch than on 3.0 or trunk. The 
TestMutation dtest also looks suspect to me. I suspect what happened is that 
trunk and 3.0 moved out from under you. I don't see CqlTableTest in the test 
history in cassci trunk test-all?

Once you are satisfied the tests are good, go ahead and find a committer. If you 
know when and where you branched, you could compare with the builds at the time.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-27 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717832#comment-14717832
 ] 

Stefania commented on CASSANDRA-8630:
-

Thanks. 

sharedCopy must be implemented because it's abstract in the base class; however 
I wanted a better name to convey the idea of a snapshot. I've removed snapshot 
and kept sharedCopy but added a comment.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-27 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716863#comment-14716863
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

snapshot() and sharedCopy() are the same now so is there a need for both?

+1 after that. Cassci looks good. I get the impression we are coming out of 
this with more direct test coverage, and a place to fill in anything we missed, 
which is great.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-26 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715899#comment-14715899
 ] 

Stefania commented on CASSANDRA-8630:
-

I've removed the array deep copy and added the volatile flag. I've also 
rearranged a few comments.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-26 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715228#comment-14715228
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

bq. It works for me both from IntelliJ and from the command line. Have you 
checked you have set -ea?

That was it. 

bq. Nice idea, implemented, thanks.

You don't need to create a deep copy of the arrays anymore. They are only ever 
appended to and the replacements aren't visible in the shared copies until a 
new one is published. Also the copy field might need to be volatile to ensure 
the copy is published safely. It's necessary if any thread can ask for a shared 
copy while a new one is being published. volatile will ensure all stores prior 
to the volatile store are globally visible so threads will see the new copy in 
a consistent state before the reference to it is published via the copy field.
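
The publication pattern described above could look roughly like the following. This is only a sketch under assumptions: the class, field, and method names are hypothetical stand-ins rather than the MmappedRegions code, it assumes a single writer thread, and it rebuilds the array on each append for simplicity.

```java
import java.util.Arrays;

// Hypothetical stand-in for the regions state; not the Cassandra class.
final class RegionsSketch
{
    // Immutable snapshot: the array is never modified after construction.
    static final class State
    {
        final long[] offsets;
        final int length;

        State(long[] offsets, int length)
        {
            this.offsets = offsets;
            this.length = length;
        }
    }

    // volatile: all writes that build the new State happen before the store to
    // this field, so a reader that sees the new reference sees a complete State.
    private volatile State copy = new State(new long[0], 0);

    // Single writer (assumed): append a region, then publish a fresh snapshot.
    void add(long offset)
    {
        State current = copy;
        long[] next = Arrays.copyOf(current.offsets, current.length + 1);
        next[current.length] = offset;
        copy = new State(next, current.length + 1);
    }

    // Any thread may call this; callers share one snapshot until the next add().
    State sharedCopy()
    {
        return copy;
    }
}
```

Readers never copy anything here: the cost of building a snapshot is paid once per write, and every sharedCopy() call between writes returns the same object.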

Test coverage looks really good. 


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-26 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712805#comment-14712805
 ] 

Stefania commented on CASSANDRA-8630:
-

bq. We could consider changing this for compaction readers, or at least for 
throttled readers (which amount to the same thing). There's no reason not to 
read 64Kb at a time for compaction, since we know we'll want all of the data.

Done. If the limiter is not null, the bufferSize is limited to 64k in RAR 
(RandomAccessReader). For compressed RAR, however, we keep using the chunk data 
length.
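
A minimal sketch of the rule just described, reading "limited to 64k" as an upper bound (that reading, the constant name, and the method signature are assumptions, not the code on the branch):

```java
// Hypothetical sketch, not the RandomAccessReader implementation.
final class BufferSizeSketch
{
    static final int MAX_THROTTLED_BUFFER_SIZE = 64 * 1024; // 64k, per the comment above

    // limiter stands in for whatever rate limiter the reader holds;
    // null means the reader is not throttled.
    static int bufferSize(int requestedSize, Object limiter)
    {
        if (limiter == null)
            return requestedSize;
        return Math.min(requestedSize, MAX_THROTTLED_BUFFER_SIZE);
    }
}
```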

bq. Right so there needs to be a copy, but you don't need to copy the same 
state every time you read. You can make an immutable copy once on write, and 
then share that indefinitely. I think you are on the right track with the 
isCopy flag, but maybe make it a field that is called immutableCopy or 
something, and shared copy returns the same immutable view of the state every 
time. So if immutableCopy is null then this State object is the immutable copy.

Nice idea, implemented, thanks.

bq. ChecksummedDataInput test doesn't check for failing checksums. resetCrc(), 
and readBytes() are also not tested.

Added.

bq. BufferedRandomAccessFileTest.testAssertionErrorWhenBytesPastMarkIsNegative 
failed for me.

It works for me both from IntelliJ and from the command line. Have you checked 
you have set {{-ea}}?

bq. CompressedRandomAccessReader.reBufferMmap() doesn't appear to be tested.

Adapted an existing test, {{testResetAndTruncate}}, to also run using mmap 
segments. 

I also fixed a few warnings in CompressedRandomAccessReaderTest and 
RandomAccessReaderTest.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-25 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711632#comment-14711632
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

bq. We could consider changing this for compaction readers, or at least for 
throttled readers (which amount to the same thing). There's no reason not to 
read 64Kb at a time for compaction, since we know we'll want all of the data.

This would make me sleep a little better.

bq. Not sure if I found the comment you are referring to, you mean the 
description of the class Input stream around a fixed ByteBuffer? I can change 
it to Input stream around a single ByteBuffer, is this what you meant?

[I think it's already gone actually. Can't find it in the 
diff.|https://github.com/apache/cassandra/compare/trunk...stef1927:8630-3.0#diff-0d7319a2a430b96865289eb87b136b32L25]

bq. The readers are potentially used in different threads...

Right so there needs to be a copy, but you don't need to copy the same state 
every time you read. You can make an immutable copy once on write, and then 
share that indefinitely. I think you are on the right track with the isCopy 
flag, but maybe make it a field that is called immutableCopy or something, and 
shared copy returns the same immutable view of the state every time. So if 
immutableCopy is null then this State object is the immutable copy.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710938#comment-14710938
 ] 

Benedict commented on CASSANDRA-8630:
-

bq. Buffer sizes can vary with the statistics of the files

We could consider changing this for compaction readers, or at least for 
throttled readers (which amount to the same thing). There's no reason not to 
read 64Kb at a time for compaction, since we know we'll want all of the data.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-25 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710918#comment-14710918
 ] 

Stefania commented on CASSANDRA-8630:
-

bq. DataInputBuffer line 25, NIODataInputStream no longer has the bytes 
shuffling behavior so that comment should go away.

I'm not sure I found the comment you are referring to. Do you mean the class 
description {{Input stream around a fixed ByteBuffer}}? I can change it to 
{{Input stream around a single ByteBuffer}}; is this what you meant?

bq. RebufferingInputStream copy constructor appears unused (or Eclipse is 
lying). It also looks suspicious since it doesn't inherit the rebuffering 
behavior of whatever it is copying?

Left over from previous attempts, definitely not needed, thanks.

bq. Does ChecksummedDataInput handle files larger than 2 gigabytes? Seems like 
we could end up with large hint files? The way the file based hints loop is 
written it seems like it could do it. Possibly unintentionally.

Each single hint cannot be more than 2GB because its size is encoded as an 
integer (existing behavior). {{ChecksummedDataInput}} should be fine except for 
the handling of limits, which are used only for single hints. I will change 
{{ChecksummedDataInput.limit}} to a long for future safety and move the checked 
cast to {{readHint}}. {{crcPosition}} is an integer but this is fine as it 
caches the last buffer position, which is also an integer. It is reset each 
time we rebuffer.

bq. The CoW idiom used for MmappedRegions seems a little off. It's making a 
copy on read so every SSTableReader (they aren't shared globally I believe) 
will have a separate deep copy of the entire MmappedRegions. I know this is 
tricky and you probably get it better than I do, but can you get it so that the 
same array is shared? Ideally both the arrays and the State object will be 
shared. Looking at how the refcounting is supposed to work

The readers are potentially used in different threads. Once 
Lifecycletransaction.checkpoint() publishes a new view in the tracker, the 
tracker can potentially be used in different threads, due to various 
asynchronous tasks. At least this is what I understand. The builder lives in 
the BigTableWriter; each time we open an sstable early, it creates a new 
sstable reader that shares the same builder with the previous readers and the 
final reader, hence they share the channel and the mmapped regions, if any. The 
readers created by open early are published and obsoleted by the lifecycle 
transaction, so multi-threaded access is possible. This is my understanding; 
perhaps [~benedict] can fill any gaps?

The whole idea was to ensure copies do not modify the arrays. It's true that we 
could share the arrays as we enforce the non-modification via the {{isCopy}} 
flag, so I am happy to avoid the deep copy. Once we implement compaction of 
mmapped regions, this will become trickier and sharing arrays may not be 
possible any longer. However, in this case we'll also have bigger problems like 
tracking the old non-compacted regions that are still used.

bq. The fact that MmappedRegions and its owning MmappedSegmentedFile both are 
SharedClosables seems odd to me. Seems like only one of them needs to determine 
the lifetime of the whole shebang.

The segmented files and the builders own the MmappedRegions, just like for the 
channel. Sometimes the builders are closed before the segmented files, and the 
mmapped regions need to survive. It's exactly the same as for the channel. 
These two resources, channel and mmapped regions, need to have their own 
resource management independent of their owners.
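
To illustrate why a resource shared by several owners needs its own lifetime management, here is a simplified reference-counting sketch. It is an assumption-laden illustration, not the SharedCloseable machinery in Cassandra: the names are hypothetical, and it does not guard against taking a new reference after the last one has been released.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a resource that outlives whichever owner closes first.
final class SharedResourceSketch
{
    // The creator holds the first reference.
    private final AtomicInteger references = new AtomicInteger(1);
    private volatile boolean released = false;

    // A new owner (e.g. a builder or a segmented file) takes its own reference.
    SharedResourceSketch sharedCopy()
    {
        references.incrementAndGet();
        return this;
    }

    // Each owner closes independently; only the last close releases the resource.
    void close()
    {
        if (references.decrementAndGet() == 0)
            released = true; // real code would unmap regions or close the channel here
    }

    boolean isReleased()
    {
        return released;
    }
}
```

With this shape, a builder can be closed before the segmented file and the mapped regions remain usable until the file releases its reference too.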

bq. For rate limiting. It seems like we acquire a buffer's worth of bytes from 
the rate limiter at a time. What is the potential distribution of buffer sizes 
and how reasonable are they? It seems like they can vary with the statistics of 
a file. Since we got into trouble with rate limiting once I just want to be 
sure there isn't a corner case where it can be a problem again.

Buffer sizes can vary with the statistics of the files, since CASSANDRA-8894. 
For data files they are a multiple of the page size and approximately equal to 
the 95th percentile of the partition size. So far we've throttled based on the 
buffer size, and not at all for mmapped segments; we also had fixed buffer 
sizes of 64k up to CASSANDRA-8894. What alternatives would we have if we did 
not throttle based on the buffer size? We cannot throttle for every read method 
in {{RebufferingInputStream}}, or can we?
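
As a rough illustration of a buffer size that is "a multiple of the page size and approximately equal to the 95th percentile of the partition size", consider the sketch below. The page size, the 64k cap, the round-up rule, and all names are assumptions made for the example; the actual calculation from CASSANDRA-8894 is not shown in this thread.

```java
// Hypothetical sketch; constants and rounding are assumptions.
final class PercentileBufferSketch
{
    static final int PAGE_SIZE = 4096;        // assumed page size
    static final int MAX_BUFFER_SIZE = 65536; // assumed upper bound (64k)

    static int bufferSize(long partitionSizeP95)
    {
        // Clamp to [1, MAX_BUFFER_SIZE] before rounding.
        long capped = Math.min(Math.max(partitionSizeP95, 1), MAX_BUFFER_SIZE);
        // Round up to the next multiple of the page size.
        long rounded = (capped + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
        return (int) rounded;
    }
}
```

Under these assumptions the amount acquired from the rate limiter per rebuffer would range from one page to 64k, depending on the file's partition size statistics.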

I will work on the missing tests and submit the remaining code fixes tomorrow, 
thanks!


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-24 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710183#comment-14710183
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

DataInputBuffer line 25, NIODataInputStream no longer has the bytes shuffling 
behavior so that comment should go away.

RebufferingInputStream copy constructor appears unused (or Eclipse is lying). 
It also looks suspicious since it doesn't inherit the rebuffering behavior of 
whatever it is copying?

Does ChecksummedDataInput handle files larger than 2 gigabytes? Seems like we 
could end up with large hint files? The way the file based hints loop is 
written it seems like it could do it. Possibly unintentionally.

The CoW idiom used for MmappedRegions seems a little off. It's making a copy on 
read so every SSTableReader (they aren't shared globally I believe) will have a 
separate deep copy of the entire MmappedRegions. I know this is tricky and you 
probably get it better than I do, but can you get it so that the same array is 
shared? Ideally both the arrays and the State object will be shared. Looking at 
how the refcounting is supposed to work 

The fact that MmappedRegions and its owning MmappedSegmentedFile both are 
SharedClosables seems odd to me. Seems like only one of them needs to determine 
the lifetime of the whole shebang.

For rate limiting. It seems like we acquire a buffer's worth of bytes from the 
rate limiter at a time. What is the potential distribution of buffer sizes and 
how reasonable are they? It seems like they can vary with the statistics of a 
file. Since we got into trouble with rate limiting once I just want to be sure 
there isn't a corner case where it can be a problem again.

ChecksummedDataInput test doesn't check for failing checksums. resetCrc() and 
readBytes() are also not tested.

BufferedRandomAccessFileTest.testAssertionErrorWhenBytesPastMarkIsNegative 
failed for me.

CompressedRandomAccessReader.reBufferMmap() doesn't appear to be tested.




> Faster sequential IO (on compaction, streaming, etc)
> 
>
> Key: CASSANDRA-8630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core, Tools
>Reporter: Oleg Anastasyev
>Assignee: Stefania
>  Labels: compaction, performance
> Fix For: 3.x
>
> Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, 
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, 
> mmaped_uncomp_hotspot.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc), a lot 
> of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because the default implementations of readShort, readLong, etc, as well 
> as their matching write* methods, are implemented with numerous byte-by-byte 
> read and write calls. 
> This makes a lot of syscalls as well.
> A quick microbenchmark shows that just reimplementing these methods in 
> either way gives an 8x speed increase.
> The attached patch implements the RandomAccessReader.read and 
> SequentialWriter.write methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method 
> list during tests.
> Stress tests on my laptop show that this patch makes compaction 25-30% 
> faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a CPU load graph from production; orange is niced CPU 
> load, i.e. compaction; yellow is user, i.e. tasks not related to compaction.)
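
To illustrate the cost described in the issue text above, here is a minimal sketch comparing a byte-by-byte {{readLong}} with a bulk read. {{ReadLongSketch}} and {{CountingSource}} are hypothetical names, and the counter merely stands in for calls that would each reach the file; this is not the attached patch.

```java
import java.nio.ByteBuffer;

final class ReadLongSketch
{
    /** Hypothetical stand-in for an unbuffered file: counts every read call issued. */
    static final class CountingSource
    {
        private final byte[] data;
        private int pos;
        int calls;

        CountingSource(byte[] data) { this.data = data; }

        /** Single-byte read, returns -1 at end of input. */
        int read()
        {
            calls++;
            return pos < data.length ? data[pos++] & 0xFF : -1;
        }

        /** Bulk read, returns the number of bytes copied into dst. */
        int read(byte[] dst)
        {
            calls++;
            int n = Math.min(dst.length, data.length - pos);
            System.arraycopy(data, pos, dst, 0, n);
            pos += n;
            return n;
        }
    }

    /** Eight separate read() calls, assembled big-endian. */
    static long readLongByteByByte(CountingSource in)
    {
        long value = 0;
        for (int i = 0; i < 8; i++)
        {
            int b = in.read();
            if (b < 0) throw new IllegalStateException("unexpected end of input");
            value = (value << 8) | b;
        }
        return value;
    }

    /** One bulk read into a scratch array, decoded with ByteBuffer (big-endian by default). */
    static long readLongBulk(CountingSource in)
    {
        byte[] scratch = new byte[8];
        if (in.read(scratch) != 8) throw new IllegalStateException("unexpected end of input");
        return ByteBuffer.wrap(scratch).getLong();
    }
}
```

Both methods decode the same value; only the number of calls that reach the source differs (eight versus one in this sketch).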



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-23 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708687#comment-14708687
 ] 

Stefania commented on CASSANDRA-8630:
-

I've fixed {{ChecksummedDataInput}} so that the slow path in 
{{RebufferingInputStream}} is no longer required.

I've added a comment regarding copying of {{MmappedRegions}} and thread safety 
and filed a follow-up ticket: CASSANDRA-10158.

[~aweisberg], ready for another round of review.




[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-21 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706806#comment-14706806
 ] 

Stefania commented on CASSANDRA-8630:
-

Yes, that might work. I will try it out, thanks.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706791#comment-14706791
 ] 

Benedict commented on CASSANDRA-8630:
-

OK. How about adapting it to just have a {{checkCrc}} method, which updates 
the crc stream, consumes 4 bytes (not updating the crc), and compares 
them?
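
A minimal sketch of what such a method could look like, assuming the checksum is stored as a 4-byte int in the same stream as the data. {{ChecksummedInputSketch}} is a hypothetical class written for illustration, not the actual {{ChecksummedDataInput}}.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical sketch: data and its stored checksum live in the same buffer. */
final class ChecksummedInputSketch
{
    private final ByteBuffer buffer;
    private final CRC32 crc = new CRC32();

    ChecksummedInputSketch(ByteBuffer buffer)
    {
        this.buffer = buffer;
    }

    /** Reads a long and folds exactly the 8 bytes consumed into the running crc. */
    long readLong()
    {
        int start = buffer.position();
        long value = buffer.getLong();
        ByteBuffer view = buffer.duplicate();
        view.position(start);
        view.limit(start + 8);
        crc.update(view);
        return value;
    }

    /** Consumes the stored 4-byte checksum without adding it to the crc, then compares. */
    boolean checkCrc()
    {
        int expected = buffer.getInt();
        return expected == (int) crc.getValue();
    }

    /** Starts a new checksummed section. */
    void resetCrc()
    {
        crc.reset();
    }
}
```

The point of the design is that the crc only ever sees bytes returned to the caller, so the stored checksum itself never needs special handling by the caller.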



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-21 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706777#comment-14706777
 ] 

Stefania commented on CASSANDRA-8630:
-

Thanks for your comments.

bq. For ChecksummedDataInput, we can just update the crc whenever we exhaust 
the buffer, and on calling getCrc() we can update with whatever we have read so 
far in the current buffer. 

I tried that and it didn't work. HintsReader wraps a reader but it still uses 
the underlying reader to read the crc values, i.e. the crc is in the same 
stream, but it should be excluded from updating the crc. In other words, we 
should update the crc only as bytes are actually read; looking at the buffer 
content is not sufficient.

bq. Introducing an extra forceSlowPath property in the superclass to every 
single call is something I would prefer we avoid.

I don't like it either but other than overloading all read methods, we need to 
rethink how HintsReader updates the crc, unless you have another idea.

I agree on all other points you've raised. 



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706537#comment-14706537
 ] 

Benedict commented on CASSANDRA-8630:
-

A few random comments (not performing review, since Ariel's on that):

For ChecksummedDataInput, we can just update the crc whenever we exhaust the 
buffer, and on calling getCrc() we can update with whatever we have read so far 
in the current buffer. Introducing an extra {{forceSlowPath}} property in the 
superclass to every single call is something I would prefer we avoid.

We should comment the copying of the State object in MmappedRegions, so it's 
clear this is for thread safety, and that we still logically reference the 
original state.

We should file some follow ups to:

* Compact the mmap ranges, at least on the _final_ opening of the file
* Use the mmap extension logic for compressed files
* Generally I think we've gotten close enough to a good state that we should 
consider refactoring the whole remaining SequentialWriter, RAR, 
CompressionMetadata(.Writer) etc collection of classes. Preferably move them 
all into their own package, and make their relations more simply defined.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-21 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706492#comment-14706492
 ] 

Stefania commented on CASSANDRA-8630:
-

I completed the move of segments to the builder. The bulk of the implementation 
is {{MmappedRegions}}, which is a {{SharedCloseableImpl}} owned by the builder. 
Each time a {{SegmentedFile}} is built we copy a snapshot and increment the 
reference count. Shall we rename {{SegmentedFile}} and derived classes to 
something else? The compressed segmented file still owns the mmapped regions, as 
it creates the metadata each time and I wasn't sure how to handle this.

I also fixed a channel ownership problem in the builder; eventually we should 
really consider passing the file path to the builder constructor.

The integration with CASSANDRA-6230 was a bit harder than I anticipated and 
some hinting utests were failing on Jenkins. I had to introduce a flag to force 
the slow path in {{RebufferingInputStream}} or else {{ChecksummedDataInput}} 
would not work. My idea of updating the CRC during rebuffering is no good 
because we must rely only on what is read by this class, not the underlying 
reader, and we must compute the crc only on what's read, not necessarily 
buffered.

Please note the new CI links (due to renaming of the branch to 
[8630-3.0|https://github.com/stef1927/cassandra/commits/8630-3.0]):

http://cassci.datastax.com/job/stef1927-8630-3.0-testall/
http://cassci.datastax.com/job/stef1927-8630-3.0-dtest/

cstar is currently having 
[issues|http://cstar.datastax.com/tests/id/e05752e6-47c9-11e5-b4da-42010af0688f],
 as soon as it is available I will launch a comparison.

Here are the remaining CR comments:

bq. MemoryInputStream.available() can wrap the addition between 
buffer.remaining() + Ints.saturatedCast(memRemaining()). Do the addition and 
then the saturating cast.

Fixed, thanks.
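
For reference, the overflow being discussed can be sketched as follows. {{AvailableSketch}} and its method names are hypothetical, and {{saturatedCast}} is a local helper with the same clamping idea as Guava's {{Ints.saturatedCast}}, written out here so the example has no dependencies.

```java
final class AvailableSketch
{
    /** Clamps a long into the int range. */
    static int saturatedCast(long value)
    {
        if (value > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (value < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) value;
    }

    /** Cast first, then add: the int addition can wrap to a negative number. */
    static int availableWrapping(int bufferRemaining, long memRemaining)
    {
        return bufferRemaining + saturatedCast(memRemaining);
    }

    /** Add as longs first, then clamp: the result never wraps for realistic sizes. */
    static int availableSaturating(int bufferRemaining, long memRemaining)
    {
        return saturatedCast(bufferRemaining + memRemaining);
    }
}
```

With 10 bytes left in the buffer and 3,000,000,000 bytes of memory remaining, the first variant wraps to a negative value while the second returns {{Integer.MAX_VALUE}}.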

bq. Why does RandomAccessReader accept a builder and a parameter for 
initializing the buffer? Seems like we lose the bonus of a builder 
allowing a constant signature.

Good point, fixed.

bq. A nit in initializeBuffer, it does firstSegment.value().slice() which 
implies you want a subset of the buffer? duplicate() makes it obvious there is 
no such concern.

Fixed.
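
A small sketch of the difference between the two {{ByteBuffer}} calls, using only the standard library. {{SliceVsDuplicate}} and its method names are hypothetical and exist only to label the two behaviours.

```java
import java.nio.ByteBuffer;

final class SliceVsDuplicate
{
    /** Independent cursor over the same content: keeps position, limit and capacity. */
    static ByteBuffer fullView(ByteBuffer segment)
    {
        return segment.duplicate();
    }

    /** Window over [position, limit) only: the result starts at position 0. */
    static ByteBuffer window(ByteBuffer segment)
    {
        return segment.slice();
    }
}
```

When the source buffer is at position 0 with its limit at capacity the two are equivalent, which is why {{duplicate()}} states the intent more clearly in that case.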

bq. I think there is a place for unit tests stressing the 2 gigabyte 
boundaries. That means testing available()/length()/remaining() style methods 
as well as being able to read and seek with instances of these things that are 
larger than 2g. Doing it with the actual file based ones seems bad, but maybe 
you could intercept those to work with memory so they run fast or ingloriously 
mock their inputs.

I've added {{RandomAccessReaderTest.testVeryLarge}}, which uses a fake file 
channel that just increments the position and the buffers without reading 
anything.

bq. For rate limiting is your current solution to consume buffer size bytes 
from the limiter at a time for both mmap reads and standard? And you accomplish 
this by slicing the buffer then updating the position? I don't see you setting 
the limit before slicing?

I've added the call to {{limit()}} in {{rebufferMmap()}}. I missed that 
initially. The downside is that we are rebuffering more often than needed for 
mmap disk access; however, we have more accurate throttling. I've rerun the 
same micro-benchmark and noticed no difference in performance.
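
As a rough sketch of the idea (not the actual {{rebufferMmap()}} code), limiting before slicing could look like this. {{ThrottledMmapSketch}}, {{nextChunk}} and the {{chunkSize}} parameter are hypothetical names; a heap buffer stands in for the mapped region.

```java
import java.nio.ByteBuffer;

final class ThrottledMmapSketch
{
    /**
     * Returns a view of at most chunkSize bytes of the region, starting at offset.
     * The caller would acquire view.remaining() bytes from the rate limiter.
     */
    static ByteBuffer nextChunk(ByteBuffer region, int offset, int chunkSize)
    {
        ByteBuffer view = region.duplicate();           // leave the region's own cursor alone
        int end = (int) Math.min((long) view.limit(), (long) offset + chunkSize);
        view.limit(end);                                // without this, slice() would expose the whole tail
        view.position(offset);
        return view.slice();
    }
}
```

Setting the limit before calling {{slice()}} is what bounds the number of bytes handed out per rebuffer, at the cost of rebuffering more often.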

bq. I thought NIODataInputStream had methods for reading into ByteBuffers, but 
I was wrong. It's kind of thorny to add one to RebufferingInputStream so I 
think you did the right thing putting it in FileSegmentedInputStream even 
though it's an odd concern to have in that class. Unless you have a better idea.

Shall we move {{readBytes}} from {{FileDataInput}} to {{DataInputPlus}}, at 
which point it can be implemented only by {{RebufferingInputStream}}?


bq. In SSTableReader you are adding and removing fields from files. What are 
the cross version compatibility issues with that?

It should be fixed now, see {{Version.hasBoundaries()}}. It's currently set to 
ignore boundaries for >= 3.0.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704570#comment-14704570
 ] 

Benedict commented on CASSANDRA-8630:
-

If you're going with an array of {{Region}} objects, you may as well just use a 
{{NavigableMap}}. The main benefit is only realised with two arrays. (I don't 
mind terribly which you do).
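
For illustration, a lookup over a {{NavigableMap}} keyed by region start offset might look like the sketch below. {{RegionMapSketch}} and its methods are hypothetical; the map value is just the region length rather than a real mapped buffer.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class RegionMapSketch
{
    /** Start offset of each region mapped to its length in bytes. */
    private final NavigableMap<Long, Integer> regions = new TreeMap<>();

    void add(long offset, int length)
    {
        regions.put(offset, length);
    }

    /** Returns the start offset of the region containing position, or -1 if none does. */
    long regionStartFor(long position)
    {
        Map.Entry<Long, Integer> entry = regions.floorEntry(position);
        if (entry == null)
            return -1;
        long end = entry.getKey() + entry.getValue();
        return position < end ? entry.getKey() : -1;
    }
}
```

Note that the keys and values are boxed, which is the indirection the two-array alternative avoids.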

When it comes to thread safety, I would suggest backing it by 
{{SharedCloseableImpl}}, with the {{Tidy}} instance being retained by the 
builder, and being mutated as it's being built (completely thread safe as it 
holds a reference, so it will never be tidied before it's done). Whenever we 
build a {{SegmentedFile}} we copy a snapshot of the current state into a new 
instance that obtains a reference.
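
A simplified sketch of this ownership scheme follows. It does not use Cassandra's {{SharedCloseableImpl}}; {{RegionsBuilderSketch}}, {{Tidy}} and {{Snapshot}} here are hypothetical stand-ins built on an {{AtomicInteger}}, and the "unmap" step is reduced to a flag.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical builder that owns the live state and hands out counted snapshots. */
final class RegionsBuilderSketch
{
    /** Shared cleanup state: released once the last reference goes away. */
    static final class Tidy
    {
        private final AtomicInteger refs = new AtomicInteger(1); // the builder's own reference
        private volatile boolean released;

        void acquire() { refs.incrementAndGet(); }

        void release()
        {
            if (refs.decrementAndGet() == 0)
                released = true; // real code would unmap the buffers here
        }

        boolean isReleased() { return released; }
    }

    /** Immutable copy handed to each file; holds its own reference. */
    static final class Snapshot
    {
        final long[] offsets;
        private final Tidy tidy;

        Snapshot(long[] offsets, Tidy tidy)
        {
            this.offsets = offsets;
            this.tidy = tidy;
        }

        void close() { tidy.release(); }
    }

    final Tidy tidy = new Tidy();
    private long[] offsets = new long[0];

    /** Only the builder mutates the live state. */
    void addRegion(long offset)
    {
        offsets = Arrays.copyOf(offsets, offsets.length + 1);
        offsets[offsets.length - 1] = offset;
    }

    /** Safe to take while building because the builder still holds a reference. */
    Snapshot snapshot()
    {
        tidy.acquire();
        return new Snapshot(offsets.clone(), tidy);
    }

    void close() { tidy.release(); }
}
```

A snapshot taken before further regions are added keeps seeing the state it copied, and the shared resources are released only after both the builder and every snapshot have been closed.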





[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-20 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704551#comment-14704551
 ] 

Stefania commented on CASSANDRA-8630:
-

I've rebased to 3.0 and renamed the branch to 
[8630-3.0|https://github.com/stef1927/cassandra/tree/8630-3.0]. 

I had conflicts with CASSANDRA-6230, since it introduced 
{{ChecksummedDataInput}}, a new specialization of {{AbstractDataInput}}. I 
integrated it somehow but I feel there is some duplication between this class 
and {{ChecksummedRandomAccessReader}}. However the two aren't identical either, 
so for now they are separate.

I made good progress with moving the segments to the builders and replacing the 
map with an array but this is still not complete. If you want to take a quick 
look the class doing the bulk of the work is called 
[MmappedRegions|https://github.com/stef1927/cassandra/commit/d1418ab889f60812cc866f12bf94b2360b3bb2d3#diff-88342f36d0687d3a0559fede5d158d83R33].
 Your feedback is welcome but it is far from complete, specifically I still 
need to make it into a ref counted object, since the builders don't necessarily 
survive the files, and address thread safety issues. Also, I merely extend 
segments at the moment, I make no effort to compact them. I did not go for the 
two arrays approach since the code is more readable with a single array of a 
well defined class but if you really think this makes a big difference I can 
change it.

The second batch of code review comments is also still WIP. I will post more 
details once it is complete.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704048#comment-14704048
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

bq. You would think so. But take a look at its floorEntry implementation, which 
we would need to make use of. I'm terribly disappointed whenever I look beneath 
the hood of Guava.

It's pretty crazy. Technically it doesn't copy the entire thing if you follow 
it the entire way through. But yeah (jackie)



> Faster sequential IO (on compaction, streaming, etc)
> 
>
> Key: CASSANDRA-8630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core, Tools
>Reporter: Oleg Anastasyev
>Assignee: Benedict
>  Labels: compaction, performance
> Fix For: 3.x
>
> Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, 
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, 
> mmaped_uncomp_hotspot.png
>
>
> When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot 
> of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because default implementations of readShort,readLong, etc as well as 
> their matching write* are implemented with numerous calls of byte by byte 
> read and write. 
> This makes a lot of syscalls as well.
> A quick microbench shows than just reimplementation of these methods in 
> either way gives 8x speed increase.
> A patch attached implements RandomAccessReader.read and 
> SequencialWriter.write methods in more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method 
> list during tests.
> A stress tests on my laptop show that this patch makes compaction 25-30% 
> faster  on uncompressed sstables and 15% faster for compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a cpu load graph from one of our production, orange is niced CPU 
> load - i.e. compaction; yellow is user - i.e. not compaction related tasks)





[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703958#comment-14703958
 ] 

Benedict commented on CASSANDRA-8630:
-

bq. I don't see how an array of pairs can be less indirection than a map, or 
result in less boxing unless there are parallel arrays

Right, which is the standard approach for this kind of thing in Java.
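
A minimal sketch of the parallel-array approach, assuming regions are identified by a sorted array of start offsets with a matching array of lengths. {{ParallelRegionsSketch}} and {{indexFor}} are hypothetical names.

```java
import java.util.Arrays;

final class ParallelRegionsSketch
{
    private final long[] offsets; // sorted start offsets, no boxing
    private final int[] lengths;  // lengths[i] belongs to offsets[i]

    ParallelRegionsSketch(long[] offsets, int[] lengths)
    {
        this.offsets = offsets;
        this.lengths = lengths;
    }

    /** Returns the index of the region containing position, or -1 if none does. */
    int indexFor(long position)
    {
        int idx = Arrays.binarySearch(offsets, position);
        if (idx < 0)
            idx = -idx - 2; // (insertion point - 1) is the last offset <= position
        if (idx < 0)
            return -1;      // position precedes the first region
        return position < offsets[idx] + lengths[idx] ? idx : -1;
    }
}
```

The floor lookup is a binary search over a primitive {{long[]}}, so no entry objects or boxed keys are touched on the read path.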

bq. There might be something to not remapping entire files every 50 megabytes 
as part of early opening, but it's definitely better as a separate task. It's 
also not clear whether it's going to be faster or just feel better.

We've had a few weird kernel level memory interactions reported, and I cannot 
shake the feeling this was related. We never tracked down the cause, but also 
did not have follow up, so it's also quite possible it was an environmental 
issue. 

However, either way, if we're rewriting it right now (which to some extent we 
have to if we're eliminating the current ugliness of multiple readers, 
"potential boundaries" etc - cleanliness scope creep, I'll admit, but when 
refactoring a bunch of classes I don't think we should miss an opportunity to 
remove dead and complicating concepts, such as the need for Iterators of 
multiple FDI, that only makes sense for MFDI) we may as well do it correctly. 
If it's noticeably more work, then sure, let's leave it. But if we're changing 
the behaviour, I don't think it is worth artificially reimplementing it the 
obviously worse way (regardless of how much worse).

bq. ImmutableSortedMap (or is it navigable?) might split the difference between 
the two approaches.

You would think so. But take a look at its {{floorEntry}} implementation, which 
we would need to make use of. I'm terribly disappointed whenever I look beneath 
the hood of Guava.

bq. In SSTableReader you are adding and removing fields from files. What are 
the cross version compatibility issues with that?

This has been discussed already, I think?



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703606#comment-14703606
 ] 

Benedict commented on CASSANDRA-8630:
-

bq. In what scenario would we not want to map the file with as few 2 gigabyte 
buffers as possible?

During early opening we currently remap our buffers every interval, meaning for 
a 2Gb buffer by default we will map it 20 times (plus once every 2Gb). This is 
not horrible, but I would prefer if - at least during reopening - we only 
mapped once, and each time we reopened/extended the size of the file, we just 
mapped the bit that wasn't previously mapped. Once we cross a 2Gb boundary (or 
we are opening the final copy of the file) we should certainly remap into 
contiguous 2Gb chunks.
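
A minimal sketch of the "map only the bit that wasn't previously mapped" idea, under stated assumptions: {{IncrementalMapping}} and {{regionsToMap}} are hypothetical names, not Cassandra code, and the maximum segment size is passed in as a parameter so the example can use small numbers. Given how much of the file is already mapped and the new file length, it returns only the regions still to be mapped, split so that no region crosses a segment-size boundary.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementalMapping {
    /**
     * Returns the [start, end) regions between mappedUpTo and newLength,
     * split at multiples of maxSegment. Hypothetical helper for illustration.
     */
    static List<long[]> regionsToMap(long mappedUpTo, long newLength, long maxSegment) {
        List<long[]> regions = new ArrayList<>();
        long start = mappedUpTo;
        while (start < newLength) {
            // Stop at the next multiple of maxSegment or at the end of the file.
            long nextBoundary = (start / maxSegment + 1) * maxSegment;
            long end = Math.min(nextBoundary, newLength);
            regions.add(new long[] { start, end });
            start = end;
        }
        return regions;
    }

    public static void main(String[] args) {
        // File grew from 50 to 250 bytes, with a (tiny) maximum segment of 100.
        for (long[] r : regionsToMap(50, 250, 100))
            System.out.println(r[0] + " .. " + r[1]);
    }
}
```

Each returned region could then be handed to {{FileChannel.map(MapMode.READ_ONLY, start, end - start)}}; whether to later remap into contiguous chunks once a boundary is crossed is a separate decision that this sketch does not cover.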



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703597#comment-14703597
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

Yes you should rebase to 3.0. We port changes forward (I learned this recently 
myself).

* MemoryInputStream.available() can overflow in the addition 
buffer.remaining() + Ints.saturatedCast(memRemaining()). Do the addition and 
then the saturating cast.
* Why does RandomAccessReader accept a builder and a parameter for initializing 
the buffer? Seems like we lose the bonus of a builder allowing a constant 
signature.
* A nit in initializeBuffer, it does firstSegment.value().slice() which implies 
you want a subset of the buffer? duplicate() makes it obvious there is no such 
concern.
* I think there is a place for unit tests stressing the 2 gigabyte boundaries. 
That means testing available()/length()/remaining() style methods as well as 
being able to read and seek with instances of these things that are larger than 
2g. Doing it with the actual file based ones seems bad, but maybe you could 
intercept those to work with memory so they run fast or ingloriously mock their 
inputs.
* For rate limiting is your current solution to consume buffer size bytes from 
the limiter at a time for both mmap reads and standard? And you accomplish this 
by slicing the buffer then updating the position? I don't see you setting the 
limit before slicing?
* I thought NIODataInputStream had methods for reading into ByteBuffers, but 
was wrong. It's kind of thorny to add one to RebufferingInputStream so I think 
you did the right thing putting it in FileSegmentedInputStream even though it's 
an odd concern to have in that class. Unless you have a better idea.
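
The first bullet above (add first, then saturate) can be sketched as follows. This is an illustration only: {{SaturatingAvailable}} and its methods are hypothetical names, and {{saturatedCast}} is a local stand-in with the same clamping behaviour as Guava's {{Ints.saturatedCast}}, so the example needs no external library.

```java
class SaturatingAvailable {
    /** Clamps a long to the int range, like Guava's Ints.saturatedCast. */
    static int saturatedCast(long value) {
        if (value > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (value < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) value;
    }

    /** Overflow-prone order: saturate first, then add two ints. */
    static int availableWrapping(int bufferRemaining, long memRemaining) {
        return bufferRemaining + saturatedCast(memRemaining);
    }

    /** Safe order: add in long arithmetic, then saturate once. */
    static int availableSaturating(int bufferRemaining, long memRemaining) {
        return saturatedCast(bufferRemaining + memRemaining);
    }

    public static void main(String[] args) {
        long threeGb = 3L * 1024 * 1024 * 1024;
        // The int addition wraps around, so this prints a negative number.
        System.out.println(availableWrapping(4096, threeGb));
        // The long addition is clamped once, so this prints Integer.MAX_VALUE.
        System.out.println(availableSaturating(4096, threeGb));
    }
}
```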

Stefania, is your rework of segment handling still in progress? IOW, should I 
hold off until you are done?

[~benedict] In what scenario would we not want to map the file with as few 2 
gigabyte buffers as possible?

I am still digesting the segments/boundaries/mapping issues.





[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702906#comment-14702906
 ] 

Benedict commented on CASSANDRA-8630:
-

bq. I don't think we serialize the version. 

The version is in the sstable descriptor, and can be passed through.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702774#comment-14702774
 ] 

Stefania commented on CASSANDRA-8630:
-

bq. Right, but we'll need to deserialize the information still (and just throw 
it away)

I don't think we serialize the version. However, the boundaries are serialized 
last, so in theory it should be OK to just remove them. 

Thanks for explaining the map vs array benefits.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702755#comment-14702755
 ] 

Benedict commented on CASSANDRA-8630:
-

bq. They are serialized in the summary but here we don't need to keep backward 
compatibility as it will be recreated on the fly, correct?

Right, but we'll need to deserialize the information still (and just throw it 
away)

bq. You prefer the binary search due to the cost of the map?

Primarily in lookup time. The algorithmic complexity is the same, but the 
constant factor is much larger with a TreeMap, as you have at least two cache 
misses for each decision, whereas with a primitive binary search, we'll have 
between 0 and 1. There's also no boxing required for the parameter. There are 
other benefits, but none that are likely meaningful here, and even this isn't 
super important, it's just something that has bugged me. _If_ we go with 
mapping once, on first use (i.e. never remapping the same region, as we 
currently do), we may end up with many more segments (one per 50Mb or so, by 
default), and so efficiency would be a little more important.
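
A minimal sketch of the paired {{long[]}} / {{ByteBuffer[]}} lookup described above, using {{Arrays.binarySearch}} on primitive offsets instead of a {{TreeMap}}. The class and method names ({{SegmentLookup}}, {{indexFor}}, {{segmentFor}}) are hypothetical, and the sketch assumes the offsets are sorted ascending and that the requested position is not below the first offset.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SegmentLookup {
    private final long[] offsets;       // start offset of each segment, ascending
    private final ByteBuffer[] buffers; // buffers[i] starts at file offset offsets[i]

    SegmentLookup(long[] offsets, ByteBuffer[] buffers) {
        this.offsets = offsets;
        this.buffers = buffers;
    }

    /** Index of the segment containing position, i.e. the last offset <= position. */
    int indexFor(long position) {
        int idx = Arrays.binarySearch(offsets, position);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // segment is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    ByteBuffer segmentFor(long position) {
        return buffers[indexFor(position)];
    }

    public static void main(String[] args) {
        SegmentLookup lookup = new SegmentLookup(
            new long[] { 0, 100, 200 },
            new ByteBuffer[] { ByteBuffer.allocate(100), ByteBuffer.allocate(100), ByteBuffer.allocate(50) });
        System.out.println(lookup.indexFor(150)); // position 150 falls in the segment starting at 100
    }
}
```

The search key is a primitive {{long}}, so no boxing is needed, and the offsets sit in one contiguous array rather than in separately allocated tree nodes.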


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702744#comment-14702744
 ] 

Stefania commented on CASSANDRA-8630:
-

The boundaries are gone. They are serialized in the summary, but here we don't 
need to keep backward compatibility as it will be recreated on the fly, 
correct?

bq. In our SegmentedFileBuilder we can just map a segment on-demand (i.e. 
whenever we build a SegmentedFile from it, we map any segments we need). We can 
stick with the TreeMap (although tbh, I'd prefer we switch to 
a paired long[] and ByteBuffer[], and perform binarySearch on the former to key 
into the latter), it's just built with arbitrary boundaries.

I see what you mean now. So the segments will belong to the builder like the 
channel. The reason I chose the tree map was to avoid changing the 
CompressedRAR but it shouldn't take long to change the map to an array of 
(Long, ByteBuffer) pairs. You prefer the binary search due to the cost of the 
map?


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702720#comment-14702720
 ] 

Benedict commented on CASSANDRA-8630:
-

bq. He also added _Constructor is private, maybe a rate limiter with a huge 
rate?_. 

Sorry, missed that :)

bq. How would we handle files bigger than Integer.MAX_VALUE?

In our SegmentedFileBuilder we can just "map" a segment on-demand (i.e. 
whenever we build a SegmentedFile from it, we map any segments we need). We can 
stick with the TreeMap (although tbh, I'd prefer we switch to 
a paired long[] and ByteBuffer[], and perform binarySearch on the former to key 
into the latter), it's just built with arbitrary boundaries.


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702714#comment-14702714
 ] 

Stefania commented on CASSANDRA-8630:
-

bq. I think Ariel was suggesting a new class that explicitly performs no work. 
However, since we use this class more often for reads than we do for 
compaction, I would prefer we stick with the more performant option of just 
null checking. Certainly using a full-fat RateLimiter is more expensive than 
this

He also added _Constructor is private, maybe a rate limiter with a huge rate?_. 
Anyway, I will just stick to null checking unless there are other objections.

Thanks for the clarifications on the segments creation, I hadn't realized that 
we could get rid of the boundaries as well. One thing is still not clear 
however:

bq. At the same time we can eliminate the idea of multiple segments; we should 
always have just one segment.

How would we handle files bigger than Integer.MAX_VALUE? Would we map the new 
region on-the-fly when rebuffering (I guess not) or upfront when building the 
'segmented' file, in which case we still need more than one mmap segment? 

We also need to rename {{SegmentedFile}} and derived classes, right? Any 
preferences?




[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702681#comment-14702681
 ] 

Benedict commented on CASSANDRA-8630:
-

bq. RateLimiter is not a final class. 

I think Ariel was suggesting a new class that explicitly performs no work. 
However, since we use this class more often for reads than we do for 
compaction, I would prefer we stick with the more performant option of just 
null checking. Certainly using a full-fat RateLimiter is more expensive than 
this.

bq. Have a look at MmappedSegmentedFile.Builder.addPotentialBoundary() and 
createSegments()

I should have written a bit about this before work started: my expectation is 
that this can all be completely removed. The reason for it was that we treated 
each mmap file segment as completely distinct, so we had to have each partition 
end on a 2G boundary (so we could map the entirety). That's no longer the case, 
since we just rebuffer, so we can safely eliminate all of the mess with segment 
boundaries, and just map in increments of 2G (or, frankly, whatever we feel 
like. It might be nice to do it exactly once when we "early open" so that we do 
not remap the same regions multiple times). At the same time we can eliminate 
the idea of multiple segments; we should always have just one segment. Given 
this, we should also consider renaming them, since they're no longer "segments" 
- they cover the whole file.

(Caveat: I haven't reviewed the code directly, I'm just going off the comments)


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-19 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702662#comment-14702662
 ] 

Stefania commented on CASSANDRA-8630:
-

Thank you for the quick review and all the tips on flight recorder!

I think I've addressed most of your comments with the [latest 
commit|https://github.com/stef1927/cassandra/commit/4794b0009b363450e03fa85cc8ca2eeb403f3e37]:

bq. In RebufferingInputStream, don't throw assertion error since it's not an 
assert nor an Error to read from a closed stream. JDK classes seem to throw 
IOException with a message. I say don't throw anything just let it NPE since 
that is what the other functions do and we are avoiding the extra branch in 
those. Or... check for "closed" all the time and throw IOException. Maybe 
Benedict has an opinion on the performance of checking.

Done, it'll just NPE.

bq. RandomAccessReader.reBufferMmap() - how is mmap reading rate limited?

The throttling was kind of a work in progress. The issue I was facing with the 
performance measurements is that the limiter was acquiring the entire buffer 
capacity, and this can be big for mmap segments (the entire file length, up to 
2GB). So, as Benedict correctly guessed, multiple tables at once would block 
when trying to swap in the first segment in rebuffer. Therefore I changed the 
RAR constructor to initialize the buffer to the first segment (and this won't 
be throttled). Then I moved the throttling back to reBuffer, so it covers mmap 
segments too, except I am careful to throttle on buffer.remaining() rather than 
buffer.capacity(). So the first segment won't be throttled; this could be a 
problem, but the existing behavior doesn't throttle mmap segments at all. 
Perhaps we shouldn't be throttling when rebuffering but when reading?
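
A minimal sketch of throttling on {{buffer.remaining()}} rather than {{buffer.capacity()}}, under stated assumptions: {{ThrottleSketch}} and its {{Limiter}} interface are hypothetical stand-ins (not Guava's {{RateLimiter}} and not the RAR code), and a null limiter is taken here to mean an unthrottled reader.

```java
import java.nio.ByteBuffer;

class ThrottleSketch {
    /** Hypothetical stand-in for a rate limiter; not Guava's RateLimiter. */
    interface Limiter {
        void acquire(int permits);
    }

    /** Acquires permits for the readable bytes only; returns the amount requested. */
    static int throttle(Limiter limiter, ByteBuffer buffer) {
        if (limiter == null)
            return 0; // unthrottled reader (assumption): nothing is acquired
        int bytes = buffer.remaining(); // bytes between position and limit, not capacity()
        limiter.acquire(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        buffer.position(60 * 1024); // only the last 4 KiB is still unread
        int[] acquired = new int[1];
        throttle(permits -> acquired[0] += permits, buffer);
        System.out.println("capacity=" + buffer.capacity() + " acquired=" + acquired[0]);
    }
}
```

With a large mmap segment, the gap between {{capacity()}} and {{remaining()}} can be far bigger than in this toy buffer, which is why acquiring the full capacity could block several readers at once.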

bq. RandomAccessReader.Channel is not a channel. It's more of a wrapper, 
descriptor, proxy or something.

This was an ugly hack to support the compressed commitlog replay code, which 
expects a FileDataInput that can be created with a BB, a path and an offset 
(the old ByteBufferDataInput). I did not have the confidence to change 
commitlog code so I ended up with a channel wrapper to support an empty channel 
and reuse the RAR. I got rid of it and added a new sub-class of DataInputBuffer 
that implements FileDataInput, I called it FileSegmentInputStream.

bq. RateLimiter is not a final class. We could start using a noop rate limiter 
instead of null. Constructor is private, maybe a rate limiter with a huge rate?

Replaced null with RateLimiter.create(Double.MAX_VALUE).

bq. RandomAccessReader.bytesRemaining() uses Ints.checkedCast, but we expect 
the file to be bigger than an int so it shouldn't throw. The API allows this 
and FileInputStream doesn't throw for available. In this case saturated cast is 
probably the right one. We should do a pass for Ints.checkedCast and make sure 
throwing is the right behavior instead of writing handling for it.

Changed to saturatedCast, thanks for spotting this.

bq. BufferPool.java has an import change that is extra

Fixed, thank you.

bq. I was going to ask for some warnings cleanup, but it's a big patch touching 
a lot of files that already had warnings, so whatever you want to do.

I've killed a few, though not too many. If any specific files bother you 
particularly, let me know.

bq. Thumbs up for logging the random seed in the tests

Thanks.

bq. RandomAccessReader.readBytes could allocate the buffer and then invoke the 
superclass read method

I think it's faster as it is? This is a hot-spot according to flight recorder.

bq. MemoryInputStream.reBuffer is allegedly not tested

I've added {{MemoryTest.testInputStream()}}

bq. RandomAccessReader.reBuffer has two cases that aren't tested, if (limit > 
fileLength) and in the loop if (n < 0)

fileLength should always be less than the channel size (I've added a check in 
the builder when we override it). So {{n < 0}} would only happen if the file got 
modified, which shouldn't happen; therefore I replaced the {{break}} with an 
{{FSError}}. As for {{limit > fileLength}}, I don't see how this could happen, so 
I removed it. Since this was existing code, could you double check this?

bq. RandomAccessReader.open(ByteBuffer, String, long) usage of checked cast 
seems like it would also limit to 2 gig files?

No longer applicable.

bq. Not sure about the first checked cast in reBufferMmap, if it saturated the 
min would still work, and you would be able to do it multiple times to get to 
the next entry so no reason to throw an exception?

Yes you're quite right, saturatedCast is the correct choice.

bq. Test coverage looks excellent on the things you worked on.

Thanks.

bq. What's the business with the missing segments? How does that happen and how 
often? Just wondering if going to the buffer pool for that makes sense.

Have a look at MmappedSegmentedFile.Builder.addPotent

[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702017#comment-14702017
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

Some more things I noticed

* RandomAccessReader.readBytes could allocate the buffer and then invoke the 
superclass read method
* MemoryInputStream.reBuffer is allegedly not tested
* RandomAccessReader.reBuffer has two cases that aren't tested, if (limit > 
fileLength) and in the loop if (n < 0)
* RandomAccessReader.open(ByteBuffer, String, long) usage of checked cast seems 
like it would also limit to 2 gig files?
* Not sure about the first checked cast in reBufferMmap, if it saturated the 
min would still work, and you would be able to do it multiple times to get to 
the next entry so no reason to throw an exception?

Test coverage looks excellent on the things you worked on.

What's the business with the missing segments? How does that happen and how 
often? Just wondering if going to the buffer pool for that makes sense.

> Faster sequential IO (on compaction, streaming, etc)
> 
>
> Key: CASSANDRA-8630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core, Tools
>Reporter: Oleg Anastasyev
>Assignee: Stefania
>  Labels: compaction, performance
> Fix For: 3.x
>
> Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, 
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, 
> mmaped_uncomp_hotspot.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc.), a lot 
> of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because the default implementations of readShort, readLong, etc., as well 
> as their matching write* methods, are implemented with numerous byte-by-byte 
> read and write calls. 
> This makes a lot of syscalls as well.
> A quick microbenchmark shows that just reimplementing these methods 
> either way gives an 8x speed increase.
> The attached patch implements the RandomAccessReader.read and 
> SequencialWriter.write methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method 
> list during tests.
> Stress tests on my laptop show that this patch makes compaction 25-30% 
> faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a CPU load graph from one of our production systems; orange is niced 
> CPU load, i.e. compaction; yellow is user, i.e. tasks not related to compaction.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701858#comment-14701858
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

Doing code review now

* In RebufferingInputStream, don't throw assertion error since it's not an 
assert nor an Error to read from a closed stream. JDK classes seem to throw 
IOException with a message. I say don't throw anything just let it NPE since 
that is what the other functions do and we are avoiding the extra branch in 
those. Or... check for "closed" all the time and throw IOException. Maybe 
Benedict has an opinion on the performance of checking.
* RandomAccessReader.reBufferMmap() - how is mmap reading rate limited?
* RandomAccessReader.Channel is not a channel. It's more of a wrapper, 
descriptor, proxy or something. 
* MemoryInputStream - This should let you read from Memories larger than 2 
gigabytes right? Ints.checkedCast in getByteBuffer will throw?
* RateLimiter is not a final class. We could start using a noop rate limiter 
instead of null.
* RandomAccessReader.bytesRemaining() uses Ints.checkedCast, but we expect the 
file to be bigger than an int so it shouldn't throw. The API allows this and 
FileInputStream doesn't throw for available. In this case saturated cast is 
probably the right one. We should do a pass for Ints.checkedCast and make sure 
throwing is the right behavior instead of writing handling for it.
* BufferPool.java has an import change that is extra
* I was going to ask for some warnings cleanup, but it's a big patch touching a 
lot of files that already had warnings, so whatever you want to do.
* Thumbs up for logging the random seed in the tests
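
For the Ints.checkedCast item above, a minimal sketch of how a checked and a saturated narrowing cast differ. The two helpers are plain-Java stand-ins written for illustration (Guava offers Ints.checkedCast and Ints.saturatedCast); the class name and the 3 GiB value are assumptions, not code from the patch.

```java
public class CastSketch {
    // Stand-in for a checked cast: rejects values that do not fit in an int.
    static int checkedCast(long value) {
        int result = (int) value;
        if (result != value) {
            throw new IllegalArgumentException("Out of range: " + value);
        }
        return result;
    }

    // Stand-in for a saturated cast: clamps to the int range instead of throwing.
    static int saturatedCast(long value) {
        if (value > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (value < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) value;
    }

    public static void main(String[] args) {
        // Hypothetical: 3 GiB still unread in a file, which does not fit in an int.
        long bytesRemaining = 3L * 1024 * 1024 * 1024;

        // Saturating clamps the answer to Integer.MAX_VALUE.
        System.out.println(saturatedCast(bytesRemaining));

        // The checked variant turns the same situation into an exception.
        try {
            checkedCast(bytesRemaining);
        } catch (IllegalArgumentException e) {
            System.out.println("checked cast threw: " + e.getMessage());
        }
    }
}
```

With a saturated cast, a caller that takes the minimum of this value and a buffer size still gets a usable answer and can simply ask again after consuming some bytes.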

The approach looks good. I'm still reviewing. Working on the tests and coverage 
now.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701626#comment-14701626
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

OK, I won't get too involved in trying to benchmark it then. I think we have 
demonstrated it isn't a regression with Stefania's changes to the rate limiting.

Stefania, one view in flight recorder that is kind of handy is Threads 
-> Latencies. Under Java Thread Sleep you can see 882 milliseconds spent 
sleeping across 98 instances. This is for the uncompressed case with 8630. That 
sleep time isn't present on trunk. If you have a single thread you want to be 
hot (or piece of code) you can check the latencies view for time spent 
parked/waiting/sleeping/IO. It's not the greatest view because it doesn't group 
by thread.

Since flight recorder mostly only accounts for CPU time it's the view you can 
use to find blocked threads that aren't blocked on contention or IO. Flight 
recorder also doesn't account for time spent faulting memory mapped files :-) 
In retrospect this would be the "flight recorder" way to accomplish the 
discovery you made in visual vm.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-18 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701566#comment-14701566
 ] 

Benedict commented on CASSANDRA-8630:
-

I guess it depends on what we're trying to demonstrate. This ticket was meant 
to be low-hanging fruit (LHF), fixing some obvious problems with RAR:

* If we're just trying to establish how much faster "sequential IO" is (as the 
ticket is targeting) we probably just want to see how quickly we can read the 
contents of an sstable from start to finish. 
** It's worth noting that this may be more impactful on 2.2 or below, as we 
made a great deal more use of readInt(), which has a terribly inefficient 
implementation. 3.0 and trunk now use readUnsignedVInt a great deal more, and 
this is considerably more efficient.
* If we're trying to establish how much faster compaction gets as a result, we 
probably want to test between 4 and 10 files, since the former is what we'll 
compact with STCS, and the latter with LCS, AFAIK.

Personally I'm only interested in quickly confirming this ticket improves the 
basic properties it's aiming for. A wider scope analysis of compaction 
performance is very much necessary, but can wait until after 3.0 ships.
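
To illustrate the readInt() point, here is a minimal sketch of the byte-by-byte pattern the ticket describes, next to a buffer-based read. CountingInputStream, readIntByteByByte and the class name are hypothetical helpers written for this example; they are not Cassandra or JDK code.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ReadIntSketch {
    // Hypothetical stream that counts how many single-byte reads it serves.
    static final class CountingInputStream extends InputStream {
        private final byte[] data;
        private int pos;
        int readCalls;

        CountingInputStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            readCalls++;
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
    }

    // Byte-by-byte pattern: four read() calls to assemble one big-endian int.
    static int readIntByteByByte(InputStream in) throws IOException {
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        int b4 = in.read();
        if ((b1 | b2 | b3 | b4) < 0) {
            throw new EOFException();
        }
        return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = ByteBuffer.allocate(8).putInt(42).putInt(-7).array();

        CountingInputStream in = new CountingInputStream(bytes);
        int first = readIntByteByByte(in);
        int second = readIntByteByByte(in);
        System.out.println(first + " " + second + " via " + in.readCalls + " read() calls");

        // Buffer-based pattern: the bytes are already in memory, one getInt() per value.
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        System.out.println(buffer.getInt() + " " + buffer.getInt());
    }
}
```

When each read() on the underlying file is expensive, the first pattern pays that cost four times per int, which is the overhead the ticket is trying to remove.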



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701532#comment-14701532
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

So you're saying that in practice a compaction task will kick off, and before it 
completes another will kick off and that allows for parallelism?

Are we measuring the wrong thing here then? Maybe we want to avoid a 100 way 
compaction and force a 4-way by using a large memtable?



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-18 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701519#comment-14701519
 ] 

Benedict commented on CASSANDRA-8630:
-

Bear in mind you're performing a 100 way compaction, which is almost unheard 
of. Typical compaction is 4-way, in which scenario this is likely to dominate a 
great deal more. This is also performed on every read, not just compaction, so 
optimising this is definitely helpful, especially given it is relatively low-hanging fruit.

bq. I kind of think we are DOA if compaction within a single column family is 
single threaded anyways

A single compaction task is single threaded. There may be multiple such 
compactions happening in parallel. We may reintroduce parallel compaction at a 
later date, but it's not something we're likely to introduce in the near future.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701515#comment-14701515
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

I guess I mixed up compressed and uncompressed?

JFR has an option to load settings from an XML file. You can export the XML 
file from JMC and then modify the fields you care about. There are some options 
in there for those thresholds that I don't know offhand. You can also set them 
in the UI and then export.

I ran without rate limiting (commented out the acquire) and the original 
workload.

||Version||Run 1||
|8630 uncompressed|202|
|8630 compressed|197|
|Trunk uncompressed|201|
|Trunk compressed|203|

Looking at the profile I think costs here are dominated by all the Java code 
for merging. I'll run with JMC and no rate limiting and see where that gets me. 
I suspect it's going to be very similar.

It kind of looks like a bit more than 50% of time is spent materializing data 
and maybe 40% is spent rewriting it. The top 4 classes are 3 merge iterators 
and ConcurrentHashMapV8 in concurrent linked hashmap. I guess cache maintenance 
is part of the overhead of compaction.

It looks like we are optimizing the wrong part of the profile? Have things 
changed since <2.2? I kind of think we are DOA if compaction within a single 
column family is single threaded anyways. I think I heard of such a thing 
existing, but no one was satisfied with it.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-18 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700970#comment-14700970
 ] 

Benedict commented on CASSANDRA-8630:
-

Probably due to the burst-limit of the rate limiter. If you rebuffer all 100 
sstables at once, you're perhaps exceeding the default 1s burst limit, and so 
find yourself sleeping often. Sleeping has a coarse granularity, so sleeping 
may last longer than intended by the {{RateLimiter}}.

We should probably test this with rate limiting off, anyway, though. To see 
what unthrottled throughput looks like.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-18 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700948#comment-14700948
 ] 

Stefania commented on CASSANDRA-8630:
-

The slowness with the uncompressed mmaped segments is caused by the rate 
limiter, which ultimately comes from the compaction throughput, see 
_mmaped_uncomp_hotspot.png_ attached. Whereas before we were simply looping on 
a sorted list of mmaped segments and returning a {{ByteBufferDataInput}} for 
each one of them, now we have a sorted map of segments that are swapped in or 
out by the RAR rebuffer method. Because we were applying the rate 
limiter to the rebuffer method, mmaped segments became much slower. 
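
As an illustration of the sorted-map lookup described above, here is a minimal sketch, assuming segments are keyed by their start offset in the file. The class name, the segmentFor helper and the tiny segment sizes are hypothetical, and heap buffers stand in for mmaped regions; this is not the code from the patch.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

public class SegmentMapSketch {
    // Segments keyed by start offset; floorEntry finds the candidate for a position.
    private final TreeMap<Long, ByteBuffer> segments = new TreeMap<>();

    void addSegment(long startOffset, ByteBuffer buffer) {
        segments.put(startOffset, buffer);
    }

    // Returns a view positioned at filePosition, or null if no segment covers it.
    ByteBuffer segmentFor(long filePosition) {
        Map.Entry<Long, ByteBuffer> entry = segments.floorEntry(filePosition);
        if (entry == null) {
            return null; // position lies before the first segment
        }
        long offsetInSegment = filePosition - entry.getKey();
        ByteBuffer view = entry.getValue().duplicate();
        if (offsetInSegment >= view.limit()) {
            return null; // position falls in a gap after this segment
        }
        view.position((int) offsetInSegment); // safe: smaller than the int limit
        return view;
    }

    public static void main(String[] args) {
        SegmentMapSketch reader = new SegmentMapSketch();
        reader.addSegment(0L, ByteBuffer.wrap(new byte[] {0, 1, 2, 3}));
        reader.addSegment(8L, ByteBuffer.wrap(new byte[] {8, 9, 10, 11}));

        System.out.println(reader.segmentFor(9L).get());      // byte stored at file position 9
        System.out.println(reader.segmentFor(5L) == null);    // position 5 is not covered
    }
}
```

In this sketch a null result marks a position that no segment covers, so a caller would need some other way to read those bytes.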

If we apply the rate limiter only just before reading, as opposed to every 
time rebuffer is called, here are the results:

||Version||Run 1||Run 2||Run 3||Rounded AVG||
|8630 comp|17.48|16.77|16.26|17|
|8630 uncomp|15.51|17.5|17.7|17|
|TRUNK comp|17.95|17.64|17.72|18|
|TRUNK uncomp|20.81|20.01|18.81|20|

I am not sure I fully understand why the compressed case was not affected as 
much, since these segments are also pretty big for the uncompressed case. I would 
also like to know if there is a way to have flight recorder look at the total 
time rather than just the CPU time; without [visual 
vm|https://visualvm.java.net] I would not have been able to find this.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-17 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700684#comment-14700684
 ] 

Stefania commented on CASSANDRA-8630:
-

Thanks for your analysis.

I repeated the tests, 3 identical runs each time, albeit with a smaller data 
set. They still indicate it is the uncompressed case where something has gone 
wrong, not the compressed case. And more specifically I traced the slowness to 
mmap disk access.

Here are the results. Because I am on a 64-bit machine, 
{{disk_access_mode=auto}} resolves to {{mmap}} (although I am not sure at which 
version this behavior started, so it may not be true for all versions). In the 
'uncomp-std' test I forced the disk access mode to standard.

||Version||Run 1||Run 2||Run 3||Rounded AVG||
|8630 comp|17.91|18.31|17.94|18|
|8630 uncomp|28.06|28.95|28.02|28|
|8630 uncomp-std|19.31|18.09|18.9|19|
|TRUNK comp|17.95|17.64|17.72|18|
|TRUNK uncomp|20.81|20.01|18.81|20|
|2.2 comp|19.95|20.33|19.97|20|
|2.2 uncomp|19.14|19.18|20.1|19|
|2.1 comp|21.61|20.43|20.43|21|
|2.1 uncomp|20.4|19.67|19.71|20|
|2.0 comp|18.8|19.42|19.66|19|
|2.0 uncomp|19.48|19.55|19.68|20|

Notes:
* Reduced data to 1M entries, which corresponds to approximately 220 MB of 
data. This allowed me to keep the machine _more or less_ idle during the tests.
* All tests done with Java 8 update 51 except for 2.0 which was done with Java 
7 update 80.
* Tests performed on a 64-bit linux laptop with SSD
* Compaction strategy was the default strategy used by the stress tool: 
SizeTieredCompactionStrategy

Next I need to understand why mmap is so slow. I think I must have broken 
something when I moved the segments to the RAR.

bq. I usually set the file read and write and contention thresholds to one 
millisecond.

What parameters do you use to achieve this?



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700433#comment-14700433
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

Flight recording of 8630 with compression, hot packages
||Package||Sample Count||Percentage (%)||
|org.apache.cassandra.db.rows|1,577|25.403|
|org.apache.cassandra.utils|1,498|24.13|
|org.apache.cassandra.utils.btree|670|10.793|
|com.googlecode.concurrentlinkedhashmap|598|9.633|
|java.util|585|9.423|
|org.apache.cassandra.io.sstable|430|6.927|
|org.apache.cassandra.db.partitions|183|2.948|
|org.apache.cassandra.cache|162|2.61|
|org.apache.cassandra.io.util|139|2.239|
|org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$$Lambda$93|77|1.24|
|org.apache.cassandra.db|74|1.192|

Flight recording trunk, hot packages
||Package||Sample Count||Percentage (%)||
|org.apache.cassandra.utils|1,771|26.732|
|org.apache.cassandra.db.rows|1,599|24.136|
|com.googlecode.concurrentlinkedhashmap|631|9.525|
|java.util|590|8.906|
|org.apache.cassandra.utils.btree|565|8.528|
|org.apache.cassandra.io.sstable|438|6.611|
|org.apache.cassandra.io.util|330|4.981|
|org.apache.cassandra.db.partitions|124|1.872|
|org.apache.cassandra.cache|121|1.826|
|org.apache.cassandra.io.sstable.format.big|105|1.585|
|org.apache.cassandra.db|102|1.54|



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700401#comment-14700401
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

||Version|Time 1|Time 2|Time 3|
|8630 uncompressed|197|204|198|
|8630 compressed|263|262|261|
|3.x uncompressed|199|198|198|
|3.x compressed|200|198|198|

My intuition is that something bad is happening in the compressed case, and 
that the changes have no impact in the uncompressed case. That suggests the 
bottleneck is elsewhere. I am looking at the flight recordings now.
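
To put rough numbers on the table above, here is a minimal sketch that averages the three runs per row and compares the patched branch against 3.x. The class and method names are made up for illustration, and the table does not state a unit, so only the relative change is meaningful.

```java
import java.util.Arrays;
import java.util.Locale;

class CompactionTimings {
    /** Arithmetic mean of the recorded runs for one row of the table. */
    static double mean(int... times) {
        return Arrays.stream(times).average().orElse(Double.NaN);
    }

    /** Relative change of the patched mean against the baseline mean, in percent. */
    static double percentChange(double patched, double baseline) {
        return (patched - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the table; the unit is not given in the comment.
        double patchedUncompressed = mean(197, 204, 198);
        double patchedCompressed = mean(263, 262, 261);
        double baseUncompressed = mean(199, 198, 198);
        double baseCompressed = mean(200, 198, 198);

        System.out.println(String.format(Locale.ROOT, "uncompressed: %+.1f%%",
                percentChange(patchedUncompressed, baseUncompressed)));
        System.out.println(String.format(Locale.ROOT, "compressed:   %+.1f%%",
                percentChange(patchedCompressed, baseCompressed)));
    }
}
```

On these figures the uncompressed difference is under one percent, while the compressed runs are roughly a third slower with the patch.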

Did you measure on OS X or Linux? FYI, I usually set the file read, file write, 
and contention thresholds to one millisecond. It doesn't seem to impact 
performance, but it does provide a clearer picture.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700234#comment-14700234
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

I have an empty box I can run it on. Which compaction strategy are you taking 
those numbers from? When I run the test, it runs three times, once for each 
strategy.




[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699287#comment-14699287
 ] 

Benedict commented on CASSANDRA-8630:
-

Given it is about the same with compression, I suspect it may be doing more 
writes to disk. Perhaps the buffer size is smaller?
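
The buffer-size hypothesis can be illustrated with a small, self-contained sketch. It is not Cassandra's writer: it uses the JDK's `BufferedOutputStream` over a hypothetical counting sink, purely to show how the number of writes reaching the underlying stream scales with buffer size.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class FlushCounter {
    /** Hypothetical sink that discards data but counts how often it is written to. */
    static final class CountingSink extends OutputStream {
        int writes;

        @Override
        public void write(int b) {
            writes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writes++;
        }
    }

    /**
     * Pushes totalBytes through a buffer of bufferSize in 8-byte records
     * and returns how many writes reached the sink.
     */
    static int writesFor(int bufferSize, int totalBytes) throws IOException {
        CountingSink sink = new CountingSink();
        byte[] record = new byte[8];
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int written = 0; written < totalBytes; written += record.length) {
                out.write(record, 0, record.length);
            }
        }
        return sink.writes;
    }

    public static void main(String[] args) throws IOException {
        int oneMiB = 1 << 20;
        System.out.println("4 KiB buffer:  " + writesFor(4 * 1024, oneMiB) + " writes");
        System.out.println("64 KiB buffer: " + writesFor(64 * 1024, oneMiB) + " writes");
    }
}
```

For the same amount of data, the 4 KiB buffer should reach the sink sixteen times as often as the 64 KiB one, which is the kind of effect a smaller buffer could have on disk writes.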



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-17 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699282#comment-14699282
 ] 

Stefania commented on CASSANDRA-8630:
-

bq. Hmm. The patch is slower in both runs without compression. Was that due to 
other work on your system, or is something else to blame?

I think it's definitely slower but nothing stands out in the flight recorder 
hot-spots. Tomorrow I'll spend some more time on it and I'll also compare it 
with 2.1 and 2.2.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699202#comment-14699202
 ] 

Benedict commented on CASSANDRA-8630:
-

Hmm. The patch is slower in both runs without compression. Was that due to 
other work on your system, or is something else to blame?

I guess it's possible 8099 has either worsened the situation for read merging, 
or improved the situation for serialization, so that this is less meaningful. 
It would be interesting to see how 2.1 fared by comparison!




[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-17 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699151#comment-14699151
 ] 

Stefania commented on CASSANDRA-8630:
-

[~benedict] I would appreciate some feedback if you have time.

CI results:

http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-8630-testall/
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-8630-dtest/



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-14 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698039#comment-14698039
 ] 

Stefania commented on CASSANDRA-8630:
-

Great news, thank you so much for confirming!



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-14 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697108#comment-14697108
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

You are right, it's the wrong ticket. The good news is that the changes I 
was thinking of are already on trunk:
https://github.com/apache/cassandra/commits/trunk/src/java/org/apache/cassandra/io/util/NIODataInputStream.java

I was thinking of CASSANDRA-9863



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-13 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696455#comment-14696455
 ] 

Stefania commented on CASSANDRA-8630:
-

I don't see any changes to NIODataInputStream in 
https://github.com/apache/cassandra/compare/trunk...aweisberg:C-9500, is this 
the right ticket?



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-10 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679819#comment-14679819
 ] 

Stefania commented on CASSANDRA-8630:
-

I'm still getting little-endian buffers in OHCProvider.ValueSerializer, is 
there a global setting I must set?

Also, I noticed the license file is still for 0.3.4, should we update it?



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-07 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661625#comment-14661625
 ] 

Stefania commented on CASSANDRA-8630:
-

Thank you! :)



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-07 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661615#comment-14661615
 ] 

Robert Stupp commented on CASSANDRA-8630:
-

Here you go :)
ohc 0.4.1 has Java byte order for everything user-facing.
You can download the jars 
[here|https://oss.sonatype.org/content/repositories/releases/org/caffinitas/ohc/ohc-core/0.4.1/]
 and 
[here|https://oss.sonatype.org/content/repositories/releases/org/caffinitas/ohc/ohc-core-j8/0.4.1/]
  (Maven central needs some time to replicate and index).



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-07 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661569#comment-14661569
 ] 

Stefania commented on CASSANDRA-8630:
-

I see, let's stick to big-endian for now. Maybe I'll do a little benchmark to 
compare if I have some time.

Robert, can you provide the configuration switch if it isn't too much trouble? 
At the moment I change the ordering in the serializers. It works but it's not 
very nice.
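
As a rough illustration of what changing the ordering in a serializer can look like, here is a minimal sketch. The class `BigEndianSerializer` is hypothetical and is not the actual OHC or Cassandra serializer; it only shows forcing big-endian order on a buffer that arrives little-endian and restoring the caller's order afterwards.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class BigEndianSerializer {
    /** Writes value in big-endian order, whatever order the caller's buffer is set to. */
    static void serialize(long value, ByteBuffer buf) {
        ByteOrder callerOrder = buf.order();
        buf.order(ByteOrder.BIG_ENDIAN);
        try {
            buf.putLong(value);
        } finally {
            buf.order(callerOrder); // leave the buffer as we found it
        }
    }

    /** Reads a long in big-endian order, whatever order the caller's buffer is set to. */
    static long deserialize(ByteBuffer buf) {
        ByteOrder callerOrder = buf.order();
        buf.order(ByteOrder.BIG_ENDIAN);
        try {
            return buf.getLong();
        } finally {
            buf.order(callerOrder);
        }
    }

    public static void main(String[] args) {
        // Simulates a cache handing over a little-endian buffer.
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        serialize(0x0102030405060708L, buf);
        buf.flip();
        System.out.println(Long.toHexString(deserialize(buf)));
    }
}
```

This works, but every serializer has to repeat the save-and-restore step, which is why a single switch in the library would be cleaner.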



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-07 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661567#comment-14661567
 ] 

Robert Stupp commented on CASSANDRA-8630:
-

I'm also +1 on option 1 (stick with big endian). ByteBuffer intrinsics are 
quite good nowadays. Providing an upgrade period that allows both little and 
big endian might bring more problems and hard-to-detect bugs than it is worth.

Do you need a patched OHC version?



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661560#comment-14661560
 ] 

Benedict commented on CASSANDRA-8630:
-

It would be better, from that perspective, to use native order. However:

# Swapping byte order is a single-cycle instruction, so it is probably not a 
big deal, given how inefficient we are otherwise right now
# We would have to require that users have nodes all sharing the same 
architecture (or at least endianness), which may or may not be limiting. Or we 
could pick the common platform endianness and go little endian
# We would have to support both, at least during upgrade

All told, I think it's easier to stick with the big-endian format, which is 
the Java standard and is standard across our codebase. But I'm not opposed to 
the alternative of providing a patch to support both, followed by an upgrade 
period and dropping support for big endian in 4.0





[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-07 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661554#comment-14661554
 ] 

Robert Stupp commented on CASSANDRA-8630:
-

OHC forces native byte order to prevent swapping the values. I can however 
provide a config switch in the builder to use big-endian if that helps.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-07 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661541#comment-14661541
 ] 

Stefania commented on CASSANDRA-8630:
-

About byte ordering: it seems OHC insists on native byte ordering, which is 
little-endian on Linux x86_64. Not a big problem; we can force the ordering to 
big-endian in the serializers.

However, I think this means we always pay the price of swapping bytes when 
using direct byte buffers. Here is the implementation of {{getInt()}} in 
DirectByteBuffer.java:

{code}
private int getInt(long a) {
    if (unaligned) {
        int x = unsafe.getInt(a);
        return (nativeByteOrder ? x : Bits.swap(x));
    }
    return Bits.getInt(a, bigEndian);
}
{code}

Forcing byte ordering to big-endian doesn't mean {{nativeByteOrder}} becomes 
true:

{code}
public final ByteBuffer order(ByteOrder bo) {
    bigEndian = (bo == ByteOrder.BIG_ENDIAN);
    nativeByteOrder =
        (bigEndian == (Bits.byteOrder() == ByteOrder.BIG_ENDIAN));
    return this;
}
{code}

where {{Bits.byteOrder()}} returns the platform endianness. 

So wouldn't we be better off forcing native byte ordering rather than 
big-endian?
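
The effect of the order setting described above can be observed with the public {{ByteBuffer}} API alone. The following is a minimal sketch, not taken from the patch: the class name {{ByteOrderDemo}} and the helper {{readFirstInt}} are made up for illustration. It reads the same four bytes from a direct buffer under both orders and shows that the two results differ exactly by a byte swap. Whether a swap is actually executed for a given order depends on the platform, as in the JDK code quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo
{
    /** Reads the first int of a fresh direct buffer holding 01 02 03 04, using the given order. */
    static int readFirstInt(ByteOrder order)
    {
        ByteBuffer bb = ByteBuffer.allocateDirect(4);
        bb.put(new byte[]{ 1, 2, 3, 4 });
        bb.flip();
        // A new ByteBuffer is always big-endian; order() changes how multi-byte reads are decoded.
        return bb.order(order).getInt();
    }

    public static void main(String[] args)
    {
        int big = readFirstInt(ByteOrder.BIG_ENDIAN);
        int little = readFirstInt(ByteOrder.LITTLE_ENDIAN);

        System.out.println("platform order:     " + ByteOrder.nativeOrder());
        System.out.printf("big-endian read:    0x%08x%n", big);
        System.out.printf("little-endian read: 0x%08x%n", little);
        // The two interpretations of the same bytes differ by exactly one byte swap.
        System.out.println("swap relates them:  " + (little == Integer.reverseBytes(big)));
    }
}
```

On a little-endian platform, the big-endian read is the one that takes the swapping branch of {{getInt()}}.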




[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655149#comment-14655149
 ] 

Benedict commented on CASSANDRA-8630:
-

bq. For the fast path, the built-in BB methods should still be faster, right?

Right.

bq. readByte() would result in one unsafe get call per byte.

The unsafe calls here are all intrinsics, but still: even with fully inlined 
method calls and an unrolled loop we're talking something like 24x the work. On 
top of that come the virtual invocation costs (I'm not certain how these behave 
for a sequence of 8 identical calls; I would hope loop unrolling lets the calls 
share some of the method call burden, but I don't count on it), so we are 
probably at 100x+, using rigorous finger-in-air maths.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-05 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655129#comment-14655129
 ] 

Stefania commented on CASSANDRA-8630:
-

Much nicer, thanks. 

Do we care about little-endian ordering as well? 

For the fast path, the built-in BB methods should still be faster, right? At 
least for direct BBs on platforms that support unaligned access, the built-in 
methods use the unsafe primitive getters, whereas something similar built on 
get() rather than readByte() would result in one unsafe get call per byte.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654962#comment-14654962
 ] 

Benedict commented on CASSANDRA-8630:
-

I think we can do something much simpler, without any intermediate objects. e.g.

{code}
@Override
public int readInt() throws IOException
{
    if (buffer.remaining() >= 4)
        return buffer.getInt();
    else
        return (int) readPrimitiveSlowly(4);
}

@DontInline
private long readPrimitiveSlowly(int bytes) throws IOException
{
    long result = 0;
    for (int i = 0 ; i < bytes ; i++)
        // mask to avoid sign extension: readByte() returns a signed byte
        result = (result << 8) | (readByte() & 0xFFL);
    return result;
}
{code}
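
For anyone who wants to try the idea outside the Cassandra tree, here is a self-contained sketch of the same fast path / slow path split. Everything in it is an assumption made for illustration: {{ChunkedReader}}, its {{refill()}} method and the in-memory {{source}} array stand in for the real stream and channel, and the {{@DontInline}} hint is omitted because it is not part of the JDK. The slow path is taken only when a primitive straddles a buffer boundary, and its {{long}} result is downcast to the type being read.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class ChunkedReader
{
    private final byte[] source;   // stands in for the underlying channel or file
    private final int chunkSize;   // bytes made available by each refill, must be > 0
    private int sourcePos = 0;
    private ByteBuffer buffer = ByteBuffer.allocate(0);

    ChunkedReader(byte[] source, int chunkSize)
    {
        this.source = source;
        this.chunkSize = chunkSize;
    }

    /** Replaces the fully consumed buffer with the next chunk of the source. */
    private void refill() throws IOException
    {
        if (sourcePos >= source.length)
            throw new EOFException();
        int length = Math.min(chunkSize, source.length - sourcePos);
        buffer = ByteBuffer.wrap(source, sourcePos, length);
        sourcePos += length;
    }

    byte readByte() throws IOException
    {
        if (!buffer.hasRemaining())
            refill();
        return buffer.get();
    }

    short readShort() throws IOException
    {
        return buffer.remaining() >= 2 ? buffer.getShort() : (short) readPrimitiveSlowly(2);
    }

    int readInt() throws IOException
    {
        return buffer.remaining() >= 4 ? buffer.getInt() : (int) readPrimitiveSlowly(4);
    }

    long readLong() throws IOException
    {
        return buffer.remaining() >= 8 ? buffer.getLong() : readPrimitiveSlowly(8);
    }

    /** Slow path: assembles a big-endian value one byte at a time, refilling as needed. */
    private long readPrimitiveSlowly(int bytes) throws IOException
    {
        long result = 0;
        for (int i = 0; i < bytes; i++)
            result = (result << 8) | (readByte() & 0xFFL); // mask: readByte() is signed
        return result;
    }

    /** Reads an int, a long and a short back through 3-byte chunks, so every read hits the slow path. */
    static long[] demo()
    {
        byte[] data = ByteBuffer.allocate(14).putInt(0x01020304).putLong(-2L).putShort((short) -3).array();
        ChunkedReader in = new ChunkedReader(data, 3);
        try
        {
            return new long[]{ in.readInt(), in.readLong(), in.readShort() };
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        for (long value : demo())
            System.out.println(value);
    }
}
```

The negative values in the demo are there on purpose: without the {{& 0xFFL}} mask, sign extension of the individual bytes would corrupt the assembled result.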



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-04 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654789#comment-14654789
 ] 

Stefania commented on CASSANDRA-8630:
-

Thanks for the heads up on 9500.

I basically introduced a helper method to return an intermediate object for the 
slow path. I used a ByteBuffer but I can change it to use a Long. I've copied 
some sample code below.

The point is that when we switch buffers there cannot be any leftover bytes in 
the old buffer; otherwise we can't plug in the memory-mapped buffers without 
copying data into another buffer.

{code}
@Override
public int readInt() throws IOException
{
    if (buffer.remaining() >= 4)
        return buffer.getInt();
    else
        return rebufferWithRemaining(4).getInt();
}

private ByteBuffer rebufferWithRemaining(int minimum) throws IOException
{
    assert buffer.remaining() < minimum;
    byte[] b = new byte[minimum];

    // put remaining bytes in b

    reBuffer(); // here the buffer must be entirely consumed

    // add missing bytes to b,
    // throw EOFException if not enough bytes

    return ByteBuffer.wrap(b);
}
{code}

The method intended to be overridden is {{reBuffer}}:
* The default implementation in NIODataInputStream continues reading from the 
channel without page alignment, as it does at the moment
* RAR either reads page aligned buffers or swaps in memory mapped files
* MemoryInputStream swaps in hollow byte buffers that wrap native memory.
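
To make the division of labour concrete, here is a runnable sketch of the scheme under some loud assumptions: {{BufferedInput}} and {{SegmentInput}} are hypothetical stand-ins (not the real {{NIODataInputStream}} or RAR), the segments are plain heap buffers rather than memory-mapped files, and the loop in {{rebufferWithRemaining}} is one possible way to fill in the commented steps above. The key property is that {{reBuffer()}} is only ever called on an empty buffer, so a subclass can swap in a whole new buffer without copying.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RebufferSketch
{
    /** Base reader: primitives come from 'buffer'; subclasses decide how it is replaced. */
    static abstract class BufferedInput
    {
        protected ByteBuffer buffer = ByteBuffer.allocate(0);

        /** Swaps in the next buffer; only called once the current one is fully consumed. */
        protected abstract void reBuffer() throws IOException;

        int readInt() throws IOException
        {
            if (buffer.remaining() >= 4)
                return buffer.getInt();
            return rebufferWithRemaining(4).getInt();
        }

        private ByteBuffer rebufferWithRemaining(int minimum) throws IOException
        {
            byte[] b = new byte[minimum];
            int filled = 0;
            while (filled < minimum)
            {
                if (!buffer.hasRemaining())
                    reBuffer(); // the old buffer is empty here, nothing is left over
                int n = Math.min(buffer.remaining(), minimum - filled);
                buffer.get(b, filled, n);
                filled += n;
            }
            return ByteBuffer.wrap(b); // small intermediate object, slow path only
        }
    }

    /** Hypothetical stand-in for a reader that swaps in whole memory-mapped segments. */
    static class SegmentInput extends BufferedInput
    {
        private final Iterator<ByteBuffer> segments;

        SegmentInput(List<ByteBuffer> segments)
        {
            this.segments = segments.iterator();
        }

        @Override
        protected void reBuffer() throws IOException
        {
            if (!segments.hasNext())
                throw new EOFException();
            buffer = segments.next(); // no copy: the segment itself becomes the buffer
        }
    }

    static int[] demo()
    {
        // 0x0A0B0C0D straddles the two segments; 0x01020304 sits entirely in the second.
        ByteBuffer first = ByteBuffer.wrap(new byte[]{ 0x0A, 0x0B, 0x0C });
        ByteBuffer second = ByteBuffer.wrap(new byte[]{ 0x0D, 1, 2, 3, 4 });
        SegmentInput in = new SegmentInput(Arrays.asList(first, second));
        try
        {
            return new int[]{ in.readInt(), in.readInt() };
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        for (int value : demo())
            System.out.printf("0x%08x%n", value);
    }
}
```

In the demo the first {{readInt()}} goes through the intermediate array because its bytes span both segments, while the second is served directly from the swapped-in segment.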

Another choice we have is whether we are happy to keep on using the ByteBuffer 
get() methods or whether we should write our own? 




[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653738#comment-14653738
 ] 

Benedict commented on CASSANDRA-8630:
-

My expectation was that we would extract the existing "is sufficient 
remaining?" check, and use this to drive the decision. It doesn't really matter 
what we do inside this block, because it is executed infrequently. I would 
have a {{long readBytesSlowly(int count)}} method, whose result we downcast 
to whatever type we have actually read. That method would call {{readNext()}} as 
necessary, and would be prevented from being inlined. But that's just how I 
would do it, since I prefer not to incur complexity when we know the cost will 
be amortized away, nor to inline that method and inflate our code size for the 
same reason. 

I don't think it matters _terribly_, though, and if you or [~stefania] have a 
strong opinion I won't stand in the way (unless it happens to trigger a strong 
and unexpected anti-opinion)



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-04 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653716#comment-14653716
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

Ah, so we would enhance readNext() or ensureRemaining() to handle the fact that 
it has to return the requested primitive. I didn't think of it that way. So 
would it return something like Object so that it could handle all the different 
types of primitives or would we have some specializations?



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653678#comment-14653678
 ] 

Benedict commented on CASSANDRA-8630:
-

We already have a branch in every case? This won't change the typical behavior 
at all. That branch is also highly likely to be predicted correctly on just 
about every execution.





[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-04 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653630#comment-14653630
 ] 

Ariel Weisberg commented on CASSANDRA-8630:
---

You might want to avoid creating merge conflicts with CASSANDRA-9500, since it 
changed NIODataInputStream quite a bit. Maybe start with your commits on top of 
9500 and then rebase onto trunk later. Adding the slow path for primitives is a 
little yucky since it adds a branch to every single function. It feels like we 
give up some of what we gained by going to simpler primitive reading methods.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653263#comment-14653263
 ] 

Benedict commented on CASSANDRA-8630:
-

Yes, spot on (at a high level - specifics may yet have more to discuss). 
[~aweisberg] may have comments / thoughts, since this is the direct mirror of 
CASSANDRA-9500



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-08-04 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653224#comment-14653224
 ] 

Stefania commented on CASSANDRA-8630:
-

[~benedict], sorry for the delay, I finally found the time to get back to 
this. I have already moved the mmap segments into the RAR and made 
{{MemoryInputStream}} extend {{NIODataInputStream}}.

Just to make sure I understood you correctly before I carry on with the 
trickier part: making {{RandomAccessReader}} extend {{NIODataInputStream}} 
requires changing the way {{NIODataInputStream}} reads data, in that we cannot 
afford to have any leftover bytes in the buffer before calling {{readNext}}, as 
this would not work for mmapped segments. I guess this is the whole point of the 
optimization: the fast and slow paths get implemented in {{NIODataInputStream}}, 
and then the RAR just implements {{FileDataInput}} and overrides {{readNext}} by 
either refilling the whole buffer with a page-aligned read or swapping in a 
memory-mapped segment. This requires the buffer in {{NIODataInputStream}} to be 
protected rather than private, and not final.

Is my understanding correct?
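
To make the question concrete, here is a minimal sketch of the arrangement described above. It is an illustration only, not the actual Cassandra code: {{RefillableInput}} and {{SegmentedInput}} are hypothetical names standing in for {{NIODataInputStream}} and the RAR, and plain heap buffers stand in for memory-mapped segments. It shows only the "swap in a segment" variant of {{readNext}}; a page-aligned refill from a channel is not shown.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical stand-in for NIODataInputStream: the buffer is protected and
// non-final so that a subclass can replace it when more data is needed.
abstract class RefillableInput
{
    protected ByteBuffer buffer = ByteBuffer.allocate(0);

    // Called only once the current buffer is fully drained, so no leftover
    // bytes ever need to be carried over into the next buffer.
    protected abstract void readNext();

    public byte readByte()
    {
        if (!buffer.hasRemaining())
            readNext();
        return buffer.get();
    }
}

// Hypothetical reader over a list of segments. Heap buffers are used here in
// place of mmapped regions, purely so the sketch is self-contained.
final class SegmentedInput extends RefillableInput
{
    private final ByteBuffer[] segments;
    private int next = 0;

    SegmentedInput(ByteBuffer... segments)
    {
        this.segments = segments;
    }

    @Override
    protected void readNext()
    {
        // Skip any empty segments.
        while (next < segments.length && !segments[next].hasRemaining())
            next++;
        if (next == segments.length)
            throw new BufferUnderflowException(); // end of input in this sketch
        // Swap the next segment in rather than copying its bytes.
        buffer = segments[next++].duplicate();
    }
}
```

Because {{readNext}} is only reached with an empty buffer, swapping the reference is safe; if bytes could be left over, they would have to be copied, which is not possible into a read-only mapped segment.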




[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-07-27 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642564#comment-14642564
 ] 

Stefania commented on CASSANDRA-8630:
-

Thank you, no need to attach your sstables, I'll try and replicate something 
similar.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-07-27 Thread Oleg Anastasyev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642495#comment-14642495
 ] 

Oleg Anastasyev commented on CASSANDRA-8630:


The test compacted 1.2G of data in 110 compressed and uncompressed 
sstables and measured the time taken by the compaction process. I am not sure 
attaching all those sstables here makes any sense.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-07-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642479#comment-14642479
 ] 

Benedict commented on CASSANDRA-8630:
-

I think we can avoid static methods by making {{MemoryInputStream}} extend 
{{ByteBufferInput}} or {{NIODataInputStream}} (this is actually pretty easy, 
since we can construct a {{ByteBuffer}} from a {{Memory}} instance at very low 
cost, and can "refill" that {{ByteBuffer}} on a {{readNext}}).

I think this should ideally be based on {{NIODataInputStream}} if that's not 
too challenging (which I hope it should not be), so that we can keep one main 
implementation of {{DataInputPlus}}. And we can indeed drop 
{{AbstractDataInput}} at that point.
 
A somewhat-circular buffer would be ideal. However we cannot currently have > 
64Kb buffers, so for reads using 64Kb already we would have to shrink our read 
size or grow our max buffer size. For smaller buffers, we will still have to 
shuffle bytes around and have quite a bit of wasted allocation (for this to 
work, our reads of objects much smaller than 4Kb would still need 8Kb buffers 
allocated).

I wonder if for primitive reads we shouldn't just have a slow path (like we've 
now opted for with vints) that reads byte-by-byte at a boundary crossing 
(returning a long that can be cast). If we force the method call to _not_ be 
inlined, the cost should be imperceptible with the help of branch prediction, 
since it should almost never be hit (given we probabilistically guarantee 
covering the entire range in our buffer), and will not pollute the icache. This 
has the added bonus of being much easier to understand, and probably easier to 
implement safely.
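
A rough sketch of the slow-path idea, under stated assumptions: {{BoundaryAwareReader}} and {{readPrimitiveSlowly}} are hypothetical names, the chunks are plain heap buffers, and values are big-endian. The sketch does not itself stop the JIT from inlining the slow path; that would need to be arranged separately.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical reader over a sequence of chunks; not Cassandra's actual class.
final class BoundaryAwareReader
{
    private final ByteBuffer[] chunks;
    private int next = 0;
    private ByteBuffer buffer = ByteBuffer.allocate(0);

    BoundaryAwareReader(ByteBuffer... chunks)
    {
        this.chunks = chunks;
    }

    public long readLong()
    {
        if (buffer.remaining() >= 8)
            return buffer.getLong();          // fast path: value lies within the buffer
        return readPrimitiveSlowly(8);        // rare: value crosses a buffer boundary
    }

    public int readInt()
    {
        if (buffer.remaining() >= 4)
            return buffer.getInt();
        return (int) readPrimitiveSlowly(4);  // the long result is simply cast
    }

    // Slow path: assemble a big-endian value one byte at a time, moving to the
    // next chunk whenever the current buffer is drained.
    private long readPrimitiveSlowly(int bytes)
    {
        long result = 0;
        for (int i = 0; i < bytes; i++)
        {
            while (!buffer.hasRemaining())
                refill();
            result = (result << 8) | (buffer.get() & 0xFFL);
        }
        return result;
    }

    private void refill()
    {
        if (next == chunks.length)
            throw new BufferUnderflowException();
        buffer = chunks[next++].duplicate();
    }
}
```

With buffers of several Kb, only a small fraction of primitive reads should straddle a boundary, so the byte-by-byte loop is rarely taken and the common case stays a single bounds check plus a bulk {{ByteBuffer}} read.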

However if you think you can deliver a neat circular-buffer approach, feel free 
to give it a try.



[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

2015-07-27 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642387#comment-14642387
 ] 

Stefania commented on CASSANDRA-8630:
-

I've attached some flight recorder files generated with the following commands:

{code}
tools/bin/cassandra-stress write n=5M -rate threads=50
nodetool compact keyspace1 standard1
tools/bin/cassandra-stress read n=5M -rate threads=50
{code}

The read and write methods are still among the top hot-spots, although they are 
not the only ones. 

[~m0nstermind] do you still have your original benchmarks so we can repeat them?

[~benedict] here is what I propose for the read path (haven't looked at the 
write path yet):

- Merge {{ByteBufferInput}} and {{RandomAccessReader}} into a single class, 
let's call it something like {{FileInputStream}}. At this point we are left 
with two {{AbstractDataInput}} implementations: this new {{FileInputStream}} 
and the existing {{MemoryInputStream}}. 
- Introduce one or more helper classes that encode and decode types using 
static methods, similar to {{VIntCoding}}. These classes will operate directly 
on bytes (or buffers) and will be shared by the two input stream 
implementations ({{FileInputStream}} and {{MemoryInputStream}}). Because they 
are static and fairly short, they have a good chance of being inlined.
- Depending on how much is left, we could get rid of {{AbstractDataInput}}?
- For reads spanning two buffers, would it be too complicated to treat the 
buffer as a circular buffer? After we've consumed the first half we can already 
read the next chunk into it and so forth, where a chunk is half the buffer 
size. So the buffer size should be an even number of pages, which we can ensure 
in CASSANDRA-8894.
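
As an illustration of the second point, a static helper along these lines is what I have in mind. This is only a sketch: {{FixedWidthCoding}} and its method names are hypothetical, and it assumes big-endian encoding over plain byte arrays.

```java
// Hypothetical helper in the spirit of VIntCoding: stateless static methods
// that operate directly on bytes, so any input stream implementation can share them.
final class FixedWidthCoding
{
    private FixedWidthCoding() {}

    // Decodes a big-endian int starting at bytes[offset].
    static int decodeInt(byte[] bytes, int offset)
    {
        return ((bytes[offset]     & 0xFF) << 24)
             | ((bytes[offset + 1] & 0xFF) << 16)
             | ((bytes[offset + 2] & 0xFF) << 8)
             |  (bytes[offset + 3] & 0xFF);
    }

    // Encodes value as a big-endian int starting at bytes[offset].
    static void encodeInt(int value, byte[] bytes, int offset)
    {
        bytes[offset]     = (byte) (value >>> 24);
        bytes[offset + 1] = (byte) (value >>> 16);
        bytes[offset + 2] = (byte) (value >>> 8);
        bytes[offset + 3] = (byte) value;
    }

    // Decodes a big-endian long as two ints; the low half is masked so that
    // its sign bit does not spill into the high half.
    static long decodeLong(byte[] bytes, int offset)
    {
        return ((long) decodeInt(bytes, offset) << 32)
             | (decodeInt(bytes, offset + 4) & 0xFFFFFFFFL);
    }
}
```

Each method is a handful of instructions with no virtual dispatch, which is what should make them good inlining candidates.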

WDYT?


