[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-05-03 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270101#comment-15270101
 ] 

Stefania commented on CASSANDRA-9766:
-

Thanks for the modifications.

It LGTM now, I just have two small suggestions and some nits:

* In {{ColumnIndex}}, a method to reset the properties that change would have 
been enough; there is no need to pass in {{header, dataFile, 
descriptor.version, observers}} every time, and probably not even 
{{getRowIndexEntrySerializer().indexInfoSerializer()}}. This way these fields 
can go back to being final.  
* In BTW, {{columnIndexWriter}} can be created in the constructor so that it 
too can be final. 
* Nit: {{indexSamples}} in {{ColumnIndex}} can be final.
* Nit: unused import in {{CompressedInputStream}}: {{java.util.zip.Checksum}}
* Nit: we can restore {{import java.util.\*}} in BTreeSet.java and {{import 
java.util.concurrent.\*}} in BufferPool.java
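
The first two suggestions can be sketched as follows. This is only a minimal illustration: {{ReusableIndexBuilder}} and {{SketchWriter}} are hypothetical stand-ins, not the real {{ColumnIndex}} or BTW, and the fields are simplified. The point is that collaborators are passed once to the constructor and stay final, while {{reset()}} clears only the state that changes between partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ColumnIndex: fixed collaborators are final,
// only per-partition state is cleared by reset().
final class ReusableIndexBuilder {
    private final String header;                                // set once, never changes
    private final List<Long> indexSamples = new ArrayList<>();  // final reference, reusable contents
    private long startPosition;                                 // per-partition state
    private int rowsWritten;                                    // per-partition state

    ReusableIndexBuilder(String header) {
        this.header = header;
    }

    // Clears only what changes between partitions.
    void reset(long newStartPosition) {
        indexSamples.clear();
        startPosition = newStartPosition;
        rowsWritten = 0;
    }

    void addRow(long position) {
        rowsWritten++;
        indexSamples.add(position - startPosition);
    }

    int rowsWritten() { return rowsWritten; }
    List<Long> samples() { return indexSamples; }
    String header() { return header; }
}

// Hypothetical stand-in for the writer: the index builder is created once
// in the constructor, so the field can be final.
final class SketchWriter {
    private final ReusableIndexBuilder columnIndexWriter;
    private long position;

    SketchWriter(String header) {
        this.columnIndexWriter = new ReusableIndexBuilder(header);
    }

    // Appends one partition of `rows` rows, each assumed to be 10 bytes long.
    int append(int rows) {
        columnIndexWriter.reset(position);
        for (int i = 0; i < rows; i++) {
            columnIndexWriter.addRow(position);
            position += 10;
        }
        return columnIndexWriter.rowsWritten();
    }

    ReusableIndexBuilder index() { return columnIndexWriter; }
}
```

With this shape no recycler is involved at all: the writer owns exactly one builder for its whole lifetime.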

CI looks good as well:
* the 3 failing dtests also fail on trunk
* the failing utests are all timeouts except for one which also fails on trunk. 
The tests with timeout pass locally and there are timeouts on trunk too.


> Bootstrap outgoing streaming speeds are much slower than during repair
> --
>
> Key: CASSANDRA-9766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.2. more details in the pdf attached 
>Reporter: Alexei K
>Assignee: T Jake Luciani
>  Labels: performance
> Fix For: 3.x
>
> Attachments: problem.pdf
>
>
> I have a cluster in Amazon cloud , its described in detail in the attachment. 
> What I've noticed is that we during bootstrap we never go above 12MB/sec 
> transmission speeds and also those speeds flat line almost like we're hitting 
> some sort of a limit ( this remains true for other tests that I've ran) 
> however during the repair we see much higher,variable sending rates. I've 
> provided network charts in the attachment as well . Is there an explanation 
> for this? Is something wrong with my configuration, or is it a possible bug?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-05-03 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269625#comment-15269625
 ] 

T Jake Luciani commented on CASSANDRA-9766:
---

bq. ColumnIndex.create() is only called in BTW.append.

Good point. I fixed this so it's just re-used rather than recycled.

bq. I would suggest leaving BTreeSearchIterator not recycled. 

I agree. I removed all this in the last push.  I'd like to revisit this though 
in a later ticket.

bq. dob.recycle() should be in a finally since serializeRowBody() can throw.

done.

bq. I don't understand this line

I just did it for clarity.

bq. Why do we need to allocate cells lazily in BTreeRow.Builder

The cell builder is recycled on build().  The Row Builder is also reset() on 
build().  Sometimes the RowBuilder is re-used and sometimes it's thrown away.  
So by making the cell builder lazy we ensure the recycled object is only 
allocated when the RowBuilder is re-used.
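
A minimal sketch of that lazy pattern, under loud assumptions: {{LazyRowBuilder}} and its single-threaded {{POOL}} are hypothetical and only stand in for {{BTreeRow.Builder}} and the recycler. The pooled cell list is taken from the pool on the first {{addCell()}} and handed back in {{build()}}, so a builder that is thrown away after {{build()}} holds nothing from the pool.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

final class LazyRowBuilder {
    // Hypothetical single-threaded pool standing in for the recycler.
    static final ArrayDeque<List<String>> POOL = new ArrayDeque<>();
    // Counts how often a cell list was taken from the pool or freshly allocated.
    static int acquisitions = 0;

    private List<String> cells; // stays null until the first cell is added

    private List<String> cells() {
        if (cells == null) {
            List<String> pooled = POOL.poll();
            cells = (pooled != null) ? pooled : new ArrayList<String>();
            acquisitions++;
        }
        return cells;
    }

    void addCell(String cell) {
        cells().add(cell);
    }

    // Builds the row, recycles the cell list and resets the builder.
    List<String> build() {
        if (cells == null)
            return new ArrayList<String>();
        List<String> row = new ArrayList<String>(cells); // copy out the result
        cells.clear();
        POOL.push(cells); // recycle
        cells = null;     // reset: nothing is held until the builder is used again
        return row;
    }
}
```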


I also fixed the imports and kicked off the tests again.




[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-05-02 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268120#comment-15268120
 ] 

Stefania commented on CASSANDRA-9766:
-

It's looking much better without recycling {{BTreeSearchIterator}}:

{code}
grep ERROR build/test/logs/TEST-org.apache.cassandra.streaming.LongStreamingTest.log
ERROR [main] 2016-05-03 10:37:04,004 SLF4J: stderr
ERROR [main] 2016-05-03 10:37:34,737 Writer finished after 25 seconds
ERROR [main] 2016-05-03 10:37:34,738 File : /tmp/1462243029050-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR [main] 2016-05-03 10:37:55,165 Finished Streaming in 20.41 seconds: 23.52 Mb/sec
ERROR [main] 2016-05-03 10:38:15,054 Finished Streaming in 19.89 seconds: 24.14 Mb/sec
ERROR [main] 2016-05-03 10:38:56,983 Finished Compacting in 41.93 seconds: 23.09 Mb/sec
{code}

I would suggest leaving {{BTreeSearchIterator}} not recycled. I think it is 
quite dangerous to recycle this iterator, see for example 
[here|https://github.com/apache/cassandra/compare/trunk...tjake:faster-streaming#diff-81fd7ce7915c147ea84590e25f77ca47R361].
 I think we would extend the scope and risk of this patch significantly for 
very little gain but feel free to prove me wrong if you want to experiment with 
alternative recycling options. 

Regarding using our own {{FastThreadLocal}} vs. keeping dependencies to Netty, 
I'm really not sure. On one hand I don't want to cause additional work for no 
good reason and I don't particularly like duplicating code, but on the other 
hand the Netty internal classes, e.g. {{InternalThreadLocalMap}}, could change 
at any time. So we could have performance regressions by upgrading Netty for 
example. I'm happy either way.

Regarding ref. counting, you're quite right we don't need this, if an object is 
not recycled it will be GC-ed.

A few more points:

* Why do we need to allocate cells lazily in {{BTreeRow.Builder}}, do we really 
create many of these without ever adding cells to them?

* 
[{{dob.recycle()}}|https://github.com/apache/cassandra/compare/trunk...tjake:faster-streaming#diff-c06541855022eca5fd794dd24ff02f89R182]
 should be in a finally since {{serializeRowBody()}} can throw.

* I don't understand [this 
line|https://github.com/apache/cassandra/compare/trunk...tjake:faster-streaming#diff-ee37e803d70421ce823d42e02620d589R207]:
 when the object is recycled, the buffer should be null (from close()) and 
indexSamplesSerializedSize should be zero (from create()), so why do we need to 
set {{indexOffsets\[columnIndexCount\] = 0}} explicitly?

* {{ColumnIndex.create()}} is only called in BTW.append. It would be nice if we 
could somehow attach this object somewhere rather than constantly pushing it 
and popping it from the recycler stack. We could just store it in BTW if we 
could be sure that BTW.append is not called by multiple threads or maybe have a 
queue of these objects in BTW?
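
To illustrate the {{dob.recycle()}} point above, here is a minimal sketch of recycling in a finally block. Everything in it is an assumption made for illustration: {{RecycleInFinally}}, its {{POOL}} and {{serializeBody()}} are hypothetical stand-ins for the recycled {{DataOutputBuffer}} and {{serializeRowBody()}}.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

final class RecycleInFinally {
    // Hypothetical single-threaded pool of scratch buffers.
    static final ArrayDeque<ByteArrayOutputStream> POOL = new ArrayDeque<>();

    static ByteArrayOutputStream acquire() {
        ByteArrayOutputStream pooled = POOL.poll();
        return pooled != null ? pooled : new ByteArrayOutputStream();
    }

    static void recycle(ByteArrayOutputStream buffer) {
        buffer.reset(); // drop the contents, keep the backing array
        POOL.push(buffer);
    }

    // Hypothetical stand-in for serializeRowBody(): it can throw.
    static void serializeBody(String row, ByteArrayOutputStream out) {
        if (row == null)
            throw new IllegalArgumentException("cannot serialize a null row");
        byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    static int serializedSize(String row) {
        ByteArrayOutputStream buffer = acquire();
        try {
            serializeBody(row, buffer);
            return buffer.size();
        } finally {
            recycle(buffer); // runs on both the normal and the exceptional path
        }
    }
}
```

Without the finally block, the buffer taken for a failing row would simply be dropped. As noted elsewhere in this thread, that is not a leak because the object will be GC-ed, but it defeats the purpose of pooling on that path.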



[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-05-02 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267533#comment-15267533
 ] 

Benedict commented on CASSANDRA-9766:
-

It certainly was the intention that the FastThreadLocal be used by C* back when 
I wrote it, although my intention had been to move it in-tree to maintain 
ourselves as our goals are a bit different to Netty's.  The Recycler is a good 
example - it's designed to permit low overhead but _concurrent_ recycling, with 
some guarantees about not behaving badly in the face of user misuse.  This is 
probably overkill for our use case, but it is somewhat general purpose and 
those guarantees might well be nice.  I have no strong opinion.

On the topic of BTreeSearchIterator: In 3.0, for most users the 
BTreeSearchIterator is likely to be a single very small object, and even for 
large partitions it will still be a very small number.  I would be surprised if 
recycling such small objects can easily be made a win, but if they really have 
such a large heap impact having a truly thread local queue with a small maximum 
number of elements per thread would be the only way to get close to 
non-recycled performance.
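
A rough sketch of what "a truly thread local queue with a small maximum number of elements per thread" could look like. This is an assumption-laden illustration using the JDK {{ThreadLocal}}, not the Netty {{Recycler}} and not code from the patch; {{BoundedThreadLocalPool}} is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

final class BoundedThreadLocalPool<T> {
    private final int maxPerThread;
    private final Supplier<T> factory;
    // One private queue per thread, so acquire/release never contend.
    private final ThreadLocal<ArrayDeque<T>> queue = ThreadLocal.withInitial(ArrayDeque::new);

    BoundedThreadLocalPool(int maxPerThread, Supplier<T> factory) {
        this.maxPerThread = maxPerThread;
        this.factory = factory;
    }

    // Returns a pooled instance if this thread has one, else a fresh one.
    T acquire() {
        T pooled = queue.get().pollFirst();
        return pooled != null ? pooled : factory.get();
    }

    // Returns false when this thread's queue is full; the object is then left to the GC.
    boolean release(T object) {
        ArrayDeque<T> q = queue.get();
        if (q.size() >= maxPerThread)
            return false;
        q.addFirst(object);
        return true;
    }
}
```

Because an object is only ever returned to the queue of the releasing thread, this gives up the concurrent recycling guarantees mentioned above in exchange for having no synchronization at all.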

Reusing the Object[] for our builders is definitely a good thing to do though.

Disclaimer: I have not read the patch.



[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-05-02 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267478#comment-15267478
 ] 

T Jake Luciani commented on CASSANDRA-9766:
---

bq. Running LongStreamingTest on my laptop went from 24/25 seconds on trunk 
HEAD to 22/23 seconds with the patch applied.

Hmm, looks like the BTreeSearchIterator recycling is causing too high a CPU hit 
to be worth the GC savings.  I've pushed a quick commit which brings the test 
back down to 19 seconds for me, could you try it out and let me know what you 
see? Without recycling, BTreeSearchIterator accounts for >25% of the heap 
pressure :(

I think since the object is so hotly used it just causes too much contention on 
the recycler. It's important to avoid too much allocation but seems like in 
this case it's gone too far.  Perhaps we can avoid the recycler here and just 
keep a reusable BTreeSearchIterator in the SSTableWriter. 

bq. I would like to make sure this is justifiable and I would probably want the 
opinion of one more committer with more experience than me

The FastThreadLocal changes were an optimization by [~benedict] from a [while 
back|https://github.com/netty/netty/pull/2504] plus some recycler changes.
Since we already use Netty and it's built to be used as a general library, it 
seemed like a good place to start. 

bq. do we have a micro benchmark comparing Netty FastThreadLocal and the JDK 
ThreadLocal? 

The Netty FastThreadLocal microbenchmarks show a significant throughput 
increase over the JDK ThreadLocal:

{code}
Benchmark                                    Mode  Cnt      Score      Error  Units
FastThreadLocalBenchmark.fastThreadLocal    thrpt   20  55452.027 ±  725.713  ops/s
FastThreadLocalBenchmark.jdkThreadLocalGet  thrpt   20  35481.888 ± 1471.647  ops/s
{code}

bq. Should we perhaps make recyclable objects ref counted, at least for 
debugging purposes when Ref.DEBUG_ENABLED is true?

The reason I didn't do this and one reason I like the Recycler is it's not 
strictly required to recycle every object. If we added ref counting it would 
force every code path to be properly cleaned up even when we don't care about 
recycling. 





[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-04-27 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261622#comment-15261622
 ] 

Stefania commented on CASSANDRA-9766:
-

There are lots of interesting ideas in this patch but also a lot to digest and 
so I will need to revisit it again. So far, these are my initial observations:

* Running {{LongStreamingTest}} on my laptop went from 24/25 seconds on trunk 
HEAD to 22/23 seconds with the patch applied. So not quite a 25% improvement, 
unfortunately. I wonder whether this is because I'm running on a hybrid HDD 
rather than an SSD. Would it be possible to collect a few runs and report the 
average and standard deviation? Flight Recorder profiles for trunk and for the 
patch would also be useful. I've put the full output of my test runs at the 
end for your reference.

* At the moment we have dependencies on Netty where we would expect to find 
them: in the transport package, {{NativeTransportService}}, the 
{{QueryOptions}} and {{ResultSet}} codecs and {{JavaDriverClient}}. With this 
patch, we will introduce dependencies on Netty {{FastThreadLocal}} and 
{{Recycler}} pretty much everywhere in Cassandra. Before we do this, I would 
like to make sure this is justifiable and I would probably want the opinion of 
one more committer with more experience than me. To start with however, do we 
have a micro benchmark comparing Netty {{FastThreadLocal}} and the JDK 
{{ThreadLocal}}? I'm also not convinced that the Netty recycler is as optimized 
as it can be. I understand that it can be very time consuming to implement an 
optimized pool of objects, but perhaps we should at least produce something 
quickly based on {{ThreadLocal}} and benchmark it against the Netty recycler, 
unless we already have sufficient evidence in favor of the Netty recycler.

* Should we perhaps make recyclable objects ref counted, at least for debugging 
purposes when {{Ref.DEBUG_ENABLED}} is true?

Here are some nits:

* I don't think {{wrap}} in {{ClosableIterable}} is used anywhere.
* In {{StreamingHistogram}} at line 75, {{LongAdder}} also doesn't seem used.
* Most imports with wildcards were expanded, I'm not sure if we care about this 
and if we are in favor of one approach or the other.

That's it for now, I hope to have more detailed observations during the next 
pass.


Here is the output of running {{LongStreamingTest}} on my laptop:

{code}
Run from Intellij:
==

With patch:
===

ERROR 04:07:38 Writer finished after 28 seconds
ERROR 04:07:38 File : /tmp/1461816430211-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR 04:08:01 Finished Streaming in 23.32 seconds: 21.62 Mb/sec
ERROR 04:08:24 Finished Streaming in 22.13 seconds: 22.77 Mb/sec
ERROR 04:09:06 Finished Compacting in 42.16 seconds: 23.91 Mb/sec


Without patch:
===

ERROR 04:13:13 Writer finished after 27 seconds
ERROR 04:13:13 File : /tmp/1461816765852-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR 04:13:38 Finished Streaming in 24.87 seconds: 19.63 Mb/sec
ERROR 04:14:02 Finished Streaming in 24.17 seconds: 20.19 Mb/sec
ERROR 04:14:43 Finished Compacting in 41.32 seconds: 23.82 Mb/sec

Run from the command line:
==

With patch:
===
ERROR [main] 2016-04-28 12:25:12,394 Writer finished after 28 seconds
ERROR [main] 2016-04-28 12:25:12,395 File : /tmp/1461817483899-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR [main] 2016-04-28 12:25:35,122 Finished Streaming in 22.73 seconds: 21.83 Mb/sec
ERROR [main] 2016-04-28 12:25:57,284 Finished Streaming in 22.16 seconds: 22.38 Mb/sec
ERROR [main] 2016-04-28 12:26:38,817 Finished Compacting in 41.53 seconds: 24.08 Mb/sec


Without patch:
==
ERROR [main] 2016-04-28 12:19:51,580 Writer finished after 26 seconds
ERROR [main] 2016-04-28 12:19:51,580 File : /tmp/1461817165548-0/cql_keyspace/table1/ma-1-big-Data.db
ERROR [main] 2016-04-28 12:20:17,042 Finished Streaming in 25.46 seconds: 19.17 Mb/sec
ERROR [main] 2016-04-28 12:20:41,087 Finished Streaming in 24.04 seconds: 20.30 Mb/sec
ERROR [main] 2016-04-28 12:21:22,610 Finished Compacting in 41.52 seconds: 23.51 Mb/sec
{code}
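
For the average and standard deviation requested above, a small helper along these lines would do. It is only a sketch: {{RunStats}} is a hypothetical name, and the two arrays simply collect the "Finished Streaming" times (in seconds) from the four runs in the output above. Four samples per side is of course a very small sample.

```java
final class RunStats {
    // "Finished Streaming" times in seconds, copied from the log output above.
    static final double[] WITH_PATCH = { 23.32, 22.13, 22.73, 22.16 };
    static final double[] WITHOUT_PATCH = { 24.87, 24.17, 25.46, 24.04 };

    static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples)
            sum += s;
        return sum / samples.length;
    }

    // Sample standard deviation (divides by n - 1); needs at least two samples.
    static double sampleStdDev(double[] samples) {
        if (samples.length < 2)
            throw new IllegalArgumentException("need at least two samples");
        double m = mean(samples);
        double squares = 0;
        for (double s : samples)
            squares += (s - m) * (s - m);
        return Math.sqrt(squares / (samples.length - 1));
    }

    public static void main(String[] args) {
        System.out.printf("with patch:    mean=%.2f s, stddev=%.2f s%n",
                          mean(WITH_PATCH), sampleStdDev(WITH_PATCH));
        System.out.printf("without patch: mean=%.2f s, stddev=%.2f s%n",
                          mean(WITHOUT_PATCH), sampleStdDev(WITHOUT_PATCH));
    }
}
```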


[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-04-27 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260982#comment-15260982
 ] 

T Jake Luciani commented on CASSANDRA-9766:
---

[testall|https://cassci.datastax.com/view/Dev/view/tjake/job/tjake-faster-streaming-testall/]
[dtest|https://cassci.datastax.com/view/Dev/view/tjake/job/tjake-faster-streaming-dtest/]



[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-04-26 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259448#comment-15259448
 ] 

T Jake Luciani commented on CASSANDRA-9766:
---

I've finished the allocation work I wanted to do and now I'm seeing 50% less 
allocation during the test with a 25% improvement in throughput compared to 
trunk.

There are a few tests still failing which I will fix tomorrow.

[branch|https://github.com/tjake/cassandra/tree/faster-streaming]


> Bootstrap outgoing streaming speeds are much slower than during repair
> --
>
> Key: CASSANDRA-9766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.2. more details in the pdf attached 
>Reporter: Alexei K
>Assignee: T Jake Luciani
>  Labels: performance
> Fix For: 2.1.x
>
> Attachments: problem.pdf
>
>
> I have a cluster in Amazon cloud , its described in detail in the attachment. 
> What I've noticed is that we during bootstrap we never go above 12MB/sec 
> transmission speeds and also those speeds flat line almost like we're hitting 
> some sort of a limit ( this remains true for other tests that I've ran) 
> however during the repair we see much higher,variable sending rates. I've 
> provided network charts in the attachment as well . Is there an explanation 
> for this? Is something wrong with my configuration, or is it a possible bug?





[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-04-13 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239330#comment-15239330
 ] 

T Jake Luciani commented on CASSANDRA-9766:
---

I've taken a look at the streaming path and have been able to improve 
streaming performance by 25% with the following:

* CompressedStreamReader had only implemented reading a single byte at a time. 
I added the read(byte[], int, int) method.

* When writing to an sstable, we used to calculate the size of the row, write 
the size, and then write the row, which costs 2x the CPU. Now we serialize the 
row to memory, then write the size of the memory buffer and copy the buffer to 
disk.

* Added object recycling for the largest garbage sources, namely 
BTreeSearchIterator and DataOutputBuffer (for the fix above). There are still 
a few more places where recycling would help, like the Object[] in 
BTree.Builder.

* Changed all ThreadLocals to use FastThreadLocal from Netty, and subsequently 
added FastThreadLocalThreads for all internal threads.
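
To illustrate the first bullet: the default {{java.io.InputStream.read(byte[], int, int)}} is implemented as a loop over the single-byte {{read()}}, so a stream that only implements {{read()}} pays one call per byte. The sketch below is not the code from the branch; {{ChunkedInputStream}} is a hypothetical stream over an in-memory array that shows the shape of a bulk override.

```java
import java.io.InputStream;

final class ChunkedInputStream extends InputStream {
    private final byte[] data;
    private int position;
    int singleByteReads = 0; // counts calls to the slow path, for illustration only

    ChunkedInputStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        singleByteReads++;
        return position < data.length ? (data[position++] & 0xFF) : -1;
    }

    // Bulk read: copies as many bytes as are available in a single call
    // instead of falling back to one read() call per byte.
    @Override
    public int read(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || len > b.length - off)
            throw new IndexOutOfBoundsException();
        if (len == 0)
            return 0;
        int remaining = data.length - position;
        if (remaining <= 0)
            return -1; // end of stream
        int n = Math.min(len, remaining);
        System.arraycopy(data, position, b, off, n);
        position += n;
        return n;
    }
}
```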

There are still more things to do here; for example, we generate tons of 
garbage boxing types for StreamingHistogram.
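
Going back to the sstable write change in the list above, the idea of serializing once into memory and then writing a size prefix can be sketched like this. The names are hypothetical ({{SizePrefixedWriter}}, {{serializeRow()}}), the row is just a string, and a fixed 4-byte length is assumed purely for illustration; the real on-disk encoding is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SizePrefixedWriter {
    // Hypothetical row serializer: the expensive step we only want to run once.
    static void serializeRow(String row, DataOutputStream out) throws IOException {
        out.write(row.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the bytes that would be written to the data file for one row.
    static byte[] writeRow(String row) {
        try {
            // 1. Serialize the row once into a memory buffer.
            ByteArrayOutputStream scratch = new ByteArrayOutputStream();
            serializeRow(row, new DataOutputStream(scratch));

            // 2. The size is now known without a separate size-calculation pass.
            ByteArrayOutputStream file = new ByteArrayOutputStream(); // stand-in for the data file
            DataOutputStream out = new DataOutputStream(file);
            out.writeInt(scratch.size()); // assumption: fixed 4-byte length prefix

            // 3. Copy the already serialized bytes.
            scratch.writeTo(out);
            out.flush();
            return file.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is an extra in-memory copy per row, which is why the scratch buffer is one of the objects worth recycling.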

I've added a long test to stream and compact a large sstable.

Branch is https://github.com/tjake/cassandra/tree/faster-streaming

3.5 test:
Finished Streaming in 25 seconds: 18.92 Mb/sec

branch test:
Finished Streaming in 19 seconds: 25.04 Mb/sec

> Bootstrap outgoing streaming speeds are much slower than during repair
> --
>
> Key: CASSANDRA-9766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.2. more details in the pdf attached 
>Reporter: Alexei K
>Assignee: T Jake Luciani
>  Labels: performance
> Fix For: 2.1.x
>
> Attachments: problem.pdf
>
>
> I have a cluster in Amazon cloud , its described in detail in the attachment. 
> What I've noticed is that we during bootstrap we never go above 12MB/sec 
> transmission speeds and also those speeds flat line almost like we're hitting 
> some sort of a limit ( this remains true for other tests that I've ran) 
> however during the repair we see much higher,variable sending rates. I've 
> provided network charts in the attachment as well . Is there an explanation 
> for this? Is something wrong with my configuration, or is it a possible bug?





[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-04-05 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226995#comment-15226995
 ] 

T Jake Luciani commented on CASSANDRA-9766:
---

Perhaps we add a decompression pool so we can offload the work to many cores
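
A rough sketch of that idea, with several assumptions: {{DecompressionPool}} is a hypothetical class, {{java.util.zip}} stands in for whatever compressor the sstables actually use, and chunks are plain byte arrays. Each chunk is decompressed as a separate task, and the results are collected in submission order so the stream can still be consumed sequentially.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DecompressionPool {
    // Helper that produces test input; not part of the idea itself.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalStateException("truncated or unsupported chunk");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Decompresses all chunks on `threads` workers, preserving chunk order.
    static List<byte[]> decompressAll(List<byte[]> chunks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (byte[] chunk : chunks) {
                Callable<byte[]> task = () -> decompress(chunk);
                futures.add(pool.submit(task));
            }
            List<byte[]> result = new ArrayList<>();
            for (Future<byte[]> f : futures)
                result.add(f.get()); // waits in submission order
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A real implementation would keep a long-lived pool rather than create one per call; the per-call executor here only keeps the sketch self-contained.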

> Bootstrap outgoing streaming speeds are much slower than during repair
> --
>
> Key: CASSANDRA-9766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.2. more details in the pdf attached 
>Reporter: Alexei K
>  Labels: performance
> Fix For: 2.1.x
>
> Attachments: problem.pdf
>
>
> I have a cluster in Amazon cloud , its described in detail in the attachment. 
> What I've noticed is that we during bootstrap we never go above 12MB/sec 
> transmission speeds and also those speeds flat line almost like we're hitting 
> some sort of a limit ( this remains true for other tests that I've ran) 
> however during the repair we see much higher,variable sending rates. I've 
> provided network charts in the attachment as well . Is there an explanation 
> for this? Is something wrong with my configuration, or is it a possible bug?





[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-02-16 Thread Eric Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149333#comment-15149333
 ] 

Eric Evans commented on CASSANDRA-9766:
---

So if I'm understanding this correctly (and I'm probably not), increasing the 
receiving buffer would get more of the data over the wire before blocking on 
the read from buffer, decompression, etc (up to however much the buffer was 
increased by).  Is that right?  If so, that wouldn't really help much; it 
would seem to imply that processing the compressed data is the bottleneck, and 
that the blocking is (rightfully) applying back-pressure to the network side.

> Bootstrap outgoing streaming speeds are much slower than during repair
> --
>
> Key: CASSANDRA-9766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 2.1.2. more details in the pdf attached 
>Reporter: Alexei K
> Fix For: 2.1.x
>
> Attachments: problem.pdf
>
>
> I have a cluster in Amazon cloud , its described in detail in the attachment. 
> What I've noticed is that we during bootstrap we never go above 12MB/sec 
> transmission speeds and also those speeds flat line almost like we're hitting 
> some sort of a limit ( this remains true for other tests that I've ran) 
> however during the repair we see much higher,variable sending rates. I've 
> provided network charts in the attachment as well . Is there an explanation 
> for this? Is something wrong with my configuration, or is it a possible bug?





[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2016-02-16 Thread Eric Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149229#comment-15149229
 ] 

Eric Evans commented on CASSANDRA-9766:
---

I'm seeing something similar here; I get an eerily consistent 4.5MB/s _per 
stream_, (much less than the stream throughput limit, and the capability of the 
network).  We have large partitions, large SSTables, and a mixture of 256k and 
512k chunk lengths.

[~yukim] what would be the best test of this, would 
https://gist.github.com/eevans/81f02849eab7634871c9 do?

> Bootstrap outgoing streaming speeds are much slower than during repair
> --
>
> Key: CASSANDRA-9766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
> Project: Cassandra
>  Issue Type: Bug
> Environment: Cassandra 2.1.2. more details in the pdf attached 
>Reporter: Alexei K
> Fix For: 2.1.x
>
> Attachments: problem.pdf
>
>
> I have a cluster in Amazon cloud , its described in detail in the attachment. 
> What I've noticed is that we during bootstrap we never go above 12MB/sec 
> transmission speeds and also those speeds flat line almost like we're hitting 
> some sort of a limit ( this remains true for other tests that I've ran) 
> however during the repair we see much higher,variable sending rates. I've 
> provided network charts in the attachment as well . Is there an explanation 
> for this? Is something wrong with my configuration, or is it a possible bug?





[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2015-09-30 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939147#comment-14939147
 ] 

Yuki Morishita commented on CASSANDRA-9766:
---

Both do the same for compressed SSTables.
Bootstrap can send a whole SSTable, which is much larger than the receiving 
buffer, while repair sends only the part of the SSTable that differs.

> Bootstrap outgoing streaming speeds are much slower than during repair
> --
>
> Key: CASSANDRA-9766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9766
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 2.1.2. more details in the pdf attached 
>Reporter: Alexei K
>Assignee: Yuki Morishita
>Priority: Minor
> Fix For: 2.1.x
>
> Attachments: problem.pdf
>
>
> I have a cluster in Amazon cloud , its described in detail in the attachment. 
> What I've noticed is that we during bootstrap we never go above 12MB/sec 
> transmission speeds and also those speeds flat line almost like we're hitting 
> some sort of a limit ( this remains true for other tests that I've ran) 
> however during the repair we see much higher,variable sending rates. I've 
> provided network charts in the attachment as well . Is there an explanation 
> for this? Is something wrong with my configuration, or is it a possible bug?





[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2015-09-30 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939129#comment-14939129
 ] 

Jonathan Ellis commented on CASSANDRA-9766:
---

Is that actually a different path for bootstrap than repair, or do you think 
the reported behavior is inaccurate?






[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2015-09-22 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903459#comment-14903459
 ] 

Yuki Morishita commented on CASSANDRA-9766:
---

Sorry for the late update.
I think I found the bottleneck.

To make sure:

Are you using SSTable compression?
What is the range of your SSTable sizes?
How large are your partitions?

When reading compressed data from the network, the receiving node buffers up 
to 1024 compressed chunks (the default compression chunk size is 64 KB). 
Putting a chunk into the buffer can block until the reader takes data from the 
buffer, decompresses it, calculates / updates various stats, and writes the 
received partition to SSTable files.
(The block can happen here: 
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/streaming/compress/CompressedInputStream.java#L180)

One possible solution is to make the hardcoded buffer length tunable via 
yaml or a system property.
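
A minimal sketch of the idea, not the actual CompressedInputStream code: a 
bounded queue stands in for the receive buffer, and its capacity comes from a 
system property instead of a hardcoded constant. The class name, the property 
name "sketch.streaming.chunk_buffer_capacity", and the helper methods are all 
hypothetical. The sketch uses the non-blocking offer() so that it terminates; 
a producer calling put() on the same full queue would block until the consumer 
drains it, which is the stall described above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChunkBufferSketch {
    // Hypothetical property name, NOT an existing Cassandra setting.
    static final String CAPACITY_PROPERTY = "sketch.streaming.chunk_buffer_capacity";
    // Chunk count quoted in the comment above.
    static final int DEFAULT_CAPACITY = 1024;

    // Reads the capacity from the system property, falling back to the default.
    static int bufferCapacity() {
        return Math.max(1, Integer.getInteger(CAPACITY_PROPERTY, DEFAULT_CAPACITY));
    }

    // Offers `chunks` items to a buffer that nobody drains and returns how many
    // were accepted before the buffer filled up.
    static int acceptedBeforeFull(int capacity, int chunks) {
        BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < chunks; i++) {
            // put() would block at this point once the buffer is full;
            // offer() reports the full buffer instead so the sketch can finish.
            if (!buffer.offer(new byte[0])) {
                break;
            }
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        int capacity = bufferCapacity();
        int accepted = acceptedBeforeFull(capacity, capacity + 10);
        System.out.println("capacity=" + capacity + ", accepted before full=" + accepted);
    }
}
```

Running it with, for example, -Dsketch.streaming.chunk_buffer_capacity=4096 
changes the point at which the producer would start waiting on the consumer.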






[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2015-07-16 Thread Alexei K (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630690#comment-14630690
 ] 

Alexei K commented on CASSANDRA-9766:
-

Hi Yuki,
Yes, we are using vnodes, and no, we don't use Cassandra indexes at all.






[jira] [Commented] (CASSANDRA-9766) Bootstrap outgoing streaming speeds are much slower than during repair

2015-07-16 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630097#comment-14630097
 ] 

Yuki Morishita commented on CASSANDRA-9766:
---

Are you using vnodes?
Do you use secondary indexes heavily?



