[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944357#comment-13944357 ]

Benedict commented on CASSANDRA-6694:
-------------------------------------

FTR, with a very simple and short performance comparison, simulating writing a lot of small (integer) fields, using cassandra-stress write n=40 -col size=fixed\(4\) n=fixed\(100\), I see a 25% throughput improvement using offheap_objects as the allocator type vs either on- or off-heap buffers. I should expect to see performance improve further as the length of the test increases, as write amplification takes its toll more rapidly on the heap buffers.

Slightly More Off-Heap Memtables
--------------------------------

                 Key: CASSANDRA-6694
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Benedict
            Assignee: Benedict
              Labels: performance
             Fix For: 2.1 beta2

The off-heap memtables introduced in CASSANDRA-6689 don't go far enough: the on-heap overhead is still very large. It should not be tremendously difficult to extend these changes so that we allocate entire Cells off-heap, instead of multiple BBs per Cell (with all their associated overhead). The goal (if possible) is to reach an overhead of 16 bytes per Cell, plus 4-6 bytes per cell on average for the btree overhead, for a total overhead of around 20-22 bytes. This translates to an 8-byte object overhead, a 4-byte address (we will do alignment tricks like the VM to allow us to address a reasonably large memory space, although this trick is unlikely to last us forever, at which point we will have to bite the bullet and accept a 24-byte-per-cell overhead), and a 4-byte object reference for maintaining our internal list of allocations. That list is unfortunately necessary, since we cannot otherwise safely (and cheaply) walk the object graph we allocate, which is needed for (allocation-)compaction and pointer rewriting.
The ugliest thing here is going to be implementing the various CellName instances so that they may be backed by native memory OR heap memory.

--
This message was sent by Atlassian JIRA (v6.2#6252)
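The byte accounting in the description can be checked with a quick sketch; the constant names here are ours for illustration, not Cassandra identifiers:

```python
# Per-cell overhead target from the ticket: one on-heap handle per off-heap Cell.
OBJECT_HEADER = 8    # JVM object header for the on-heap handle
NATIVE_ADDRESS = 4   # compressed off-heap address (via alignment tricks)
ALLOCATION_REF = 4   # reference kept in the internal allocation list

PER_CELL = OBJECT_HEADER + NATIVE_ADDRESS + ALLOCATION_REF   # 16 bytes

BTREE_MIN, BTREE_MAX = 4, 6   # average btree overhead per cell

def total_overhead(btree_bytes):
    # Total per-cell overhead once the amortised btree cost is included.
    return PER_CELL + btree_bytes

assert PER_CELL == 16
assert (total_overhead(BTREE_MIN), total_overhead(BTREE_MAX)) == (20, 22)
```

If the 4-byte compressed address ever stops being enough, swapping NATIVE_ADDRESS to 8 yields the 24-byte fallback the ticket mentions.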
[jira] [Comment Edited] (CASSANDRA-6694) Slightly More Off-Heap Memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944357#comment-13944357 ]

Benedict edited comment on CASSANDRA-6694 at 3/23/14 7:05 AM:
--------------------------------------------------------------

FTR, with a very simple and short performance comparison, simulating writing a lot of small (integer) fields, using cassandra-stress write n=40 -col size=fixed\(4\) n=fixed\(100\), I see a 25% throughput improvement using offheap_objects as the allocator type vs either on- or off-heap buffers. I would expect to see performance improve further as the length of the test increases, as write amplification takes its toll more rapidly on the heap buffers, but I don't intend to test this much further, as it was just to get a ballpark idea of how much impact it might have.

was (Author: benedict):
FTR, with a very simple and short performance comparison, simulating writing a lot of small (integer) fields, using cassandra-stress write n=40 -col size=fixed\(4\) n=fixed\(100\), I see a 25% throughput improvement using offheap_objects as the allocator type vs either on/off heap buffers. I should expect to see performance improve further as the length of the test increases, as write amplification takes its toll more rapidly on the heap buffers.
[jira] [Updated] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-6746:
---------------------------------------
    Attachment: buffered-io-tweaks.patch

[~enigmacurry] Here is a patch (rebased against the latest cassandra-2.1 branch) which should improve the warm-up period (it does on my SSD machine). What it does is simple: it sets all RARs to FADV_RANDOM (for the whole file); when SegmentedFile.getSegment(position) is called on a PoolingSegmentedFile (enabled by setting 'disk_access_mode: standard' in cassandra.yaml), it marks the first buffer, 64KB by default, as a sequential area and does FADV_WILLNEED on the first page starting from position. That works as a kind of smart read-ahead (if we discard the idea that we are already thrashing by pulling in 64KB to read one small row). Can you please test it on your HDD machines to see whether it actually works in an environment with higher I/O latencies? Another useful test would be to run this code in mixed write/read mode, to effectively check how good the kernel's page replacement mechanism is :)

P.S. Please set the device read-ahead (blockdev --setra ...) back to its default value before running the tests.

Reads have a slow ramp up in speed
----------------------------------

                 Key: CASSANDRA-6746
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6746
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Ryan McGuire
            Assignee: Benedict
              Labels: performance
             Fix For: 2.1 beta2
         Attachments: 2.1_vs_2.0_read.png, 6746-patched.png, 6746.blockdev_setra.full.png, 6746.blockdev_setra.zoomed.png, 6746.txt, buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2, cassandra-2.1-bdplab-trial-fincore.tar.bz2

On a physical four-node cluster I am doing a big write and then a big read. The read takes a long time to ramp up to respectable speeds.

!2.1_vs_2.0_read.png!

[See data here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1]
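The fadvise sequence the patch describes can be sketched with the POSIX hints Python exposes. This is a simplified, Linux-only illustration of the two steps; the function names and the 4KB page size are our assumptions, not part of the actual patch:

```python
import os

PAGE = 4096          # assumed page size
BUFFER = 64 * 1024   # the default RAR buffer size mentioned above

def open_random_access(path):
    # Step 1 of the patch's approach: advise FADV_RANDOM over the whole
    # file (length 0) so the kernel stops doing its own read-ahead.
    fd = os.open(path, os.O_RDONLY)
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    return fd

def get_segment(fd, position):
    # Step 2: on a segment request, FADV_WILLNEED only the first page at
    # `position`, a "smart read-ahead" that avoids pulling in a full 64KB
    # buffer to read one small row. Returns the advised page offset.
    page_start = (position // PAGE) * PAGE
    os.posix_fadvise(fd, page_start, PAGE, os.POSIX_FADV_WILLNEED)
    return page_start
```

os.posix_fadvise is only available where the platform provides it (Linux, Python 3.3+); Cassandra itself issues these hints through JNA.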
[jira] [Comment Edited] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944388#comment-13944388 ]

Pavel Yaskevich edited comment on CASSANDRA-6746 at 3/23/14 9:31 AM:
---------------------------------------------------------------------

[~enigmacurry] Here is a patch (rebased against the latest cassandra-2.1 branch) which should improve the warm-up period (it does on my SSD machine). What it does is simple: it sets all RARs to FADV_RANDOM (for the whole file); when SegmentedFile.getSegment(position) is called on a PoolingSegmentedFile (enabled by setting 'disk_access_mode: standard' in cassandra.yaml), it marks the first buffer, 64KB by default, as a sequential area and does FADV_WILLNEED on the first page starting from position. That works as a kind of smart read-ahead (if we discard the idea that we are already thrashing by pulling in 64KB to read one small row), because getSegment(position) for buffered files points to the start of the row. Can you please test it on your HDD machines to see whether it actually works in an environment with higher I/O latencies? Another useful test would be to run this code in mixed write/read mode, to effectively check how good the kernel's page replacement mechanism is :)

P.S. Please set the device read-ahead (blockdev --setra ...) back to its default value before running the tests.

was (Author: xedin):
[~enigmacurry] Here is a patch (rebased with the latest cassandra-2.1 branch) which should improve the warm up period (it does on my SSD machine), what it does is simple - sets all RAR to FADV_RANDOM (whole file), when SegmentedFile.getSegment(position) is called on PoolingSegmentedFile (which is enabled by setting 'disk_access_mode: standard' in cassandra.yaml) it would mark first buffer, 64KB by default, as sequential area and do FADV_WILLNEED on the first page starting from position, that works as kind of of smart read-ahead (if we discard they idea that we already thashing by polling 64KB to read one small row). Can you please test it on your HDD machines to see if that actually works in the environment with higher I/O latencies?... Another useful test would be to test this code in mixed write/read mode to effectively check how good is page replacement mechanism in the kernel :) P.S. please set device read-ahead (blockdev --setra ...) back to it's default value before doing the tests.
[jira] [Created] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load
Bartłomiej Romański created CASSANDRA-6908:
-------------------------------------------

             Summary: Dynamic endpoint snitch destabilizes cluster under heavy load
                 Key: CASSANDRA-6908
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6908
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Bartłomiej Romański

We observe that with the dynamic snitch disabled our cluster is much more stable than with it enabled.

We've got a 15-node cluster with pretty strong machines (2x E5-2620, 64 GB RAM, 2x 480 GB SSD). We mostly do reads (about 300k/s).

We use Astyanax on the client side with the TOKEN_AWARE option enabled. It automatically directs read queries to one of the nodes responsible for the given token. In that case, with the dynamic snitch disabled, Cassandra always handles the read locally. With the dynamic snitch enabled, Cassandra very often decides to proxy the read to some other node. This causes much higher CPU usage and produces much more garbage, which results in more frequent GC pauses (the young generation fills up more quickly). By "much higher" and "much more" I mean 1.5-2x.

I'm aware that a higher dynamic_snitch_badness_threshold value should solve that issue. The default value is 0.1. I've looked at the scores exposed in JMX, and the problem is that our values seem to be completely random. They are usually between 0.5 and 2.0, but change randomly every time I hit refresh. Of course, I could set dynamic_snitch_badness_threshold to 5.0 or so, but the result would be similar to simply disabling the dynamic snitch altogether (which is what we did).

I've tried to understand the logic behind these scores, and I'm not sure I get the idea. The score is a sum (without any multipliers) of two components:
- the ratio of the given node's recent latency to the recent average node latency
- something called 'severity', which, if I read the code correctly, is the result of BackgroundActivityMonitor.getIOWait(): the ratio of iowait CPU time to total CPU time as reported in /proc/stats, multiplied by 100

In our case the second value is around 0-2%, but it varies quite heavily from second to second.

What's the idea behind simply adding these two values without any multipliers (e.g. the second one is a percentage while the first one is not)? Are we sure this is the best possible way of calculating the final score?

Is there a way to force Cassandra to use (much) longer samples? In our case we probably need that to get stable values. The 'severity' is calculated every second. The mean latency is calculated based on some magic, hardcoded values (ALPHA = 0.75, WINDOW_SIZE = 100). Am I right that there's no way to tune that without hacking the code? I'm aware of the dynamic_snitch_update_interval_in_ms property in the config file, but that only determines how often the scores are recalculated, not how long the samples are. Is that correct?

To sum up, it would be really nice to have more control over the dynamic snitch behavior, or at least an official option to disable it, documented in the default config file (it took me some time to discover that we can just disable it instead of hacking around with dynamic_snitch_badness_threshold=1000). Currently, for some scenarios (like ours: optimized cluster, token-aware client, heavy load) it causes more harm than good.
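The score composition questioned in the report can be modeled roughly as follows. This is a sketch of the described behavior, not Cassandra's DynamicEndpointSnitch code; the helper names are ours, and only the two constants come from the report:

```python
ALPHA = 0.75        # hardcoded smoothing factor cited in the report
WINDOW_SIZE = 100   # hardcoded sample-window size cited in the report

def ema(samples, alpha=ALPHA):
    # Exponentially weighted moving average over up to WINDOW_SIZE samples,
    # with newer samples weighted by alpha.
    acc = samples[0]
    for s in samples[1:WINDOW_SIZE]:
        acc = alpha * s + (1 - alpha) * acc
    return acc

def score(node_latencies, all_latencies, iowait_fraction):
    latency_ratio = ema(node_latencies) / ema(all_latencies)  # unitless, ~1.0
    severity = iowait_fraction * 100                          # a percentage
    # The two terms are summed with no scaling, which is the unit mismatch
    # the report complains about: a 1% iowait blip adds a full 1.0 to a
    # latency ratio that is itself around 1.0.
    return latency_ratio + severity

print(score([10.0] * 5, [10.0] * 5, 0.01))  # 2.0 for an otherwise ideal node
```

With scores jumping by whole units on every iowait sample, it is plausible that the JMX values look random from refresh to refresh, as observed.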
[jira] [Created] (CASSANDRA-6909) A way to expire columns without converting to tombstones
Bartłomiej Romański created CASSANDRA-6909:
-------------------------------------------

             Summary: A way to expire columns without converting to tombstones
                 Key: CASSANDRA-6909
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6909
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Bartłomiej Romański

Imagine the following scenario:
- You need to store some data, knowing that you will need it only for a limited time (say, 7 days).
- After that you just don't care. You don't need the data to be returned by queries, but if it is returned that's not a problem at all; you won't look at it anyway.
- Your records are small. Row keys and column names are even longer than the actual values (e.g. ints vs strings).
- You reuse rows. You add some new columns to most of the rows every day or two. This means that columns expire often, but rows usually don't.
- You generate a lot of data and want to make sure that expired records do not consume disk space for too long.

The current TTL feature does not handle that situation well. When compaction finally decides it's worth compacting a given sstable, it won't simply get rid of expired columns; instead it transforms them into tombstones. For small values that's no saving at all. Even if you set the grace period to 0, tombstones cannot be removed early, because some other sstable may still hold values that should be covered by the tombstone. You can get rid of a tombstone in only two cases:
- it's a major compaction (which never happens with LCS and requires a lot of space with STCS)
- bloom filters tell you that no other sstable contains this row key

The second case is rare if you usually have multiple columns in a single row that were not written at once. There's a good chance your row is spread across multiple sstables, and since new ones are generated from time to time, there's very little chance they'll all meet in one compaction at some point.

What's funny, bloom filters return true if there's a tombstone for the given row in the given sstable. So you won't remove tombstones during compaction, because there's some other tombstone in another sstable for that row :/ After a while, you end up with a lot of tombstones (the majority of your data) and can do nothing about it.

Now imagine that Cassandra knows we just don't care about data older than 7 days. Firstly, it can simply drop such columns during compactions (without converting them to tombstones or anything like that). Secondly, if it detects an sstable older than 7 days, it can safely remove the whole file (it cannot contain any live data). These two rules *guarantee* that your data will be removed within 14 days (2x TTL): if a compaction runs within 7 days, the expired data is removed then; if not, the whole sstable is removed after another 7 days.

That's what I expected from CASSANDRA-3974, but it turned out to be just a trivial frontend feature. I suggest rethinking this mechanism. I don't believe it's a common scenario that someone who sets a TTL for a whole CF needs all the strong guarantees that data will not reappear in the future in case of some consistency issues (which is why we need this whole mess with tombstones). I believe the common case for a per-CF TTL is that you just want an efficient way to recover your disk space (and improve read performance by having fewer sstables and less data in general).

To work around this we currently stop Cassandra periodically, simply remove sstables that are too old, and start it back up. That works OK, but does not solve the problem fully (if a tombstone is rewritten by compactions often enough, we will never remove it).
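The two proposed rules and the 2x TTL retention bound can be sketched with a toy model. The data layout here (lists of (name, day_written) tuples standing in for sstables) is a deliberate simplification, not Cassandra's storage format:

```python
TTL = 7  # days; a hypothetical table-wide TTL

def compact(columns, now):
    # Rule 1: during compaction, drop expired columns outright instead of
    # writing tombstones. `columns` is a list of (name, day_written) tuples.
    return [(name, day) for name, day in columns if now - day < TTL]

def sweep_sstables(sstables, now):
    # Rule 2: an sstable whose newest write is older than TTL cannot hold
    # live data, so the whole file can be deleted without compacting it.
    return [s for s in sstables if now - max(day for _, day in s) < TTL]

# A column written on day 0 into an sstable that is never compacted is still
# removed once the sstable's newest write ages past TTL, bounding retention
# at roughly 2x TTL as the description argues.
tables = [[("a", 0), ("b", 6)]]
assert compact(tables[0], 7) == [("b", 6)]   # "a" expired; no tombstone written
assert sweep_sstables(tables, 13) == []      # whole file dropped on day 13
```

Nothing in either rule consults other sstables, which is exactly why the tombstone/grace-period machinery can be skipped when the TTL applies to the whole table.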
[jira] [Updated] (CASSANDRA-6909) A way to expire columns without converting to tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bartłomiej Romański updated CASSANDRA-6909:
-------------------------------------------
    Description: edited (wording only; the full text is quoted in the creation notice above)
[jira] [Updated] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load
[ https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bartłomiej Romański updated CASSANDRA-6908:
-------------------------------------------
    Component/s: Core
                 Config
[jira] [Commented] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load
[ https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944423#comment-13944423 ]

Brandon Williams commented on CASSANDRA-6908:
---------------------------------------------

What version are you on? We removed the latency calculation somewhat recently in 2.0.
[jira] [Commented] (CASSANDRA-876) Support session (read-after-write) consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1396#comment-1396 ]

Muhammad Adel commented on CASSANDRA-876:
-----------------------------------------

Is this issue still open for the latest version of Cassandra? As far as I understand from reading various documentation and articles about Memtables, they are already searched for data before the SSTables when performing a query.

Support session (read-after-write) consistency
----------------------------------------------

                 Key: CASSANDRA-876
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-876
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Jonathan Ellis
            Priority: Minor
              Labels: gsoc, gsoc2010
         Attachments: 876-v2.txt, CASSANDRA-876.patch

In http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html and http://www.allthingsdistributed.com/2008/12/eventually_consistent.html Amazon discusses the concept of eventual consistency. Cassandra uses eventual consistency in a design similar to Dynamo's.

Supporting session consistency would be useful and relatively easy to add: we already have the concept of a Memtable (see http://wiki.apache.org/cassandra/MemtableSSTable) to stage updates in before flushing to disk. If we applied mutations to a session-level memtable on the coordinator machine (that is, the machine the client is connected to), and then did a final merge from that table against query results before handing them to the client, we'd get it almost for free. Of course, the devil is in the details; Thrift doesn't provide any hooks for session-level data out of the box, but we could do this with a thread-local approach fairly easily. CASSANDRA-569 has some (probably out-of-date) code that might be useful here.
[jira] [Comment Edited] (CASSANDRA-876) Support session (read-after-write) consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1396#comment-1396 ] Muhammad Adel edited comment on CASSANDRA-876 at 3/23/14 3:20 PM: -- Is this issue still open for the latest version of Cassandra? As far as I understand from reading different documentations and articles about MemTables, They are already searched for data before searching the SSTable when performing a query. was (Author: muhammadadel): Is this issue still open for the latest version of Cassandra? As far as I understand from reading different documentations and articles about MemTables they are already searched for data before searching the SSTable when performing a query. Support session (read-after-write) consistency -- Key: CASSANDRA-876 URL: https://issues.apache.org/jira/browse/CASSANDRA-876 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Priority: Minor Labels: gsoc, gsoc2010 Attachments: 876-v2.txt, CASSANDRA-876.patch In http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html and http://www.allthingsdistributed.com/2008/12/eventually_consistent.html Amazon discusses the concept of eventual consistency. Cassandra uses eventual consistency in a design similar to Dynamo. Supporting session consistency would be useful and relatively easy to add: we already have the concept of a Memtable (see http://wiki.apache.org/cassandra/MemtableSSTable ) to stage updates in before flushing to disk; if we applied mutations to a session-level memtable on the coordinator machine (that is, the machine the client is connected to), and then did a final merge from that table against query results before handing them to the client, we'd get it almost for free. Of course, the devil is in the details; thrift doesn't provide any hooks for session-level data out of the box, but we could do this with a threadlocal approach fairly easily. 
CASSANDRA-569 has some (probably out of date now) code that might be useful here. -- This message was sent by Atlassian JIRA (v6.2#6252)
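The session-consistency idea described in the ticket (a session-level memtable on the coordinator, merged into query results before they are returned) can be sketched roughly as follows. This is an illustrative sketch only, not Cassandra code; the function names and the key -> (timestamp, value) shape are our own, and the thread-local mirrors the "threadlocal approach" mentioned above.

```python
import threading

# Hypothetical sketch: keep a per-session (here, per-thread) record of a
# client's own writes, and merge it into query results so the session
# always sees its own writes, even before replicas converge.
_session_memtable = threading.local()

def _table():
    if not hasattr(_session_memtable, "data"):
        _session_memtable.data = {}  # key -> (timestamp, value)
    return _session_memtable.data

def session_write(key, value, timestamp):
    # Stage the mutation in the session-level memtable, keeping the
    # newest timestamp per key.
    table = _table()
    existing = table.get(key)
    if existing is None or timestamp >= existing[0]:
        table[key] = (timestamp, value)

def merge_results(query_results):
    # query_results: key -> (timestamp, value) as returned by replicas.
    # The session's own, possibly newer, writes win on timestamp - this
    # is the "final merge" step described in the ticket.
    merged = dict(query_results)
    for key, (ts, value) in _table().items():
        if key not in merged or ts >= merged[key][0]:
            merged[key] = (ts, value)
    return merged
```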
[jira] [Created] (CASSANDRA-6910) Better table structure display in cqlsh
Tupshin Harper created CASSANDRA-6910: - Summary: Better table structure display in cqlsh Key: CASSANDRA-6910 URL: https://issues.apache.org/jira/browse/CASSANDRA-6910 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Tupshin Harper Priority: Minor It should be possible to make it more immediately obvious what the structure of your CQL table is from cqlsh. Two minor enhancements could go a long way: 1) If there are no results, display the column headers anyway. Right now, if a query returns no results, it's common to have to display the table schema to figure out what you did wrong. Showing the columns whenever you run a query wouldn't get in the way, and would be more visual than describing the table. 2) Along with the first one, if we could highlight the partition/clustering columns in different colors, the underlying partition structure would be much more intuitively understandable. tl;dr: the forms below should each have a distinct visual representation when displaying the column headers, and the column headers should always be shown. {code} CREATE TABLE usertest ( userid text, email text, name text, PRIMARY KEY (userid) ) CREATE TABLE usertest2 ( userid text, email text, name text, PRIMARY KEY (userid, email) ) CREATE TABLE usertest3 ( userid text, email text, name text, PRIMARY KEY ((userid, email)) ) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
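Enhancement 2 above amounts to coloring headers by their role in the primary key. A minimal sketch of that idea, assuming ANSI terminal colors (this is not cqlsh code; the function and color choices are illustrative):

```python
# Illustrative sketch: render column headers with distinct ANSI colors
# for partition-key and clustering columns, leaving regular columns
# uncolored, so the primary-key structure is visible at a glance.
RED, BLUE, RESET = "\033[31m", "\033[34m", "\033[0m"

def color_headers(columns, partition_keys, clustering_keys):
    out = []
    for name in columns:
        if name in partition_keys:
            out.append(RED + name + RESET)    # partition key column
        elif name in clustering_keys:
            out.append(BLUE + name + RESET)   # clustering column
        else:
            out.append(name)                  # regular column
    return " | ".join(out)
```

With usertest2 from above, `color_headers(["userid", "email", "name"], {"userid"}, {"email"})` would color userid and email differently, immediately distinguishing it from usertest and usertest3.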
[jira] [Updated] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McGuire updated CASSANDRA-6746: Attachment: 6746-buffered-io-tweaks.png [~xedin] Here's a benchmark of your buffered-io-tweaks patch: !6746-buffered-io-tweaks.png! It seemed to delay the ramp up and shorten its duration, but the ramp up still happened. I did two trials of it to make sure. I'll get you a mixed workload benchmark soon. Reads have a slow ramp up in speed -- Key: CASSANDRA-6746 URL: https://issues.apache.org/jira/browse/CASSANDRA-6746 Project: Cassandra Issue Type: Bug Components: Core Reporter: Ryan McGuire Assignee: Benedict Labels: performance Fix For: 2.1 beta2 Attachments: 2.1_vs_2.0_read.png, 6746-buffered-io-tweaks.png, 6746-patched.png, 6746.blockdev_setra.full.png, 6746.blockdev_setra.zoomed.png, 6746.txt, buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2, cassandra-2.1-bdplab-trial-fincore.tar.bz2 On a physical four node cluster I am doing a big write and then a big read. The read takes a long time to ramp up to respectable speeds. !2.1_vs_2.0_read.png! [See data here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6506) counters++ split counter context shards into separate cells
[ https://issues.apache.org/jira/browse/CASSANDRA-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944567#comment-13944567 ] Aleksey Yeschenko commented on CASSANDRA-6506: -- Any objections to at least committing the first two cleanup commits now (the first one is there to mostly kill all the IDEA warnings, at last, but the second one - c503d6ae89651186b9ac7fc8026eab0ace137af - does deconfuse the API a bit) ? counters++ split counter context shards into separate cells --- Key: CASSANDRA-6506 URL: https://issues.apache.org/jira/browse/CASSANDRA-6506 Project: Cassandra Issue Type: Improvement Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 2.1 beta2 This change is related to, but somewhat orthogonal to CASSANDRA-6504. Currently all the shard tuples for a given counter cell are packed, in sorted order, in one binary blob. Thus reconciling N counter cells requires allocating a new byte buffer capable of holding the union of the two context's shards N-1 times. For writes, in post CASSANDRA-6504 world, it also means reading more data than we have to (the complete context, when all we need is the local node's global shard). Splitting the context into separate cells, one cell per shard, will help to improve this. We did a similar thing with super columns for CASSANDRA-3237. Incidentally, doing this split is now possible thanks to CASSANDRA-3237. Doing this would also simplify counter reconciliation logic. Getting rid of old contexts altogether can be done trivially with upgradesstables. In fact, we should be able to put the logical clock into the cell's timestamp, and use regular Cell-s and regular Cell reconcile() logic for the shards, especially once we get rid of the local/remote shards some time in the future (until then we still have to differentiate between global/remote/local shards and their priority rules). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-6911) Netty dependency update broke stress
Ryan McGuire created CASSANDRA-6911: --- Summary: Netty dependency update broke stress Key: CASSANDRA-6911 URL: https://issues.apache.org/jira/browse/CASSANDRA-6911 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Ryan McGuire Assignee: Benedict I compiled stress fresh from cassandra-2.1 and running this command: {code} cassandra-stress write n=1900 -rate threads=50 -node bdplab {code} I get the following traceback: {code} Exception in thread Thread-49 java.lang.NoClassDefFoundError: org/jboss/netty/channel/ChannelFactory at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:941) at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:889) at com.datastax.driver.core.Cluster.init(Cluster.java:88) at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:144) at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:854) at org.apache.cassandra.stress.util.JavaDriverClient.connect(JavaDriverClient.java:74) at org.apache.cassandra.stress.settings.StressSettings.getJavaDriverClient(StressSettings.java:155) at org.apache.cassandra.stress.settings.StressSettings.getSmartThriftClient(StressSettings.java:70) at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:275) Caused by: java.lang.ClassNotFoundException: org.jboss.netty.channel.ChannelFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 9 more {code} It seems this was introduced with an updated netty jar in cbf304ebd0436a321753e81231545b705aa8dd23 -- This message was sent by Atlassian JIRA (v6.2#6252)
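A quick way to diagnose a `NoClassDefFoundError` like the one above is to scan the jars on the classpath for the missing class's entry. This is a generic diagnostic sketch (the helper name is ours, not a Cassandra or stress tool), using the fact that jars are zip archives:

```python
import os
import zipfile

def jars_containing(class_name, jar_dir):
    # Convert e.g. org.jboss.netty.channel.ChannelFactory into the zip
    # entry path that a jar would contain for that class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for fname in sorted(os.listdir(jar_dir)):
        if not fname.endswith(".jar"):
            continue
        path = os.path.join(jar_dir, fname)
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(fname)
    return hits
```

Running this over the stress tool's lib directory for `org.jboss.netty.channel.ChannelFactory` would show whether the old-namespace Netty classes are present at all after the dependency update.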
[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944580#comment-13944580 ] Pavel Yaskevich commented on CASSANDRA-6746: [~enigmacurry] Yes, it would not eliminate it completely, just shorten the duration and speed up the initial warmup. But this drop in op rate is worrisome - can you check whether it could be something JVM-related, or something on the Cassandra side happening at the same time as the drop in op rate? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McGuire updated CASSANDRA-6746: Attachment: 6746.buffered_io_tweaks.logs.tar.gz I'm using java 1.7.0_51, with a default cassandra.yaml except for the disk_access_mode: standard setting. I don't see anything weird in the logs, but I've uploaded them in case you want to check them out (6746.buffered_io_tweaks.logs.tar.gz). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6575) By default, Cassandra should refuse to start if JNA can't be initialized properly
[ https://issues.apache.org/jira/browse/CASSANDRA-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944649#comment-13944649 ] Ryan McGuire commented on CASSANDRA-6575: - This got broken. I have deleted the JNA jar file from my lib directory, and the error message telling me that Cassandra refuses to start still shows up correctly. However, if I use the boot_without_jna option it suggests, I get this traceback: {code} ERROR 23:28:22 Exception in thread Thread[MemtableFlushWriter:1,5,main] java.lang.NoClassDefFoundError: com/sun/jna/Native at org.apache.cassandra.io.util.Memory.asByteBuffers(Memory.java:305) ~[main/:na] at org.apache.cassandra.io.util.AbstractDataOutput.write(AbstractDataOutput.java:326) ~[main/:na] at org.apache.cassandra.io.sstable.IndexSummary$IndexSummarySerializer.serialize(IndexSummary.java:221) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableReader.saveSummary(SSTableReader.java:709) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableReader.saveSummary(SSTableReader.java:696) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:356) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:331) ~[main/:na] at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:326) ~[main/:na] at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:363) ~[main/:na] at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:321) ~[main/:na] at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) ~[main/:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na] at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na] at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1029) ~[main/:na] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_51] at java.lang.Thread.run(Thread.java:744) ~[na:1.7.0_51] Caused by: java.lang.ClassNotFoundException: com.sun.jna.Native at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_51] at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_51] at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_51] at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_51] at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_51] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_51] at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_51] ... 17 common frames omitted {code} This was working at the time this ticket was closed before, but it's now broken on cassandra-2.1 HEAD. By default, Cassandra should refuse to start if JNA can't be initialized properly - Key: CASSANDRA-6575 URL: https://issues.apache.org/jira/browse/CASSANDRA-6575 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Tupshin Harper Assignee: Clément Lardeur Priority: Minor Labels: lhf Fix For: 2.1 beta1 Attachments: trunk-6575-v2.patch, trunk-6575-v3.patch, trunk-6575-v4.patch, trunk-6575.patch Failure to have JNA working properly is such a common undetected problem that it would be far preferable to have Cassandra refuse to startup unless JNA is initialized. In theory, this should be much less of a problem with Cassandra 2.1 due to CASSANDRA-5872, but even there, it might fail due to native lib problems, or might otherwise be misconfigured. A yaml override, such as boot_without_jna would allow the deliberate overriding of this policy. -- This message was sent by Atlassian JIRA (v6.2#6252)
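The startup policy the ticket asks for - refuse to boot unless native access initializes, with an explicit yaml override - can be sketched like this. This mirrors the intent only, not Cassandra's actual startup code; the exception, function, and message strings are illustrative:

```python
import ctypes.util

class StartupError(Exception):
    """Raised when the node should refuse to start."""

def check_native_access(boot_without_jna=False, lib_name="c"):
    # Try to locate the native library; treat failure as fatal unless the
    # operator has deliberately opted out via boot_without_jna (the yaml
    # override proposed in the ticket).
    found = ctypes.util.find_library(lib_name)
    if found is not None:
        return "native access available ({})".format(found)
    if boot_without_jna:
        return "native access unavailable; continuing (boot_without_jna=true)"
    raise StartupError(
        "native access could not be initialized; refusing to start "
        "(set boot_without_jna to override)")
```

The bug reported above is essentially that the override path still lets later code assume `com.sun.jna.Native` is loadable; in the sketch's terms, callers must also honor the "unavailable" result instead of using the native path unconditionally.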
[jira] [Reopened] (CASSANDRA-6575) By default, Cassandra should refuse to start if JNA can't be initialized properly
[ https://issues.apache.org/jira/browse/CASSANDRA-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McGuire reopened CASSANDRA-6575: - -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944654#comment-13944654 ] Ryan McGuire commented on CASSANDRA-6746: - I ran a mixed read/write workload on a number of branches. [You can see the results here|http://localhost:8000/graph.html?stats=stats.6746.buffered-io-tweaks.mixed.json] That chart is a bit messy, so you need to click the colored squares to see results for only a few branches at a time. The branches tested: * [~xedin]'s buffered-io-tweaks patch on cassandra-2.1 HEAD * cassandra-2.1 HEAD * cassandra-2.0 HEAD with JNA * cassandra-2.1 HEAD without JNA Similar to the buffered-io-tweaks run I did for solo reads, it looks to improve things here as well. However, even in mixed workloads, simply disabling JNA still works better. I cannot currently test cassandra-2.1 without JNA because of CASSANDRA-6575, which I have just now reopened. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944654#comment-13944654 ] Ryan McGuire edited comment on CASSANDRA-6746 at 3/24/14 12:04 AM: --- I ran a mixed read / write workload on a number of branches. [You can see the results here|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=stats.6746.buffered-io-tweaks.mixed.json] That chart is a bit messy, so you need to click the colored squares to only see results for a few branches at a time. The branches tested: * [~xedin]'s buffered-io-tweaks patch on cassandra-2.1 HEAD * cassandra-2.1 HEAD * cassandra-2.0 HEAD with JNA * cassandra-2.1 HEAD without JNA Similar to the buffered-io-tweaks run I did for solo-reads, it looks to improve things here as well. However, even in mixed workloads, simply disabling JNA is still working better. I cannot currently test cassandra-2.1 without JNA because of CASSANDRA-6575 which I have just now reopened. was (Author: enigmacurry): I ran a mixed read / write workload on a number of branches. [You can see the results here|http://localhost:8000/graph.html?stats=stats.6746.buffered-io-tweaks.mixed.json] That chart is a bit messy, so you need to click the colored squares to only see results for a few branches at a time. The branches tested: * [~xedin]'s buffered-io-tweaks patch on cassandra-2.1 HEAD * cassandra-2.1 HEAD * cassandra-2.0 HEAD with JNA * cassandra-2.1 HEAD without JNA Similar to the buffered-io-tweaks run I did for solo-reads, it looks to improve things here as well. However, even in mixed workloads, simply disabling JNA is still working better. I cannot currently test cassandra-2.1 without JNA because of CASSANDRA-6575 which I have just now reopened. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944654#comment-13944654 ] Ryan McGuire edited comment on CASSANDRA-6746 at 3/24/14 12:05 AM: --- I ran a mixed read / write workload on a number of branches. [You can see the results here|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=stats.6746.buffered-io-tweaks.mixed.jsonoperation=mixed] That chart is a bit messy, so you need to click the colored squares to only see results for a few branches at a time. The branches tested: * [~xedin]'s buffered-io-tweaks patch on cassandra-2.1 HEAD * cassandra-2.1 HEAD * cassandra-2.0 HEAD with JNA * cassandra-2.1 HEAD without JNA Similar to the buffered-io-tweaks run I did for solo-reads, it looks to improve things here as well. However, even in mixed workloads, simply disabling JNA is still working better. I cannot currently test cassandra-2.1 without JNA because of CASSANDRA-6575 which I have just now reopened. was (Author: enigmacurry): I ran a mixed read / write workload on a number of branches. [You can see the results here|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=stats.6746.buffered-io-tweaks.mixed.json] That chart is a bit messy, so you need to click the colored squares to only see results for a few branches at a time. The branches tested: * [~xedin]'s buffered-io-tweaks patch on cassandra-2.1 HEAD * cassandra-2.1 HEAD * cassandra-2.0 HEAD with JNA * cassandra-2.1 HEAD without JNA Similar to the buffered-io-tweaks run I did for solo-reads, it looks to improve things here as well. However, even in mixed workloads, simply disabling JNA is still working better. I cannot currently test cassandra-2.1 without JNA because of CASSANDRA-6575 which I have just now reopened. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944654#comment-13944654 ] Ryan McGuire edited comment on CASSANDRA-6746 at 3/24/14 12:08 AM: --- I ran a mixed read / write workload on a number of branches. [You can see the results here|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=stats.6746.buffered-io-tweaks.mixed.jsonoperation=mixed] That chart is a bit messy, so you need to click the colored squares to only see results for a few branches at a time. The branches tested: * [~xedin]'s buffered-io-tweaks patch on cassandra-2.1 HEAD * cassandra-2.1 HEAD * cassandra-2.0 HEAD with JNA * cassandra-2.0 HEAD without JNA Similar to the buffered-io-tweaks run I did for solo-reads, it looks to improve things here as well. However, even in mixed workloads, simply disabling JNA is still working better. I cannot currently test cassandra-2.1 without JNA because of CASSANDRA-6575 which I have just now reopened. was (Author: enigmacurry): I ran a mixed read / write workload on a number of branches. [You can see the results here|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=stats.6746.buffered-io-tweaks.mixed.jsonoperation=mixed] That chart is a bit messy, so you need to click the colored squares to only see results for a few branches at a time. The branches tested: * [~xedin]'s buffered-io-tweaks patch on cassandra-2.1 HEAD * cassandra-2.1 HEAD * cassandra-2.0 HEAD with JNA * cassandra-2.1 HEAD without JNA Similar to the buffered-io-tweaks run I did for solo-reads, it looks to improve things here as well. However, even in mixed workloads, simply disabling JNA is still working better. I cannot currently test cassandra-2.1 without JNA because of CASSANDRA-6575 which I have just now reopened. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6908) Dynamic endpoint snitch destabilizes cluster under heavy load
[ https://issues.apache.org/jira/browse/CASSANDRA-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944672#comment-13944672 ] Bartłomiej Romański commented on CASSANDRA-6908: We're using 2.0.5. I'm looking at that code in 2.0.6 now and it looks like it's still there (however, I'm not very familiar with the code, so it's possible I'm misunderstanding something). Dynamic endpoint snitch destabilizes cluster under heavy load - Key: CASSANDRA-6908 URL: https://issues.apache.org/jira/browse/CASSANDRA-6908 Project: Cassandra Issue Type: Improvement Components: Config, Core Reporter: Bartłomiej Romański We observe that with the dynamic snitch disabled our cluster is much more stable than with it enabled. We've got a 15-node cluster with pretty strong machines (2x E5-2620, 64 GB RAM, 2x 480 GB SSD). We mostly do reads (about 300k/s). We use Astyanax on the client side with the TOKEN_AWARE option enabled. It automatically directs read queries to one of the nodes responsible for the given token. In that case, with the dynamic snitch disabled, Cassandra always handles the read locally. With the dynamic snitch enabled, Cassandra very often decides to proxy the read to some other node. This causes much higher CPU usage and produces much more garbage, which results in more frequent GC pauses (the young generation fills up quicker). By much higher and much more I mean 1.5-2x. I'm aware that a higher dynamic_snitch_badness_threshold value should solve that issue. The default value is 0.1. I've looked at the scores exposed in JMX and the problem is that our values seem to be completely random. They are usually between 0.5 and 2.0, but change randomly every time I hit refresh. Of course, I can set dynamic_snitch_badness_threshold to 5.0 or something like that, but the result will be similar to simply disabling the dynamic snitch altogether (which is what we did). I've tried to understand the logic behind these scores and I'm not sure if I get the idea... 
It's a sum (without any multipliers) of two components: - the ratio of the given node's recent latency to the recent average node latency - something called 'severity', which, if I analyzed the code correctly, is the result of BackgroundActivityMonitor.getIOWait() - the ratio of iowait CPU time to total CPU time as reported in /proc/stats (the ratio is multiplied by 100) In our case the second value is somewhere around 0-2% but varies quite heavily every second. What's the idea behind simply adding these two values without any multipliers (e.g. the second one is a percentage while the first one is not)? Are we sure this is the best possible way of calculating the final score? Is there a way to force Cassandra to use (much) longer samples? In our case we probably need that to get stable values. The 'severity' is calculated for each second. The mean latency is calculated based on some magic, hardcoded values (ALPHA = 0.75, WINDOW_SIZE = 100). Am I right that there's no way to tune that without hacking the code? I'm aware that there's a dynamic_snitch_update_interval_in_ms property in the config file, but that only determines how often the scores are recalculated, not how long samples are taken. Is that correct? To sum up, it would be really nice to have more control over dynamic snitch behavior, or at least have the official option to disable it described in the default config file (it took me some time to discover that we can just disable it instead of hacking around with dynamic_snitch_badness_threshold=1000). Currently, for some scenarios (like ours - optimized cluster, token-aware client, heavy load) it causes more harm than good. -- This message was sent by Atlassian JIRA (v6.2#6252)
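The scoring described above - latency ratio plus unweighted iowait percentage - can be written out to make the reported unit mismatch concrete. This is a sketch of the reporter's reading of the code, not the actual DynamicEndpointSnitch implementation:

```python
# Sketch of the score as described in the report: for each node,
#   score = (node latency / mean latency across nodes) + severity
# where severity is an iowait *percentage* (0-100) added with no
# weighting to a dimensionless ratio near 1.0 - so a 2% iowait blip
# swamps the latency term, matching the "random-looking" scores seen.
def snitch_scores(latencies_ms, severities_pct):
    mean = sum(latencies_ms.values()) / len(latencies_ms)
    return {
        node: latencies_ms[node] / mean + severities_pct.get(node, 0.0)
        for node in latencies_ms
    }
```

For example, a node with half the mean latency but a momentary 2% iowait scores far worse than a node twice as slow with 0% iowait, which is the instability being described.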
[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944690#comment-13944690 ] Pavel Yaskevich commented on CASSANDRA-6746: [~enigmacurry] Thanks for the results, this looks promising, although one question remains - why is there that dip for the buffered-io patch? It might be related to the last compaction combining 4 sstables into one... Can you please do the following experiment - write the data, force a flush + major compaction, and once all compactions complete, run the buffered-io-tweaks patch to see if that dip in the middle of the run is actually caused by compaction replacing a pre-heated file set with a completely cold file? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944690#comment-13944690 ] Pavel Yaskevich edited comment on CASSANDRA-6746 at 3/24/14 1:49 AM: - [~enigmacurry] Thanks for the results, this looks promising although, one question still remains - why is the dip for buffered-io patch happening. It might be related to the last compaction combining 4 sstables into one... Can you please do the following experiment - write the data, force a flush + major compaction, once all compactions complete run the buffered-io-tweaks patch to see if that deep in the middle of the run is actually caused by compaction replacing pre-heated file set with completely cold file? was (Author: xedin): [~enigmacurry] Thanks for the results, this looks promising although I one question for me remains why is there that deep for buffered-io patch, It might be related to the last compaction combining 4 sstables into one... Can you please do the following experiment - write the data, force a flush + major compaction, once all compactions complete run the buffered-io-tweaks patch to see if that deep in the middle of the run is actually caused by compaction replacing pre-heated file set with completely cold file? Reads have a slow ramp up in speed -- Key: CASSANDRA-6746 URL: https://issues.apache.org/jira/browse/CASSANDRA-6746 Project: Cassandra Issue Type: Bug Components: Core Reporter: Ryan McGuire Assignee: Benedict Labels: performance Fix For: 2.1 beta2 Attachments: 2.1_vs_2.0_read.png, 6746-buffered-io-tweaks.png, 6746-patched.png, 6746.blockdev_setra.full.png, 6746.blockdev_setra.zoomed.png, 6746.buffered_io_tweaks.logs.tar.gz, 6746.txt, buffered-io-tweaks.patch, cassandra-2.0-bdplab-trial-fincore.tar.bz2, cassandra-2.1-bdplab-trial-fincore.tar.bz2 On a physical four node cluister I am doing a big write and then a big read. The read takes a long time to ramp up to respectable speeds. 
!2.1_vs_2.0_read.png! [See data here|http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.jsonmetric=interval_op_rateoperation=stress-readsmoothing=1] -- This message was sent by Atlassian JIRA (v6.2#6252)
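The write, flush, major-compaction, then read experiment suggested above can be sketched as a dry-run shell script. This is a sketch only: the stress option `n=40000000` and the step ordering details are assumptions not taken from the ticket, and each command is echoed rather than executed, since actually running it assumes a live Cassandra cluster with `cassandra-stress` and `nodetool` on the PATH.

```shell
#!/bin/sh
# Dry-run sketch of the suggested experiment. The 'run' helper only echoes
# each command; on a real cluster, replace 'echo' with direct execution.
# n=40000000 below is an assumed placeholder, not a value from the ticket.
run() { echo "+ $*"; }

run cassandra-stress write n=40000000   # 1. write the data
run nodetool flush                      # 2. force memtables to disk
run nodetool compact                    # 3. force a major compaction
# 4. once all compactions complete, run the read phase against the
#    buffered-io-tweaks build and watch for the mid-run dip
run cassandra-stress read n=40000000
```

The point of flushing and major-compacting before the read phase is to rule out compaction replacing a pre-heated file set with a cold file mid-run.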
[jira] [Updated] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McGuire updated CASSANDRA-6746: Attachment: 6746.buffered_io_tweaks.write-read-flush-compact.png Write, then flush, then compact, then read - that seems to work well: !6746.buffered_io_tweaks.write-read-flush-compact.png! [data here|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=stats.6746.buffered-io-tweaks.write-flush-compact-read.jsonoperation=read] I'll run a (write, flush, compact, mixed read/write) test next to make sure that looks good too.
[jira] [Updated] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McGuire updated CASSANDRA-6746: Attachment: 6746.buffered_io_tweaks.write-flush-compact-mixed.png write, flush, compact, mixed read/write: !6746.buffered_io_tweaks.write-flush-compact-mixed.png! [data here|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=stats.6746.buffered-io-tweaks.write-flush-compact-mixed.jsonmetric=op_rateoperation=mixedsmoothing=1xmin=0xmax=381.59ymin=0ymax=98910.9] So it looks like this eliminates the drop at the start. I don't have an explanation for the short, periodic drops here in mixed mode; I don't have enough experience with this new stress mode yet to know, except that 2.0 without JNA still fared better.
[jira] [Comment Edited] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944760#comment-13944760 ] Ryan McGuire edited comment on CASSANDRA-6746 at 3/24/14 5:28 AM: -- [~benedict] can you confirm whether my stress options look alright for mixed mode, or do you have a better suggestion?
[jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed
[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944760#comment-13944760 ] Ryan McGuire commented on CASSANDRA-6746: - @benedict can you confirm if my stress options look alright for mixed mode, do you have a better suggestion?