Chris,
I agree that reading 250k rows is a bit excessive and that breaking up the
partition would help reduce the query time. That part is well understood. The
part that we can't figure out is why the read time did not change when we
switched from slow network-attached storage (AWS EBS) to local SSD.
It is possible this is CPU bound. In 2.1 we have optimised the comparison
of clustering columns (CASSANDRA-5417
https://issues.apache.org/jira/browse/CASSANDRA-5417), but in 2.0 it is
quite expensive. So for a large row with several million comparisons to
perform (to merge, filter, etc.), it could easily be CPU bound.
Benedict,
That makes perfect sense. Even though the node has multiple cores, I do see
that only one core is pegged at 100%.
Interestingly, after I switched to 2.1, the cqlsh trace now shows that the same
query takes only 600ms. However, cqlsh still waits for almost 20-30 seconds
before it starts displaying the results.
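(Presumably the trace duration only covers the server-side read; it does not
include shipping all of the rows back and rendering them in cqlsh, which may be
where the remaining 20-30 seconds goes. One rough way to separate the two is
sketched below with the DataStax Python driver; the contact point, keyspace,
and query are placeholders, not the actual schema:)

    import time
    from cassandra.cluster import Cluster

    cluster = Cluster(['10.0.0.1'])            # placeholder contact point
    session = cluster.connect('my_keyspace')   # placeholder keyspace
    session.default_fetch_size = None          # disable paging so one request covers the whole read

    start = time.time()
    future = session.execute_async(
        "SELECT * FROM my_table WHERE a = 'x'", trace=True)  # placeholder query
    rows = list(future.result())               # materialize everything, as cqlsh must
    wall_clock = time.time() - start

    trace = future.get_query_trace()
    print("server-side trace duration:", trace.duration)
    print("client-side wall clock: %.1fs for %d rows" % (wall_clock, len(rows)))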
When you say you moved from EBS to SSD, do you mean from EBS HDD drives to
EBS SSD drives? Or instance SSD drives? The m3.large only comes with 32GB
of instance-based SSD storage. If you're using EBS SSD drives then the network
will still be the slowest link, so switching won't likely make much of a
difference.
On Tue, Sep 16, 2014 at 10:00 PM, Mohammed Guller moham...@glassbeam.com
wrote:
The 10-second latency that I gave earlier is from CQL tracing. Almost 5
seconds of that was taken up by the “merge memtable and sstables” step.
The remaining 5 seconds are from “read live and tombstoned cells.”
Thank you all for your responses.
Alex –
Instance (ephemeral) SSD
Ben –
the query reads data from just one partition. If disk I/O were the bottleneck,
then in theory reading the same amount of data from local SSD should take a lot
less time than the 10 seconds it takes from EBS. My question is why we don't
see that improvement.
Read 193311 live and 0 tombstoned cells
is your killer. Returning 250k rows is a bit excessive; you should really page
this in smaller chunks. What client are you using to access the data? This
partition (a, b, c, d, e, f) may be too large as well (you can check the maximum
partition size in the output of nodetool cfstats).
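For example, just a sketch assuming the DataStax Python driver (the contact
point, keyspace, table, and WHERE clause below are placeholders, not your
actual schema), letting the driver page the result set looks like this:

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['10.0.0.1'])            # placeholder contact point
    session = cluster.connect('my_keyspace')   # placeholder keyspace

    # fetch_size caps how many rows come back per round trip; the driver
    # transparently requests the next page as the loop crosses a page boundary.
    query = SimpleStatement(
        "SELECT * FROM my_table WHERE a = %s AND b = %s",  # placeholder schema
        fetch_size=5000)

    count = 0
    for row in session.execute(query, ('x', 'y')):
        count += 1                             # application logic goes here
    print("read %d rows" % count)

Automatic paging needs native protocol v2 (Cassandra 2.0+); with an older
driver or protocol you would have to page manually on the clustering columns.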
On Tue, Sep 16, 2014 at 5:35 PM, Mohammed Guller moham...@glassbeam.com
wrote:
Does anyone have insight as to why we don't see any performance impact on
the reads going from EBS to SSD?
What does it say when you enable tracing on this CQL query?
10 seconds is a really long time to access data from a single partition.
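If you are querying from an application rather than cqlsh, you can pull the
same trace programmatically. A minimal sketch, assuming the DataStax Python
driver (contact point, keyspace, and query are placeholders):

    from cassandra.cluster import Cluster

    cluster = Cluster(['10.0.0.1'])            # placeholder contact point
    session = cluster.connect('my_keyspace')   # placeholder keyspace

    # trace=True asks the coordinator to record a trace for this request.
    future = session.execute_async(
        "SELECT * FROM my_table WHERE a = 'x'", trace=True)  # placeholder query
    future.result()                            # wait for the rows

    trace = future.get_query_trace()
    print("server-side duration:", trace.duration)
    for event in trace.events:
        # source_elapsed is the elapsed time on the node that logged the event.
        print(event.source_elapsed, event.source, event.description)

In cqlsh, running TRACING ON before the query gives the same per-step breakdown.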
To expand on what Robert said, Cassandra is a log-structured database:
- writes are append operations, so both correctly configured disk volumes and
SSD are fast at that
- reads could be helped by SSD if they're not in cache (ie. on disk)
- but compaction is definitely helped by SSD when there is a large amount of
data to rewrite
Mohammed, to add to the previous answers: EBS is network attached. With SSD or
without it, you access your disk over the network, constrained by network
bandwidth and latency. If you really need to improve I/O performance, try
switching to ephemeral storage (also called instance storage), which is local
to the instance.
If your tables or the data are cached, you may not see any difference at all.
Regards,
-Tony
On Tuesday, September 16, 2014 6:36 PM, Mohammed Guller
moham...@glassbeam.com wrote:
Hi -
We are running Cassandra 2.0.5 on AWS on m3.large instances. These instances
were using EBS for storage.
When comparing EBS vs. local SSD in terms of latency, you are talking about
milliseconds as your unit of measurement.
If your query runs for 10s you will not notice anything; what is a few less
ms over the life of a 10-second query?
To reiterate what Rob said: the query is probably slow because of your use
case / data model, not the storage.
Rob,
The 10-second latency that I gave earlier is from CQL tracing. Almost 5
seconds of that was taken up by the “merge memtable and sstables” step. The
remaining 5 seconds are from “read live and tombstoned cells.”
I too first thought that maybe the disk is not the bottleneck and that
Cassandra is spending the time elsewhere.