Re: repair performance

2017-03-17 Thread Roland Otta
i did not notice that so far.

thank you for the hint. i will definitely give it a try

On Fri, 2017-03-17 at 22:32 +0100, benjamin roth wrote:
The fork from thelastpickle is (compatible with 3.0+). I'd recommend giving it a try
over plain nodetool.

2017-03-17 22:30 GMT+01:00 Roland Otta:
forgot to mention the version we are using:

we are using 3.0.7 - so i guess we should have incremental repairs by default.
it also prints out incremental:true when starting a repair
INFO  [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting 
repair command #7, repairing keyspace xxx with repair options (parallelism: 
parallel, primary range: false, incremental: true, job threads: 1, 
ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of ranges: 1758)

3.0.7 is also the reason why we are not using reaper ... as far as i could 
figure out it's not compatible with 3.0+



On Fri, 2017-03-17 at 22:13 +0100, benjamin roth wrote:
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages, 
whatever)
- You can use incremental repairs to speed things up for regular repairs
- You can use "reaper" to schedule repairs and run them sliced, automated, 
failsafe

The time repairs actually take may vary a lot depending on how much data has to be
streamed or how inconsistent your cluster is.

50 Mbit/s really is a bit low! The actual performance depends on so many factors
like your CPU, RAM, HDD/SSD, concurrency settings, and the load of the "old nodes" of
the cluster.
This is quite an individual problem that you have to track down case by case.

2017-03-17 22:07 GMT+01:00 Roland Otta:
hello,

we are quite inexperienced with cassandra at the moment and are playing
around with a new cluster we built up for getting familiar with
cassandra and its possibilities.

while getting familiar with that topic we noticed that repairs in
our cluster take a long time. To give an idea of our current setup, here
are some numbers:

our cluster currently consists of 4 nodes (replication factor 3).
these nodes are all on dedicated physical hardware in our own
datacenter. all of the nodes have

32 cores @ 2.9 GHz
64 GB RAM
2 SSDs (RAID 0), 900 GB each, for data
1 separate HDD for OS + commitlogs

current dataset:
approx 530 GB per node
21 tables (biggest one has more than 200 GB / node)


i already tried setting compactionthroughput + streamingthroughput to
unlimited for testing purposes ... but that did not change anything.

when checking system resources i cannot see any bottleneck (cpus are
pretty idle and we have no iowaits).

when issuing a repair via

nodetool repair -local on a node, the repair takes longer than a day.
is this normal, or should we expect a faster repair?

i also noticed that initializing new nodes in the datacenter was
really slow (approx 50 Mbit/s). here, too, i expected much better
performance - could those 2 problems be somehow related?

br//
roland





Does SASI index support IN?

2017-03-17 Thread Yu, John
All,

I've been experimenting with Cassandra 3.10, with the hope that SASI has
improved. To my disappointment, it seems it still doesn't support a simple
operation like IN. Have others tried the same? Also, with a small test data set
(160K records), the performance is no better than just doing without the
index (using allow filtering). I'm very confused about what the index really does.

Thanks,
John
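
(For context, a minimal sketch of the kind of SASI setup being tested here; the table,
column and index names are hypothetical, and PREFIX is simply the default mode. A prefix
LIKE is the sort of predicate SASI is built around, which is separate from the question
of IN support raised above.)

    CREATE TABLE demo.users (id uuid PRIMARY KEY, name text);

    CREATE CUSTOM INDEX users_name_idx ON demo.users (name)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {'mode': 'PREFIX'};

    -- the kind of predicate the index is designed to serve:
    SELECT * FROM demo.users WHERE name LIKE 'Jo%';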



Re: repair performance

2017-03-17 Thread benjamin roth
The fork from thelastpickle is (compatible with 3.0+). I'd recommend giving it a try
over plain nodetool.

2017-03-17 22:30 GMT+01:00 Roland Otta :

> forgot to mention the version we are using:
>
> we are using 3.0.7 - so i guess we should have incremental repairs by
> default.
> it also prints out incremental:true when starting a repair
> INFO  [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 -
> Starting repair command #7, repairing keyspace xxx with repair options
> (parallelism: parallel, primary range: false, incremental: true, job
> threads: 1, ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of
> ranges: 1758)
>
> 3.0.7 is also the reason why we are not using reaper ... as far as i could
> figure out it's not compatible with 3.0+
>
>
>
> On Fri, 2017-03-17 at 22:13 +0100, benjamin roth wrote:
>
> It depends a lot ...
>
> - Repairs can be very slow, yes! (And unreliable, due to timeouts,
> outages, whatever)
> - You can use incremental repairs to speed things up for regular repairs
> - You can use "reaper" to schedule repairs and run them sliced, automated,
> failsafe
>
> The time repairs actually take may vary a lot depending on how much data has
> to be streamed or how inconsistent your cluster is.
>
> 50 Mbit/s really is a bit low! The actual performance depends on so many
> factors like your CPU, RAM, HDD/SSD, concurrency settings, and the load of the
> "old nodes" of the cluster.
> This is quite an individual problem that you have to track down case by case.
>
> 2017-03-17 22:07 GMT+01:00 Roland Otta :
>
> hello,
>
> we are quite inexperienced with cassandra at the moment and are playing
> around with a new cluster we built up for getting familiar with
> cassandra and its possibilities.
>
> while getting familiar with that topic we noticed that repairs in
> our cluster take a long time. To give an idea of our current setup, here
> are some numbers:
>
> our cluster currently consists of 4 nodes (replication factor 3).
> these nodes are all on dedicated physical hardware in our own
> datacenter. all of the nodes have
>
> 32 cores @ 2.9 GHz
> 64 GB RAM
> 2 SSDs (RAID 0), 900 GB each, for data
> 1 separate HDD for OS + commitlogs
>
> current dataset:
> approx 530 GB per node
> 21 tables (biggest one has more than 200 GB / node)
>
>
> i already tried setting compactionthroughput + streamingthroughput to
> unlimited for testing purposes ... but that did not change anything.
>
> when checking system resources i cannot see any bottleneck (cpus are
> pretty idle and we have no iowaits).
>
> when issuing a repair via
>
> nodetool repair -local on a node, the repair takes longer than a day.
> is this normal, or should we expect a faster repair?
>
> i also noticed that initializing new nodes in the datacenter was
> really slow (approx 50 Mbit/s). here, too, i expected much better
> performance - could those 2 problems be somehow related?
>
> br//
> roland
>
>
>


Re: repair performance

2017-03-17 Thread Roland Otta
... maybe i should just try increasing the job threads with --job-threads

shame on me
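
(A sketch of what that would look like; the keyspace name and thread count below are
placeholders, and job threads are capped at a low value in this version anyway:)

    nodetool repair -local -j 4 my_keyspace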

On Fri, 2017-03-17 at 21:30 +, Roland Otta wrote:
forgot to mention the version we are using:

we are using 3.0.7 - so i guess we should have incremental repairs by default.
it also prints out incremental:true when starting a repair
INFO  [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting 
repair command #7, repairing keyspace xxx with repair options (parallelism: 
parallel, primary range: false, incremental: true, job threads: 1, 
ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of ranges: 1758)

3.0.7 is also the reason why we are not using reaper ... as far as i could 
figure out it's not compatible with 3.0+



On Fri, 2017-03-17 at 22:13 +0100, benjamin roth wrote:
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages, 
whatever)
- You can use incremental repairs to speed things up for regular repairs
- You can use "reaper" to schedule repairs and run them sliced, automated, 
failsafe

The time repairs actually take may vary a lot depending on how much data has to be
streamed or how inconsistent your cluster is.

50 Mbit/s really is a bit low! The actual performance depends on so many factors
like your CPU, RAM, HDD/SSD, concurrency settings, and the load of the "old nodes" of
the cluster.
This is quite an individual problem that you have to track down case by case.

2017-03-17 22:07 GMT+01:00 Roland Otta:
hello,

we are quite inexperienced with cassandra at the moment and are playing
around with a new cluster we built up for getting familiar with
cassandra and its possibilities.

while getting familiar with that topic we noticed that repairs in
our cluster take a long time. To give an idea of our current setup, here
are some numbers:

our cluster currently consists of 4 nodes (replication factor 3).
these nodes are all on dedicated physical hardware in our own
datacenter. all of the nodes have

32 cores @ 2.9 GHz
64 GB RAM
2 SSDs (RAID 0), 900 GB each, for data
1 separate HDD for OS + commitlogs

current dataset:
approx 530 GB per node
21 tables (biggest one has more than 200 GB / node)


i already tried setting compactionthroughput + streamingthroughput to
unlimited for testing purposes ... but that did not change anything.

when checking system resources i cannot see any bottleneck (cpus are
pretty idle and we have no iowaits).

when issuing a repair via

nodetool repair -local on a node, the repair takes longer than a day.
is this normal, or should we expect a faster repair?

i also noticed that initializing new nodes in the datacenter was
really slow (approx 50 Mbit/s). here, too, i expected much better
performance - could those 2 problems be somehow related?

br//
roland



Re: repair performance

2017-03-17 Thread Roland Otta
forgot to mention the version we are using:

we are using 3.0.7 - so i guess we should have incremental repairs by default.
it also prints out incremental:true when starting a repair
INFO  [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting 
repair command #7, repairing keyspace xxx with repair options (parallelism: 
parallel, primary range: false, incremental: true, job threads: 1, 
ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of ranges: 1758)

3.0.7 is also the reason why we are not using reaper ... as far as i could 
figure out it's not compatible with 3.0+



On Fri, 2017-03-17 at 22:13 +0100, benjamin roth wrote:
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages, 
whatever)
- You can use incremental repairs to speed things up for regular repairs
- You can use "reaper" to schedule repairs and run them sliced, automated, 
failsafe

The time repairs actually take may vary a lot depending on how much data has to be
streamed or how inconsistent your cluster is.

50 Mbit/s really is a bit low! The actual performance depends on so many factors
like your CPU, RAM, HDD/SSD, concurrency settings, and the load of the "old nodes" of
the cluster.
This is quite an individual problem that you have to track down case by case.

2017-03-17 22:07 GMT+01:00 Roland Otta:
hello,

we are quite inexperienced with cassandra at the moment and are playing
around with a new cluster we built up for getting familiar with
cassandra and its possibilities.

while getting familiar with that topic we noticed that repairs in
our cluster take a long time. To give an idea of our current setup, here
are some numbers:

our cluster currently consists of 4 nodes (replication factor 3).
these nodes are all on dedicated physical hardware in our own
datacenter. all of the nodes have

32 cores @ 2.9 GHz
64 GB RAM
2 SSDs (RAID 0), 900 GB each, for data
1 separate HDD for OS + commitlogs

current dataset:
approx 530 GB per node
21 tables (biggest one has more than 200 GB / node)


i already tried setting compactionthroughput + streamingthroughput to
unlimited for testing purposes ... but that did not change anything.

when checking system resources i cannot see any bottleneck (cpus are
pretty idle and we have no iowaits).

when issuing a repair via

nodetool repair -local on a node, the repair takes longer than a day.
is this normal, or should we expect a faster repair?

i also noticed that initializing new nodes in the datacenter was
really slow (approx 50 Mbit/s). here, too, i expected much better
performance - could those 2 problems be somehow related?

br//
roland



Re: repair performance

2017-03-17 Thread benjamin roth
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages,
whatever)
- You can use incremental repairs to speed things up for regular repairs
- You can use "reaper" to schedule repairs and run them sliced, automated,
failsafe

The time repairs actually take may vary a lot depending on how much data has
to be streamed or how inconsistent your cluster is.

50 Mbit/s really is a bit low! The actual performance depends on so many
factors like your CPU, RAM, HDD/SSD, concurrency settings, and the load of the
"old nodes" of the cluster.
This is quite an individual problem that you have to track down case by case.

2017-03-17 22:07 GMT+01:00 Roland Otta :

> hello,
>
> we are quite inexperienced with cassandra at the moment and are playing
> around with a new cluster we built up for getting familiar with
> cassandra and its possibilities.
>
> while getting familiar with that topic we noticed that repairs in
> our cluster take a long time. To give an idea of our current setup, here
> are some numbers:
>
> our cluster currently consists of 4 nodes (replication factor 3).
> these nodes are all on dedicated physical hardware in our own
> datacenter. all of the nodes have
>
> 32 cores @ 2.9 GHz
> 64 GB RAM
> 2 SSDs (RAID 0), 900 GB each, for data
> 1 separate HDD for OS + commitlogs
>
> current dataset:
> approx 530 GB per node
> 21 tables (biggest one has more than 200 GB / node)
>
>
> i already tried setting compactionthroughput + streamingthroughput to
> unlimited for testing purposes ... but that did not change anything.
>
> when checking system resources i cannot see any bottleneck (cpus are
> pretty idle and we have no iowaits).
>
> when issuing a repair via
>
> nodetool repair -local on a node, the repair takes longer than a day.
> is this normal, or should we expect a faster repair?
>
> i also noticed that initializing new nodes in the datacenter was
> really slow (approx 50 Mbit/s). here, too, i expected much better
> performance - could those 2 problems be somehow related?
>
> br//
> roland


repair performance

2017-03-17 Thread Roland Otta
hello,

we are quite inexperienced with cassandra at the moment and are playing
around with a new cluster we built up for getting familiar with
cassandra and its possibilities.

while getting familiar with that topic we noticed that repairs in
our cluster take a long time. To give an idea of our current setup, here
are some numbers:

our cluster currently consists of 4 nodes (replication factor 3).
these nodes are all on dedicated physical hardware in our own
datacenter. all of the nodes have

32 cores @ 2.9 GHz
64 GB RAM
2 SSDs (RAID 0), 900 GB each, for data
1 separate HDD for OS + commitlogs

current dataset:
approx 530 GB per node
21 tables (biggest one has more than 200 GB / node)


i already tried setting compactionthroughput + streamingthroughput to
unlimited for testing purposes ... but that did not change anything.
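
(for reference, a sketch of the commands meant above; a value of 0 removes the throttle:)

    nodetool setcompactionthroughput 0
    nodetool setstreamingthroughput 0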

when checking system resources i cannot see any bottleneck (cpus are
pretty idle and we have no iowaits).

when issuing a repair via

nodetool repair -local on a node, the repair takes longer than a day.
is this normal, or should we expect a faster repair?

i also noticed that initializing new nodes in the datacenter was
really slow (approx 50 Mbit/s). here, too, i expected much better
performance - could those 2 problems be somehow related?

br//
roland

Re: Random slow read times in Cassandra

2017-03-17 Thread daemeon reiydelle
check for level 2 (stop the world) garbage collections.
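
(A quick way to check, sketched here; the log path is an assumption based on a typical
package install:)

    grep -i 'GCInspector' /var/log/cassandra/system.log | tail -20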


Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Mar 17, 2017 at 11:51 AM, Chuck Reynolds 
wrote:

> I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has
> consistently random high read times.  In general most reads are under 10
> milliseconds, but within the 30 requests there is usually a read time that
> is a couple of seconds.
>
>
>
> Instance type: r4.8xlarge
>
> EBS GP2 volumes, 3tb with 9000 IOPS
>
> 30 Gig Heap
>
>
>
> Data per node is about 170 gigs
>
>
>
> The keyspace is an id & a blob.  When I check the data, the slow reads
> don’t seem to have anything to do with the size of the blobs.
>
>
>
> This system has repairs run once a week because it takes a lot of updates.
>
>
>
> The client makes a call and does 30 requests serially to Cassandra and the
> response times look like this in milliseconds.
>
>
>
> What could make these so slow and what can I do to diagnose this?
>
>
>
>
>
> *Responses*
>
>
>
> Get Person time: 3 319746229:9009:66
>
> Get Person time: 7 1830093695:9009:66
>
> Get Person time: 4 30072253:9009:66
>
> Get Person time: 4 2303790089:9009:66
>
> Get Person time: 2 156792066:9009:66
>
> Get Person time: 8 491230624:9009:66
>
> Get Person time: 7 284904599:9009:66
>
> Get Person time: 4 600370489:9009:66
>
> Get Person time: 2 281007386:9009:66
>
> Get Person time: 4 971178094:9009:66
>
> Get Person time: 1 1322259885:9009:66
>
> Get Person time: 2 1937958542:9009:66
>
> Get Person time: 9 286536648:9009:66
>
> Get Person time: 9 1835633470:9009:66
>
> Get Person time: 2 300867513:9009:66
>
> Get Person time: 3 178975468:9009:66
>
> Get Person time: 2900 293043081:9009:66
>
> Get Person time: 8 214913830:9009:66
>
> Get Person time: 2 1956710764:9009:66
>
> Get Person time: 4 237673776:9009:66
>
> Get Person time: 17 68942206:9009:66
>
> Get Person time: 1800 20072145:9009:66
>
> Get Person time: 2 304698506:9009:66
>
> Get Person time: 2 308177320:9009:66
>
> Get Person time: 2 998436038:9009:66
>
> Get Person time: 10 1036890112:9009:66
>
> Get Person time: 1 1629649548:9009:66
>
> Get Person time: 6 1595339706:9009:66
>
> Get Person time: 4 1079637599:9009:66
>
> Get Person time: 3 556342855:9009:66
>
>
>
>
>
> Get Person time: 5 1856382256:9009:66
>
> Get Person time: 3 1891737174:9009:66
>
> Get Person time: 2 1179373651:9009:66
>
> Get Person time: 2 1482602756:9009:66
>
> Get Person time: 3 1236458510:9009:66
>
> Get Person time: 11 1003159823:9009:66
>
> Get Person time: 2 1264952556:9009:66
>
> Get Person time: 2 1662234295:9009:66
>
> Get Person time: 1 246108569:9009:66
>
> Get Person time: 5 1709881651:9009:66
>
> Get Person time: 3213 11878078:9009:66
>
> Get Person time: 2 112866483:9009:66
>
> Get Person time: 2 201870153:9009:66
>
> Get Person time: 6 227696684:9009:66
>
> Get Person time: 2 1946780190:9009:66
>
> Get Person time: 2 2197987101:9009:66
>
> Get Person time: 18 1838959725:9009:66
>
> Get Person time: 3 1782937802:9009:66
>
> Get Person time: 3 1692530939:9009:66
>
> Get Person time: 9 1765654196:9009:66
>
> Get Person time: 2 1597757121:9009:66
>
> Get Person time: 2 1853127153:9009:66
>
> Get Person time: 3 1533599253:9009:66
>
> Get Person time: 6 1693244112:9009:66
>
> Get Person time: 6 82047537:9009:66
>
> Get Person time: 2 96221961:9009:66
>
> Get Person time: 4 98202209:9009:66
>
> Get Person time: 9 12952388:9009:66
>
> Get Person time: 2 300118652:9009:66
>
> Get Person time: 10 78801084:9009:66
>
>
>
>
>
> Get Person time: 13 1856424913:9009:66
>
> Get Person time: 2 255814186:9009:66
>
> Get Person time: 2 1183397424:9009:66
>
> Get Person time: 5 1828603730:9009:66
>
> Get Person time: 9 132965919:9009:66
>
> Get Person time: 4 1616190071:9009:66
>
> Get Person time: 2 15929337:9009:66
>
> Get Person time: 10 297005427:9009:66
>
> Get Person time: 2 1306460047:9009:66
>
> Get Person time: 5 620139216:9009:66
>
> Get Person time: 2 1364349058:9009:66
>
> Get Person time: 3 629543403:9009:66
>
> Get Person time: 5 1299827034:9009:66
>
> Get Person time: 4 1593205912:9009:66
>
> Get Person time: 2 1755460077:9009:66
>
> Get Person time: 2 1906388666:9009:66
>
> Get Person time: 1 1838653952:9009:66
>
> Get Person time: 2 2249662508:9009:66
>
> Get Person time: 3 1931708432:9009:66
>
> Get Person time: 2 2177004948:9009:66
>
> Get Person time: 2 2042756682:9009:66
>
> Get Person time: 5 41764865:9009:66
>
> Get Person time: 4023 1733384704:9009:66
>
> Get Person time: 1 1614842189:9009:66
>
> Get Person time: 2 2194211396:9009:66
>
> Get Person time: 3 1711330834:9009:66
>
> Get Person time: 2 2264849689:9009:66
>
> Get Person time: 3 1819027970:9009:66
>
> Get Person time: 2 1978614851:9009:66
>
> Get Person time: 1 1863483129:9009:66
>
>
>


Re: Very odd & inconsistent results from SASI query

2017-03-17 Thread Voytek Jarnot
A wrinkle further confounds the issue: running a repair on the node which
was servicing the queries has cleared things up and all the queries now
work.

That doesn't make a whole lot of sense to me - my assumption was that a
repair shouldn't have fixed it.

On Fri, Mar 17, 2017 at 12:03 PM, Voytek Jarnot 
wrote:

> Cassandra 3.9, 4 nodes, rf=3
>
> Hi folks, we're seeing 0 results returned from queries that (a) should return
> results, and (b) will return results with minor tweaks.
>
> I've attached the sanitized trace outputs for the following 3 queries (pk1
> and pk2 are partition keys, ck1 is clustering key, val1 is SASI indexed
> non-key column):
>
> Q1: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
> '2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefgh%'  LIMIT 1001
> ALLOW FILTERING;
> Q1 works - it returns a list of records, one of which has
> val1='abcdefghijklmn'.
>
> Q2: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
> '2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefghi%'  LIMIT
> 1001 ALLOW FILTERING;
> Q2 does not work - 0 results returned. Only difference to Q1 is one
> additional character provided in LIKE comparison.
>
> Q3: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
> '2017-03-16' AND ck2 <= '2017-03-17'  AND val1 = 'abcdefghijklmn'  LIMIT
> 1001 ALLOW FILTERING;
> Q3 does not work - 0 results returned.
>
> As I've written above, the data set *does* include a record with
> val1='abcdefghijklmn'.
>
> Confounding the issue is that this behavior is inconsistent.  For
> different values of val1, I'll have scenarios where Q3 works, but Q1 and Q2
> do not. Now, that particular behavior I could explain with index/like
> problems, but it is Q3 that sometimes does not work, and that's a simple
> equality comparison (although still using the index).
>
> Further confounding the issue is that if my testers run these same queries
> with the same parameters tomorrow, they're likely to work correctly.
>
> Only thing I've been able to glean from tracing execution is that the
> queries that work follow "Executing read..." with "Executing single
> partition query on t1" and so forth,  whereas the queries that don't work
> simply follow "Executing read..." with "Read 0 live and 0 tombstone cells"
> with no actual work seemingly done. But that's not helping me narrow this
> down much.
>
> Thanks for your time - appreciate any help.
>


Re: Random slow read times in Cassandra

2017-03-17 Thread Jonathan Haddad
Probably JVM pauses. Check your logs for long GC times.
On Fri, Mar 17, 2017 at 11:51 AM Chuck Reynolds 
wrote:

> I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has
> consistently random high read times.  In general most reads are under 10
> milliseconds, but within the 30 requests there is usually a read time that
> is a couple of seconds.
>
>
>
> Instance type: r4.8xlarge
>
> EBS GP2 volumes, 3tb with 9000 IOPS
>
> 30 Gig Heap
>
>
>
> Data per node is about 170 gigs
>
>
>
> The keyspace is an id & a blob.  When I check the data, the slow reads
> don’t seem to have anything to do with the size of the blobs.
>
>
>
> This system has repairs run once a week because it takes a lot of updates.
>
>
>
> The client makes a call and does 30 requests serially to Cassandra and the
> response times look like this in milliseconds.
>
>
>
> What could make these so slow and what can I do to diagnose this?
>
>
>
>
>
> *Responses*
>
>
>
> Get Person time: 3 319746229:9009:66
>
> Get Person time: 7 1830093695:9009:66
>
> Get Person time: 4 30072253:9009:66
>
> Get Person time: 4 2303790089:9009:66
>
> Get Person time: 2 156792066:9009:66
>
> Get Person time: 8 491230624:9009:66
>
> Get Person time: 7 284904599:9009:66
>
> Get Person time: 4 600370489:9009:66
>
> Get Person time: 2 281007386:9009:66
>
> Get Person time: 4 971178094:9009:66
>
> Get Person time: 1 1322259885:9009:66
>
> Get Person time: 2 1937958542:9009:66
>
> Get Person time: 9 286536648:9009:66
>
> Get Person time: 9 1835633470:9009:66
>
> Get Person time: 2 300867513:9009:66
>
> Get Person time: 3 178975468:9009:66
>
> Get Person time: 2900 293043081:9009:66
>
> Get Person time: 8 214913830:9009:66
>
> Get Person time: 2 1956710764:9009:66
>
> Get Person time: 4 237673776:9009:66
>
> Get Person time: 17 68942206:9009:66
>
> Get Person time: 1800 20072145:9009:66
>
> Get Person time: 2 304698506:9009:66
>
> Get Person time: 2 308177320:9009:66
>
> Get Person time: 2 998436038:9009:66
>
> Get Person time: 10 1036890112:9009:66
>
> Get Person time: 1 1629649548:9009:66
>
> Get Person time: 6 1595339706:9009:66
>
> Get Person time: 4 1079637599:9009:66
>
> Get Person time: 3 556342855:9009:66
>
>
>
>
>
> Get Person time: 5 1856382256:9009:66
>
> Get Person time: 3 1891737174:9009:66
>
> Get Person time: 2 1179373651:9009:66
>
> Get Person time: 2 1482602756:9009:66
>
> Get Person time: 3 1236458510:9009:66
>
> Get Person time: 11 1003159823:9009:66
>
> Get Person time: 2 1264952556:9009:66
>
> Get Person time: 2 1662234295:9009:66
>
> Get Person time: 1 246108569:9009:66
>
> Get Person time: 5 1709881651:9009:66
>
> Get Person time: 3213 11878078:9009:66
>
> Get Person time: 2 112866483:9009:66
>
> Get Person time: 2 201870153:9009:66
>
> Get Person time: 6 227696684:9009:66
>
> Get Person time: 2 1946780190:9009:66
>
> Get Person time: 2 2197987101:9009:66
>
> Get Person time: 18 1838959725:9009:66
>
> Get Person time: 3 1782937802:9009:66
>
> Get Person time: 3 1692530939:9009:66
>
> Get Person time: 9 1765654196:9009:66
>
> Get Person time: 2 1597757121:9009:66
>
> Get Person time: 2 1853127153:9009:66
>
> Get Person time: 3 1533599253:9009:66
>
> Get Person time: 6 1693244112:9009:66
>
> Get Person time: 6 82047537:9009:66
>
> Get Person time: 2 96221961:9009:66
>
> Get Person time: 4 98202209:9009:66
>
> Get Person time: 9 12952388:9009:66
>
> Get Person time: 2 300118652:9009:66
>
> Get Person time: 10 78801084:9009:66
>
>
>
>
>
> Get Person time: 13 1856424913:9009:66
>
> Get Person time: 2 255814186:9009:66
>
> Get Person time: 2 1183397424:9009:66
>
> Get Person time: 5 1828603730:9009:66
>
> Get Person time: 9 132965919:9009:66
>
> Get Person time: 4 1616190071:9009:66
>
> Get Person time: 2 15929337:9009:66
>
> Get Person time: 10 297005427:9009:66
>
> Get Person time: 2 1306460047:9009:66
>
> Get Person time: 5 620139216:9009:66
>
> Get Person time: 2 1364349058:9009:66
>
> Get Person time: 3 629543403:9009:66
>
> Get Person time: 5 1299827034:9009:66
>
> Get Person time: 4 1593205912:9009:66
>
> Get Person time: 2 1755460077:9009:66
>
> Get Person time: 2 1906388666:9009:66
>
> Get Person time: 1 1838653952:9009:66
>
> Get Person time: 2 2249662508:9009:66
>
> Get Person time: 3 1931708432:9009:66
>
> Get Person time: 2 2177004948:9009:66
>
> Get Person time: 2 2042756682:9009:66
>
> Get Person time: 5 41764865:9009:66
>
> Get Person time: 4023 1733384704:9009:66
>
> Get Person time: 1 1614842189:9009:66
>
> Get Person time: 2 2194211396:9009:66
>
> Get Person time: 3 1711330834:9009:66
>
> Get Person time: 2 2264849689:9009:66
>
> Get Person time: 3 1819027970:9009:66
>
> Get Person time: 2 1978614851:9009:66
>
> Get Person time: 1 1863483129:9009:66
>
>
>


Random slow read times in Cassandra

2017-03-17 Thread Chuck Reynolds
I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has consistently 
random high read times. In general most reads are under 10 milliseconds, but
within the 30 requests there is usually a read time that is a couple of seconds.

Instance type: r4.8xlarge
EBS GP2 volumes, 3tb with 9000 IOPS
30 Gig Heap

Data per node is about 170 gigs

The keyspace is an id & a blob. When I check the data, the slow reads don’t
seem to have anything to do with the size of the blobs.

This system has repairs run once a week because it takes a lot of updates.

The client makes a call and does 30 requests serially to Cassandra and the
response times look like this in milliseconds.

What could make these so slow and what can I do to diagnose this?
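
(One way to get more detail on the slow reads, sketched with placeholder names; the
keyspace, table and key below are assumptions, not the real schema:)

    # sample a fraction of live queries for tracing
    nodetool settraceprobability 0.001

    -- or trace a single query interactively in cqlsh
    TRACING ON;
    SELECT * FROM my_ks.person WHERE id = 12345;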


Responses

Get Person time: 3 319746229:9009:66
Get Person time: 7 1830093695:9009:66
Get Person time: 4 30072253:9009:66
Get Person time: 4 2303790089:9009:66
Get Person time: 2 156792066:9009:66
Get Person time: 8 491230624:9009:66
Get Person time: 7 284904599:9009:66
Get Person time: 4 600370489:9009:66
Get Person time: 2 281007386:9009:66
Get Person time: 4 971178094:9009:66
Get Person time: 1 1322259885:9009:66
Get Person time: 2 1937958542:9009:66
Get Person time: 9 286536648:9009:66
Get Person time: 9 1835633470:9009:66
Get Person time: 2 300867513:9009:66
Get Person time: 3 178975468:9009:66
Get Person time: 2900 293043081:9009:66
Get Person time: 8 214913830:9009:66
Get Person time: 2 1956710764:9009:66
Get Person time: 4 237673776:9009:66
Get Person time: 17 68942206:9009:66
Get Person time: 1800 20072145:9009:66
Get Person time: 2 304698506:9009:66
Get Person time: 2 308177320:9009:66
Get Person time: 2 998436038:9009:66
Get Person time: 10 1036890112:9009:66
Get Person time: 1 1629649548:9009:66
Get Person time: 6 1595339706:9009:66
Get Person time: 4 1079637599:9009:66
Get Person time: 3 556342855:9009:66


Get Person time: 5 1856382256:9009:66
Get Person time: 3 1891737174:9009:66
Get Person time: 2 1179373651:9009:66
Get Person time: 2 1482602756:9009:66
Get Person time: 3 1236458510:9009:66
Get Person time: 11 1003159823:9009:66
Get Person time: 2 1264952556:9009:66
Get Person time: 2 1662234295:9009:66
Get Person time: 1 246108569:9009:66
Get Person time: 5 1709881651:9009:66
Get Person time: 3213 11878078:9009:66
Get Person time: 2 112866483:9009:66
Get Person time: 2 201870153:9009:66
Get Person time: 6 227696684:9009:66
Get Person time: 2 1946780190:9009:66
Get Person time: 2 2197987101:9009:66
Get Person time: 18 1838959725:9009:66
Get Person time: 3 1782937802:9009:66
Get Person time: 3 1692530939:9009:66
Get Person time: 9 1765654196:9009:66
Get Person time: 2 1597757121:9009:66
Get Person time: 2 1853127153:9009:66
Get Person time: 3 1533599253:9009:66
Get Person time: 6 1693244112:9009:66
Get Person time: 6 82047537:9009:66
Get Person time: 2 96221961:9009:66
Get Person time: 4 98202209:9009:66
Get Person time: 9 12952388:9009:66
Get Person time: 2 300118652:9009:66
Get Person time: 10 78801084:9009:66


Get Person time: 13 1856424913:9009:66
Get Person time: 2 255814186:9009:66
Get Person time: 2 1183397424:9009:66
Get Person time: 5 1828603730:9009:66
Get Person time: 9 132965919:9009:66
Get Person time: 4 1616190071:9009:66
Get Person time: 2 15929337:9009:66
Get Person time: 10 297005427:9009:66
Get Person time: 2 1306460047:9009:66
Get Person time: 5 620139216:9009:66
Get Person time: 2 1364349058:9009:66
Get Person time: 3 629543403:9009:66
Get Person time: 5 1299827034:9009:66
Get Person time: 4 1593205912:9009:66
Get Person time: 2 1755460077:9009:66
Get Person time: 2 1906388666:9009:66
Get Person time: 1 1838653952:9009:66
Get Person time: 2 2249662508:9009:66
Get Person time: 3 1931708432:9009:66
Get Person time: 2 2177004948:9009:66
Get Person time: 2 2042756682:9009:66
Get Person time: 5 41764865:9009:66
Get Person time: 4023 1733384704:9009:66
Get Person time: 1 1614842189:9009:66
Get Person time: 2 2194211396:9009:66
Get Person time: 3 1711330834:9009:66
Get Person time: 2 2264849689:9009:66
Get Person time: 3 1819027970:9009:66
Get Person time: 2 1978614851:9009:66
Get Person time: 1 1863483129:9009:66



Re: Purge data from repair_history table?

2017-03-17 Thread Gábor Auth
Oh, thanks! :)

On Fri, 17 Mar 2017, 14:22 Paulo Motta,  wrote:

> It's safe to truncate this table since it's just used to inspect repairs
> for troubleshooting. You may also set a default TTL to keep it from
> growing unbounded (this is going to be done by default in CASSANDRA-12701).
>
> 2017-03-17 8:36 GMT-03:00 Gábor Auth :
>
> Hi,
>
> I've discovered a relatively huge amount of data in the system_distributed
> keyspace's repair_history table:
>Table: repair_history
>Space used (live): 389409804
>Space used (total): 389409804
>
> What is the purpose of this data? Is there any safe method to purge it? :)
>
> Bye,
> Gábor Auth
>
>
>


Very odd & inconsistent results from SASI query

2017-03-17 Thread Voytek Jarnot
Cassandra 3.9, 4 nodes, rf=3

Hi folks, we're seeing 0 results returned from queries that (a) should return
results, and (b) will return results with minor tweaks.

I've attached the sanitized trace outputs for the following 3 queries (pk1
and pk2 are partition keys, ck1 is clustering key, val1 is SASI indexed
non-key column):

Q1: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefgh%'  LIMIT 1001
ALLOW FILTERING;
Q1 works - it returns a list of records, one of which has
val1='abcdefghijklmn'.

Q2: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefghi%'  LIMIT
1001 ALLOW FILTERING;
Q2 does not work - 0 results returned. Only difference to Q1 is one
additional character provided in LIKE comparison.

Q3: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck2 <= '2017-03-17'  AND val1 = 'abcdefghijklmn'  LIMIT
1001 ALLOW FILTERING;
Q3 does not work - 0 results returned.

As I've written above, the data set *does* include a record with
val1='abcdefghijklmn'.

Confounding the issue is that this behavior is inconsistent.  For different
values of val1, I'll have scenarios where Q3 works, but Q1 and Q2 do not.
Now, that particular behavior I could explain with index/like problems, but
it is Q3 that sometimes does not work, and that's a simple equality
comparison (although still using the index).

Further confounding the issue is that if my testers run these same queries
with the same parameters tomorrow, they're likely to work correctly.

Only thing I've been able to glean from tracing execution is that the
queries that work follow "Executing read..." with "Executing single
partition query on t1" and so forth,  whereas the queries that don't work
simply follow "Executing read..." with "Read 0 live and 0 tombstone cells"
with no actual work seemingly done. But that's not helping me narrow this
down much.

Thanks for your time - appreciate any help.
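
(For context, the SASI index on val1 would have been created with something along these
lines; the options shown are an assumption, PREFIX being the default mode, and idx_my_idx
is the index name that shows up in the attached trace:)

    CREATE CUSTOM INDEX idx_my_idx ON t1 (val1)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = {'mode': 'PREFIX'};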
Results-found query (which includes the record where val1='abcdefghijklmn'):

 Parsing SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >= '2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefgh%'  LIMIT 1001 ALLOW FILTERING; [Native-Transport-Requests-1]
 Preparing statement [Native-Transport-Requests-1]
 Index mean cardinalities are idx_my_idx:-9223372036854775808. Scanning with idx_my_idx. [Native-Transport-Requests-1]
 Computing ranges to query [Native-Transport-Requests-1]
 Submitting range requests on 1 ranges with a concurrency of 1 (-1.08086395E16 rows per range expected) [Native-Transport-Requests-1]
 Submitted 1 concurrent range requests [Native-Transport-Requests-1]
 Executing read on keyspace.t1 using index idx_my_idx [ReadStage-2]
 Executing single-partition query on t1 [ReadStage-2]
 Acquiring sstable references [ReadStage-2]
 Key cache hit for sstable 2223 [ReadStage-2]
 Skipped 34/35 non-slice-intersecting sstables, included 1 due to tombstones [ReadStage-2]
 Key cache hit for sstable 2221 [ReadStage-2]
 Merged data from memtables and 2 sstables [ReadStage-2]
 Read 1 live and 0 tombstone cells [ReadStage-2]

Re: Purge data from repair_history table?

2017-03-17 Thread Paulo Motta
It's safe to truncate this table since it's just used to inspect repairs
for troubleshooting. You may also set a default TTL to keep it from
growing unbounded (this is going to be done by default in CASSANDRA-12701).
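
(A sketch of both options in CQL; the 30-day TTL is just an example value:)

    TRUNCATE system_distributed.repair_history;

    ALTER TABLE system_distributed.repair_history
        WITH default_time_to_live = 2592000;  -- 30 days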

2017-03-17 8:36 GMT-03:00 Gábor Auth :

> Hi,
>
> I've discovered a relatively huge amount of data in the system_distributed
> keyspace's repair_history table:
>Table: repair_history
>Space used (live): 389409804
>Space used (total): 389409804
>
> What is the purpose of this data? Is there any safe method to purge it? :)
>
> Bye,
> Gábor Auth
>
>


Re: Slow repair

2017-03-17 Thread Gábor Auth
Hi,

On Wed, Mar 15, 2017 at 11:35 AM Ben Slater 
wrote:

> When you say you’re running repair to “rebalance” do you mean to populate
> the new DC? If so, the normal/correct procedure is to use nodetool rebuild
> rather than repair.
>

Oh, thank you! :)

Bye,
Gábor Auth
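
(For reference, a sketch of the rebuild command Ben refers to, run on each node of the
new DC; the source DC name below is a placeholder:)

    nodetool rebuild -- DC_OLD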

>


Purge data from repair_history table?

2017-03-17 Thread Gábor Auth
Hi,

I've discovered a relatively huge amount of data in the system_distributed
keyspace's repair_history table:
   Table: repair_history
   Space used (live): 389409804
   Space used (total): 389409804

What is the purpose of this data? Is there any safe method to purge it? :)

Bye,
Gábor Auth


Re: Issue with Cassandra consistency in results

2017-03-17 Thread daemeon reiydelle
The prepared statement is needed. If I recall correctly it must remain in cache for the
query to complete. I don't have the docs handy to dig out the yaml param to adjust
the query cache. I ran into the problem stress testing a smallish cluster
with many queries at once.

Do you have a sense of how many distinct queries are hitting the cluster at
peak?

If many clients, how do you balance the connection load or do you always
hit the same node?
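
(A quick way to test the consistency angle from cqlsh, echoing the CL=ALL suggestion
further down the thread; keyspace, table and key values are placeholders:)

    CONSISTENCY ALL;
    SELECT count(*) FROM my_ks.my_table WHERE partition_key = 'some-id';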


sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On Mar 16, 2017 3:25 PM, "srinivasarao daruna" 
wrote:

> Hi reiydelle,
>
> I cannot confirm the range as the volume of data is huge and the query
> frequency is also high.
> If the cache is the cause of the issue, can we increase the cache size, or is
> there a solution to avoid dropped prepared statements?
>
>
>
>
>
>
> Thank You,
> Regards,
> Srini
>
> On Thu, Mar 16, 2017 at 2:13 PM, daemeon reiydelle 
> wrote:
>
>> The discard due to OOM is causing the zero count returned. I would guess a
>> cache miss problem of some sort, but not sure. Are you using row, index,
>> etc. caches? Are you seeing the failed prepared statement on random nodes (duh,
>> nodes that have the relevant data ranges)?
>>
>>
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Thu, Mar 16, 2017 at 10:56 AM, Ryan Svihla  wrote:
>>
>>> Depends, actually; restore just restores what's there, so if only one
>>> node had a copy of the data before the restore, then only one node has it
>>> afterwards, meaning quorum will still be wrong sometimes.
>>>
>>> On Thu, Mar 16, 2017 at 1:53 PM, Arvydas Jonusonis <
>>> arvydas.jonuso...@gmail.com> wrote:
>>>
 If the data was written at ONE, consistency is not guaranteed. ...but
 considering you just restored the cluster, there's a good chance something
 else is off.

 On Thu, Mar 16, 2017 at 18:19 srinivasarao daruna <
 sree.srin...@gmail.com> wrote:

> Want to make read and write QUORUM as well.
>
>
> On Mar 16, 2017 1:09 PM, "Ryan Svihla"  wrote:
>
> Replication factor is 3, and write consistency is ONE and read
> consistency is QUORUM.
>
> That combination is not gonna work well:
>
> *Write succeeds to NODE A but fails on node B,C*
>
> *Read goes to NODE B, C*
>
> If you can tolerate some temporary inaccuracy you can use QUORUM but
> may still have the situation where
>
> Write succeeds on node A at timestamp 1, B succeeds at timestamp 2
> Read succeeds on node B and C at timestamp 1
>
> If you need fully race condition free counts I'm afraid you need to
> use SERIAL or LOCAL_SERIAL (for in DC only accuracy)
>
> On Thu, Mar 16, 2017 at 1:04 PM, srinivasarao daruna <
> sree.srin...@gmail.com> wrote:
>
> Replication strategy is SimpleReplicationStrategy.
>
> Snitch is: EC2 snitch, as we deployed the cluster on EC2 instances.
>
> I was worried that CL=ALL would have more read latency and read failures,
> but I won't rule out trying it.
>
> Should I switch select count(*) to selecting the partition_key column? Would
> that be of any help?
>
>
> Thank you
> Regards
> Srini
>
> On Mar 16, 2017 12:46 PM, "Arvydas Jonusonis" <
> arvydas.jonuso...@gmail.com> wrote:
>
> What are your replication strategy and snitch settings?
>
> Have you tried doing a read at CL=ALL? If it's an actual inconsistency
> issue (missing data), this should cause the correct results to be 
> returned.
> You'll need to run a repair to fix the inconsistencies.
>
> If all the data is actually there, you might have one or several nodes
> that aren't identifying the correct replicas.
>
> Arvydas
>
>
>
> On Thu, Mar 16, 2017 at 5:31 PM, srinivasarao daruna <
> sree.srin...@gmail.com> wrote:
>
> Hi Team,
>
> We are struggling with a problem related to cassandra counts after a
> backup and restore of the cluster. Aaron Morton has suggested sending this
> to the user list, so someone on the list will be able to help me.
>
> We have a REST API to talk to cassandra, and one of our queries, which
> fetches a count, is creating problems for us.
>
> We have done a backup and restore and copied all the data to the new
> cluster. We have done nodetool refresh on the tables, and did a nodetool
> repair as well.
>
> However, one of our key API calls is returning inconsistent results.
> The result count is 0 on the first call but gives the actual values on
> later calls. The query frequency is a bit high and the failure rate has also
> risen considerably.
>
> 1) The count query has partition keys in it. Didn't see any read
> timeouts or any errors in the API logs.
>
> 2) This