Purge data from repair_history table?

2017-03-17 Thread Gábor Auth
Hi,

I've discovered a relatively huge amount of data in the system_distributed
keyspace's repair_history table:
   Table: repair_history
   Space used (live): 389409804
   Space used (total): 389409804

What is the purpose of this data? Is there any safe method to purge it? :)

Bye,
Gábor Auth


Re: Slow repair

2017-03-17 Thread Gábor Auth
Hi,

On Wed, Mar 15, 2017 at 11:35 AM Ben Slater wrote:

> When you say you’re running repair to “rebalance” do you mean to populate
> the new DC? If so, the normal/correct procedure is to use nodetool rebuild
> rather than repair.
>

Oh, thank you! :)

Bye,
Gábor Auth

>
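
As a concrete sketch of the rebuild step described above (the source DC name
is a placeholder):

  # run on each node in the NEW data center; "DC1" stands in for the
  # name of the existing source data center
  nodetool rebuild -- DC1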


Re: Purge data from repair_history table?

2017-03-17 Thread Paulo Motta
It's safe to truncate this table since it's just used to inspect repairs
for troubleshooting. You may also set a default TTL to keep it from
growing unbounded (this will be done by default in CASSANDRA-12701).
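
A minimal sketch of both options; the 30-day TTL is an arbitrary example value:

  # option 1: drop the accumulated history outright
  cqlsh -e "TRUNCATE system_distributed.repair_history;"

  # option 2: let entries expire on their own via a default TTL (30 days)
  cqlsh -e "ALTER TABLE system_distributed.repair_history WITH default_time_to_live = 2592000;"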

2017-03-17 8:36 GMT-03:00 Gábor Auth :

> [original message quoted, trimmed]


Very odd & inconsistent results from SASI query

2017-03-17 Thread Voytek Jarnot
Cassandra 3.9, 4 nodes, rf=3

Hi folks, we're seeing 0 results returned from queries that (a) should return
results, and (b) will return results with minor tweaks.

I've attached the sanitized trace outputs for the following 3 queries (pk1
and pk2 are partition keys, ck1 is the clustering key, val1 is a SASI-indexed
non-key column):

Q1: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefgh%'  LIMIT 1001
ALLOW FILTERING;
Q1 works - it returns a list of records, one of which has
val1='abcdefghijklmn'.

Q2: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefghi%'  LIMIT
1001 ALLOW FILTERING;
Q2 does not work - 0 results returned. The only difference from Q1 is one
additional character in the LIKE comparison.

Q3: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck2 <= '2017-03-17'  AND val1 = 'abcdefghijklmn'  LIMIT
1001 ALLOW FILTERING;
Q3 does not work - 0 results returned.

As I've written above, the data set *does* include a record with
val1='abcdefghijklmn'.
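
A hypothetical reconstruction of the schema and SASI index implied by the
queries above (the keyspace name, column types, and index options are
assumptions; the column and index names come from the thread):

  cqlsh -e "
    CREATE TABLE IF NOT EXISTS ks.t1 (
      pk1  int,
      pk2  int,
      ck1  timestamp,
      val1 text,
      PRIMARY KEY ((pk1, pk2), ck1)
    );
    CREATE CUSTOM INDEX IF NOT EXISTS idx_my_idx ON ks.t1 (val1)
      USING 'org.apache.cassandra.index.sasi.SASIIndex'
      WITH OPTIONS = { 'mode': 'PREFIX' };"

With 'mode': 'PREFIX', both the LIKE 'prefix%' queries and the equality query
would be expected to be served by the index.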

Confounding the issue is that this behavior is inconsistent. For different
values of val1, I'll have scenarios where Q3 works, but Q1 and Q2 do not.
That particular behavior I could explain with index/LIKE problems, but it is
Q3 that sometimes does not work, and that's a simple equality comparison
(although still using the index).

Further confounding the issue is that if my testers run these same queries
with the same parameters tomorrow, they're likely to work correctly.

The only thing I've been able to glean from tracing execution is that the
queries that work follow "Executing read..." with "Executing single
partition query on t1" and so forth, whereas the queries that don't work
simply follow "Executing read..." with "Read 0 live and 0 tombstone cells",
with no actual work seemingly done. But that's not helping me narrow this
down much.
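
For reference, traces like the one below can be captured from cqlsh with
session tracing; a minimal sketch using an illustrative query:

  # TRACING ON makes cqlsh print the coordinator trace after each statement
  cqlsh -e "TRACING ON; SELECT * FROM ks.t1 WHERE pk1 = 2017 AND pk2 = 11 AND val1 LIKE 'abcdefgh%' LIMIT 10 ALLOW FILTERING;"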

Thanks for your time - appreciate any help.
Results-found query (which includes the record where val1='abcdefghijklmn'):

Parsing SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefgh%' LIMIT 1001 ALLOW FILTERING; [Native-Transport-Requests-1]
Preparing statement [Native-Transport-Requests-1]
Index mean cardinalities are idx_my_idx:-9223372036854775808. Scanning with idx_my_idx. [Native-Transport-Requests-1]
Computing ranges to query [Native-Transport-Requests-1]
Submitting range requests on 1 ranges with a concurrency of 1 (-1.08086395E16 rows per range expected) [Native-Transport-Requests-1]
Submitted 1 concurrent range requests [Native-Transport-Requests-1]
Executing read on keyspace.t1 using index idx_my_idx [ReadStage-2]
Executing single-partition query on t1 [ReadStage-2]
Acquiring sstable references [ReadStage-2]
Key cache hit for sstable 2223 [ReadStage-2]
Skipped 34/35 non-slice-intersecting sstables, included 1 due to tombstones [ReadStage-2]
Key cache hit for sstable 2221 [ReadStage-2]
Merged data from memtables and 2 sstables [ReadStage-2]
Read 1 live and 0 tombstone cells [ReadStage-2]


Re: Purge data from repair_history table?

2017-03-17 Thread Gábor Auth
Oh, thanks! :)

On Fri, 17 Mar 2017, 14:22 Paulo Motta,  wrote:

> [quoted messages trimmed]


Random slow read times in Cassandra

2017-03-17 Thread Chuck Reynolds
I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that consistently shows
random high read times. In general most reads are under 10 milliseconds, but
within the 30 requests there is usually a read time of a couple of seconds.

Instance type: r4.8xlarge
EBS GP2 volumes, 3 TB with 9000 IOPS
30 GB heap

Data per node is about 170 GB

The keyspace is an id & a blob. When I check the data, the slow reads don't
seem to have anything to do with the size of the blobs.

This system has repairs run once a week because it takes a lot of updates.

The client makes a call and does 30 requests serially to Cassandra, and the
response times look like this in milliseconds.

What could make these so slow and what can I do to diagnose this?


Responses

Get Person time: 3 319746229:9009:66
Get Person time: 7 1830093695:9009:66
Get Person time: 4 30072253:9009:66
Get Person time: 4 2303790089:9009:66
Get Person time: 2 156792066:9009:66
Get Person time: 8 491230624:9009:66
Get Person time: 7 284904599:9009:66
Get Person time: 4 600370489:9009:66
Get Person time: 2 281007386:9009:66
Get Person time: 4 971178094:9009:66
Get Person time: 1 1322259885:9009:66
Get Person time: 2 1937958542:9009:66
Get Person time: 9 286536648:9009:66
Get Person time: 9 1835633470:9009:66
Get Person time: 2 300867513:9009:66
Get Person time: 3 178975468:9009:66
Get Person time: 2900 293043081:9009:66
Get Person time: 8 214913830:9009:66
Get Person time: 2 1956710764:9009:66
Get Person time: 4 237673776:9009:66
Get Person time: 17 68942206:9009:66
Get Person time: 1800 20072145:9009:66
Get Person time: 2 304698506:9009:66
Get Person time: 2 308177320:9009:66
Get Person time: 2 998436038:9009:66
Get Person time: 10 1036890112:9009:66
Get Person time: 1 1629649548:9009:66
Get Person time: 6 1595339706:9009:66
Get Person time: 4 1079637599:9009:66
Get Person time: 3 556342855:9009:66


Get Person time: 5 1856382256:9009:66
Get Person time: 3 1891737174:9009:66
Get Person time: 2 1179373651:9009:66
Get Person time: 2 1482602756:9009:66
Get Person time: 3 1236458510:9009:66
Get Person time: 11 1003159823:9009:66
Get Person time: 2 1264952556:9009:66
Get Person time: 2 1662234295:9009:66
Get Person time: 1 246108569:9009:66
Get Person time: 5 1709881651:9009:66
Get Person time: 3213 11878078:9009:66
Get Person time: 2 112866483:9009:66
Get Person time: 2 201870153:9009:66
Get Person time: 6 227696684:9009:66
Get Person time: 2 1946780190:9009:66
Get Person time: 2 2197987101:9009:66
Get Person time: 18 1838959725:9009:66
Get Person time: 3 1782937802:9009:66
Get Person time: 3 1692530939:9009:66
Get Person time: 9 1765654196:9009:66
Get Person time: 2 1597757121:9009:66
Get Person time: 2 1853127153:9009:66
Get Person time: 3 1533599253:9009:66
Get Person time: 6 1693244112:9009:66
Get Person time: 6 82047537:9009:66
Get Person time: 2 96221961:9009:66
Get Person time: 4 98202209:9009:66
Get Person time: 9 12952388:9009:66
Get Person time: 2 300118652:9009:66
Get Person time: 10 78801084:9009:66


Get Person time: 13 1856424913:9009:66
Get Person time: 2 255814186:9009:66
Get Person time: 2 1183397424:9009:66
Get Person time: 5 1828603730:9009:66
Get Person time: 9 132965919:9009:66
Get Person time: 4 1616190071:9009:66
Get Person time: 2 15929337:9009:66
Get Person time: 10 297005427:9009:66
Get Person time: 2 1306460047:9009:66
Get Person time: 5 620139216:9009:66
Get Person time: 2 1364349058:9009:66
Get Person time: 3 629543403:9009:66
Get Person time: 5 1299827034:9009:66
Get Person time: 4 1593205912:9009:66
Get Person time: 2 1755460077:9009:66
Get Person time: 2 1906388666:9009:66
Get Person time: 1 1838653952:9009:66
Get Person time: 2 2249662508:9009:66
Get Person time: 3 1931708432:9009:66
Get Person time: 2 2177004948:9009:66
Get Person time: 2 2042756682:9009:66
Get Person time: 5 41764865:9009:66
Get Person time: 4023 1733384704:9009:66
Get Person time: 1 1614842189:9009:66
Get Person time: 2 2194211396:9009:66
Get Person time: 3 1711330834:9009:66
Get Person time: 2 2264849689:9009:66
Get Person time: 3 1819027970:9009:66
Get Person time: 2 1978614851:9009:66
Get Person time: 1 1863483129:9009:66



Re: Random slow read times in Cassandra

2017-03-17 Thread Jonathan Haddad
Probably JVM pauses. Check your logs for long GC times.
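
A quick way to check, assuming the default packaged log location (the path is
an assumption):

  # long stop-the-world pauses are reported by Cassandra's GCInspector
  grep "GCInspector" /var/log/cassandra/system.log | tail -n 20
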
On Fri, Mar 17, 2017 at 11:51 AM Chuck Reynolds wrote:

> [original message quoted in full, trimmed]


Re: Very odd & inconsistent results from SASI query

2017-03-17 Thread Voytek Jarnot
A wrinkle further confounds the issue: running a repair on the node which
was servicing the queries has cleared things up and all the queries now
work.

That doesn't make a whole lot of sense to me - my assumption was that a
repair shouldn't have fixed it.

On Fri, Mar 17, 2017 at 12:03 PM, Voytek Jarnot wrote:

> [original message quoted in full, trimmed]


Re: Random slow read times in Cassandra

2017-03-17 Thread daemeon reiydelle
Check for level 2 (stop-the-world) garbage collections.





Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872

On Fri, Mar 17, 2017 at 11:51 AM, Chuck Reynolds wrote:

> [original message quoted in full, trimmed]


repair performance

2017-03-17 Thread Roland Otta
hello,

we are quite inexperienced with cassandra at the moment and are playing
around with a new cluster we built up to get familiar with cassandra and
its possibilities.

while getting familiar with that topic we noticed that repairs in our
cluster take a long time. to get an idea of our current setup, here are
some numbers:

our cluster currently consists of 4 nodes (replication factor 3).
these nodes are all on dedicated physical hardware in our own
datacenter. all of the nodes have

32 cores @ 2.9 GHz
64 GB RAM
2 SSDs (RAID 0), 900 GB each, for data
1 separate HDD for OS + commitlogs

current dataset:
approx 530 GB per node
21 tables (biggest one has more than 200 GB / node)


i already tried setting compactionthroughput + streamingthroughput to
unlimited for testing purposes ... but that did not change anything.
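
(A sketch of the throttle settings in question; a value of 0 means
unthrottled:)

  nodetool setcompactionthroughput 0
  nodetool setstreamthroughput 0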

when checking system resources i cannot see any bottleneck (cpus are
pretty idle and we have no iowaits).

when issuing a repair via

nodetool repair -local

on a node, the repair takes longer than a day.
is this normal or should we expect a faster repair?

i also noticed that initializing new nodes in the datacenter was really
slow (approx. 50 Mbit/s). here too i expected much better performance -
could those two problems be somehow related?

br//
roland

Re: repair performance

2017-03-17 Thread benjamin roth
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages,
whatever)
- You can use incremental repairs to speed things up for regular repairs
- You can use "reaper" to schedule repairs and run them sliced, automated,
failsafe

The time repairs actually take may vary a lot depending on how much data has
to be streamed or how inconsistent your cluster is.

50 Mbit/s is really a bit low! The actual performance depends on many
factors like your CPU, RAM, HD/SSD, concurrency settings, and the load on
the "old nodes" of the cluster.
This is quite an individual problem that you have to track down individually.

2017-03-17 22:07 GMT+01:00 Roland Otta :

> [quoted original message trimmed]


Re: repair performance

2017-03-17 Thread Roland Otta
forgot to mention the version we are using:

we are using 3.0.7 - so i guess we should have incremental repairs by default.
it also prints out incremental:true when starting a repair
INFO  [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting 
repair command #7, repairing keyspace xxx with repair options (parallelism: 
parallel, primary range: false, incremental: true, job threads: 1, 
ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of ranges: 1758)

3.0.7 is also the reason why we are not using reaper ... as far as i could 
figure out it's not compatible with 3.0+



On Fri, 2017-03-17 at 22:13 +0100, benjamin roth wrote:
[quoted messages trimmed]



Re: repair performance

2017-03-17 Thread Roland Otta
... maybe i should just try increasing the job threads with --job-threads

shame on me
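
(A sketch of that option; the keyspace name is a placeholder and 4 is the
maximum job-thread count:)

  # -local restricts repair to the local DC, -j runs more repair jobs in parallel
  nodetool repair -local -j 4 my_keyspace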

On Fri, 2017-03-17 at 21:30 +0000, Roland Otta wrote:
[quoted messages trimmed]



Re: repair performance

2017-03-17 Thread benjamin roth
The fork from thelastpickle is (compatible with 3.0+). I'd recommend giving
it a try over pure nodetool.

2017-03-17 22:30 GMT+01:00 Roland Otta :

> [quoted messages trimmed]


Does SASI index support IN?

2017-03-17 Thread Yu, John
All,

I've been experimenting with Cassandra 3.10, with the hope that SASI has
improved. To my disappointment, it seems it still doesn't support a simple
operation like IN. Have others tried the same? Also, with a small test data set
(160K records), the performance is no better than just querying without the
index (using ALLOW FILTERING). I'm very confused about what the index really does.

Thanks,
John



Re: repair performance

2017-03-17 Thread Roland Otta
i had not realized that so far.

thank you for the hint. i will definitely give it a try.

On Fri, 2017-03-17 at 22:32 +0100, benjamin roth wrote:
[quoted messages trimmed]