Purge data from repair_history table?
Hi,

I've discovered a relatively huge amount of data in the system_distributed keyspace's repair_history table:

Table: repair_history
Space used (live): 389409804
Space used (total): 389409804

What is the purpose of this data? Is there any safe method to purge it? :)

Bye,
Gábor Auth
Re: Slow repair
Hi,

On Wed, Mar 15, 2017 at 11:35 AM Ben Slater wrote:
> When you say you’re running repair to “rebalance” do you mean to populate
> the new DC? If so, the normal/correct procedure is to use nodetool rebuild
> rather than repair.

Oh, thank you! :)

Bye,
Gábor Auth
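For anyone finding this thread later, the rebuild step Ben describes looks roughly like this (the DC name below is just an example; use the name of the data center you are streaming from):

  nodetool rebuild -- DC1

Run it on each node of the new data center, after the keyspace replication has been altered to include that DC.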
Re: Purge data from repair_history table?
It's safe to truncate this table since it's just used to inspect repairs for troubleshooting. You may also set a default TTL to prevent it from growing unbounded (this is going to be done by default in CASSANDRA-12701).

2017-03-17 8:36 GMT-03:00 Gábor Auth :
> Hi,
>
> I've discovered a relatively huge amount of data in the system_distributed
> keyspace's repair_history table:
> Table: repair_history
> Space used (live): 389409804
> Space used (total): 389409804
>
> What is the purpose of this data? Is there any safe method to purge it? :)
>
> Bye,
> Gábor Auth
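For example, something along these lines in cqlsh (the TTL value is just an illustration, pick whatever retention suits you; note that a default TTL only applies to newly written rows):

  TRUNCATE system_distributed.repair_history;
  ALTER TABLE system_distributed.repair_history WITH default_time_to_live = 2592000;  -- ~30 days

The same applies to system_distributed.parent_repair_history if that one has grown as well.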
Very odd & inconsistent results from SASI query
Cassandra 3.9, 4 nodes, rf=3

Hi folks, we're seeing 0 results returned from queries that (a) should return results, and (b) will return results with minor tweaks.

I've attached the sanitized trace outputs for the following 3 queries (pk1 and pk2 are partition keys, ck1 is clustering key, val1 is SASI indexed non-key column):

Q1: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefgh%' LIMIT 1001 ALLOW FILTERING;
Q1 works - it returns a list of records, one of which has val1='abcdefghijklmn'.

Q2: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefghi%' LIMIT 1001 ALLOW FILTERING;
Q2 does not work - 0 results returned. The only difference to Q1 is one additional character provided in the LIKE comparison.

Q3: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck2 <= '2017-03-17' AND val1 = 'abcdefghijklmn' LIMIT 1001 ALLOW FILTERING;
Q3 does not work - 0 results returned.

As I've written above, the data set *does* include a record with val1='abcdefghijklmn'.

Confounding the issue is that this behavior is inconsistent. For different values of val1, I'll have scenarios where Q3 works, but Q1 and Q2 do not. Now, that particular behavior I could explain with index/like problems, but it is Q3 that sometimes does not work and that's a simple equality comparison (although still using the index).

Further confounding the issue is that if my testers run these same queries with the same parameters tomorrow, they're likely to work correctly.

The only thing I've been able to glean from tracing execution is that the queries that work follow "Executing read..." with "Executing single partition query on t1" and so forth, whereas the queries that don't work simply follow "Executing read..." with "Read 0 live and 0 tombstone cells" with no actual work seemingly done. But that's not helping me narrow this down much.

Thanks for your time - appreciate any help.

Results-found query (which includes the record where val1='abcdefghijklmn'):
Parsing SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefgh%' LIMIT 1001 ALLOW FILTERING; [Native-Transport-Requests-1]
Preparing statement [Native-Transport-Requests-1]
Index mean cardinalities are idx_my_idx:-9223372036854775808. Scanning with idx_my_idx. [Native-Transport-Requests-1]
Computing ranges to query [Native-Transport-Requests-1]
Submitting range requests on 1 ranges with a concurrency of 1 (-1.08086395E16 rows per range expected) [Native-Transport-Requests-1]
Submitted 1 concurrent range requests [Native-Transport-Requests-1]
Executing read on keyspace.t1 using index idx_my_idx [ReadStage-2]
Executing single-partition query on t1 [ReadStage-2]
Acquiring sstable references [ReadStage-2]
Key cache hit for sstable 2223 [ReadStage-2]
Skipped 34/35 non-slice-intersecting sstables, included 1 due to tombstones [ReadStage-2]
Key cache hit for sstable 2221 [ReadStage-2]
Merged data from memtables and 2 sstables [ReadStage-2]
Read 1 live and 0 tombstone cells [ReadStage-2]
Re
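To make the shape concrete, here is a sketch of the (sanitized) schema - column types are approximated, and the index is shown with the default PREFIX mode, which is what the LIKE 'abc%' patterns above rely on:

  CREATE TABLE t1 (
      pk1 int,
      pk2 int,
      ck1 timestamp,
      val1 text,
      PRIMARY KEY ((pk1, pk2), ck1)
  );

  CREATE CUSTOM INDEX idx_my_idx ON t1 (val1)
      USING 'org.apache.cassandra.index.sasi.SASIIndex'
      WITH OPTIONS = { 'mode': 'PREFIX' };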
Re: Purge data from repair_history table?
Oh, thanks! :)

On Fri, 17 Mar 2017, 14:22 Paulo Motta, wrote:
> It's safe to truncate this table since it's just used to inspect repairs
> for troubleshooting. You may also set a default TTL to prevent it from
> growing unbounded (this is going to be done by default in CASSANDRA-12701).
Random slow read times in Cassandra
I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that consistently shows random high read times. In general most reads are under 10 milliseconds, but within the 30 requests there is usually a read time that is a couple of seconds.

Instance type: r4.8xlarge
EBS GP2 volumes, 3 TB with 9000 IOPS
30 Gig Heap

Data per node is about 170 gigs

The keyspace is an id & a blob. When I check the data, the slow reads don’t seem to have anything to do with the size of the blobs.

This system has repairs run once a week because it takes a lot of updates.

The client makes a call and does 30 requests serially to Cassandra, and the response times look like this in milliseconds.

What could make these so slow and what can I do to diagnose this?

Responses

Get Person time: 3 319746229:9009:66
Get Person time: 7 1830093695:9009:66
Get Person time: 4 30072253:9009:66
Get Person time: 4 2303790089:9009:66
Get Person time: 2 156792066:9009:66
Get Person time: 8 491230624:9009:66
Get Person time: 7 284904599:9009:66
Get Person time: 4 600370489:9009:66
Get Person time: 2 281007386:9009:66
Get Person time: 4 971178094:9009:66
Get Person time: 1 1322259885:9009:66
Get Person time: 2 1937958542:9009:66
Get Person time: 9 286536648:9009:66
Get Person time: 9 1835633470:9009:66
Get Person time: 2 300867513:9009:66
Get Person time: 3 178975468:9009:66
Get Person time: 2900 293043081:9009:66
Get Person time: 8 214913830:9009:66
Get Person time: 2 1956710764:9009:66
Get Person time: 4 237673776:9009:66
Get Person time: 17 68942206:9009:66
Get Person time: 1800 20072145:9009:66
Get Person time: 2 304698506:9009:66
Get Person time: 2 308177320:9009:66
Get Person time: 2 998436038:9009:66
Get Person time: 10 1036890112:9009:66
Get Person time: 1 1629649548:9009:66
Get Person time: 6 1595339706:9009:66
Get Person time: 4 1079637599:9009:66
Get Person time: 3 556342855:9009:66
Get Person time: 5 1856382256:9009:66
Get Person time: 3 1891737174:9009:66
Get Person time: 2 1179373651:9009:66
Get Person time: 2 1482602756:9009:66
Get Person time: 3 1236458510:9009:66
Get Person time: 11 1003159823:9009:66
Get Person time: 2 1264952556:9009:66
Get Person time: 2 1662234295:9009:66
Get Person time: 1 246108569:9009:66
Get Person time: 5 1709881651:9009:66
Get Person time: 3213 11878078:9009:66
Get Person time: 2 112866483:9009:66
Get Person time: 2 201870153:9009:66
Get Person time: 6 227696684:9009:66
Get Person time: 2 1946780190:9009:66
Get Person time: 2 2197987101:9009:66
Get Person time: 18 1838959725:9009:66
Get Person time: 3 1782937802:9009:66
Get Person time: 3 1692530939:9009:66
Get Person time: 9 1765654196:9009:66
Get Person time: 2 1597757121:9009:66
Get Person time: 2 1853127153:9009:66
Get Person time: 3 1533599253:9009:66
Get Person time: 6 1693244112:9009:66
Get Person time: 6 82047537:9009:66
Get Person time: 2 96221961:9009:66
Get Person time: 4 98202209:9009:66
Get Person time: 9 12952388:9009:66
Get Person time: 2 300118652:9009:66
Get Person time: 10 78801084:9009:66
Get Person time: 13 1856424913:9009:66
Get Person time: 2 255814186:9009:66
Get Person time: 2 1183397424:9009:66
Get Person time: 5 1828603730:9009:66
Get Person time: 9 132965919:9009:66
Get Person time: 4 1616190071:9009:66
Get Person time: 2 15929337:9009:66
Get Person time: 10 297005427:9009:66
Get Person time: 2 1306460047:9009:66
Get Person time: 5 620139216:9009:66
Get Person time: 2 1364349058:9009:66
Get Person time: 3 629543403:9009:66
Get Person time: 5 1299827034:9009:66
Get Person time: 4 1593205912:9009:66
Get Person time: 2 1755460077:9009:66
Get Person time: 2 1906388666:9009:66
Get Person time: 1 1838653952:9009:66
Get Person time: 2 2249662508:9009:66
Get Person time: 3 1931708432:9009:66
Get Person time: 2 2177004948:9009:66
Get Person time: 2 2042756682:9009:66
Get Person time: 5 41764865:9009:66
Get Person time: 4023 1733384704:9009:66
Get Person time: 1 1614842189:9009:66
Get Person time: 2 2194211396:9009:66
Get Person time: 3 1711330834:9009:66
Get Person time: 2 2264849689:9009:66
Get Person time: 3 1819027970:9009:66
Get Person time: 2 1978614851:9009:66
Get Person time: 1 1863483129:9009:66
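For what it's worth, one thing I can still check on the server side is the latency histograms, to see whether the outliers show up in the replica-local read latencies or only at the coordinator. The keyspace/table names below are placeholders:

  nodetool cfhistograms my_keyspace person
  nodetool proxyhistograms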
Re: Random slow read times in Cassandra
Probably JVM pauses. Check your logs for long GC times.

On Fri, Mar 17, 2017 at 11:51 AM Chuck Reynolds wrote:
> I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that consistently
> shows random high read times. In general most reads are under 10
> milliseconds, but within the 30 requests there is usually a read time that
> is a couple of seconds.
>
> Instance type: r4.8xlarge
> EBS GP2 volumes, 3 TB with 9000 IOPS
> 30 Gig Heap
>
> Data per node is about 170 gigs
>
> The keyspace is an id & a blob. When I check the data, the slow reads
> don’t seem to have anything to do with the size of the blobs.
>
> This system has repairs run once a week because it takes a lot of updates.
>
> The client makes a call and does 30 requests serially to Cassandra, and the
> response times look like this in milliseconds.
>
> What could make these so slow and what can I do to diagnose this?
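For example, something like this against the Cassandra system log will surface the pauses that GCInspector reports (log path assumed; adjust to your install):

  grep GCInspector /var/log/cassandra/system.log | egrep '[0-9]{3,}ms'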
Re: Very odd & inconsistent results from SASI query
A wrinkle further confounds the issue: running a repair on the node which was servicing the queries has cleared things up and all the queries now work. That doesn't make a whole lot of sense to me - my assumption was that a repair shouldn't have fixed it. On Fri, Mar 17, 2017 at 12:03 PM, Voytek Jarnot wrote: > Cassandra 3.9, 4 nodes, rf=3 > > Hi folks, we're see 0 results returned from queries that (a) should return > results, and (b) will return results with minor tweaks. > > I've attached the sanitized trace outputs for the following 3 queries (pk1 > and pk2 are partition keys, ck1 is clustering key, val1 is SASI indexed > non-key column): > > Q1: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= > '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefgh%' LIMIT 1001 > ALLOW FILTERING; > Q1 works - it returns a list of records, one of which has > val1='abcdefghijklmn'. > > Q2: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= > '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefghi%' LIMIT > 1001 ALLOW FILTERING; > Q2 does not work - 0 results returned. Only difference to Q1 is one > additional character provided in LIKE comparison. > > Q3: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= > '2017-03-16' AND ck2 <= '2017-03-17' AND val1 = 'abcdefghijklmn' LIMIT > 1001 ALLOW FILTERING; > Q3 does not work - 0 results returned. > > As I've written above, the data set *does* include a record with > val1='abcdefghijklmn'. > > Confounding the issue is that this behavior is inconsistent. For > different values of val1, I'll have scenarios where Q3 works, but Q1 and Q2 > do not. Now, that particular behavior I could explain with index/like > problems, but it is Q3 that sometimes does not work and that's a simply > equality comparison (although still using the index). > > Further confounding the issue is that if my testers run these same queries > with the same parameters tomorrow, they're likely to work correctly. > > Only thing I've been able to glean from tracing execution is that the > queries that work follow "Executing read..." with "Executing single > partition query on t1" and so forth, whereas the queries that don't work > simply follow "Executing read..." with "Read 0 live and 0 tombstone cells" > with no actual work seemingly done. But that's not helping me narrow this > down much. > > Thanks for your time - appreciate any help. >
Re: Random slow read times in Cassandra
Check for level 2 (stop-the-world) garbage collections.

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Mar 17, 2017 at 11:51 AM, Chuck Reynolds wrote:
> I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that consistently
> shows random high read times. In general most reads are under 10
> milliseconds, but within the 30 requests there is usually a read time that
> is a couple of seconds.
>
> What could make these so slow and what can I do to diagnose this?
repair performance
hello,

we are quite inexperienced with cassandra at the moment and are playing around with a new cluster we built up for getting familiar with cassandra and its possibilities.

while getting familiar with that topic we recognized that repairs in our cluster take a long time. to give an idea of our current setup, here are some numbers:

our cluster currently consists of 4 nodes (replication factor 3). these nodes are all on dedicated physical hardware in our own datacenter. all of the nodes have

32 cores @ 2.9 GHz
64 GB ram
2 ssds (raid0), 900 GB each, for data
1 separate hdd for OS + commitlogs

current dataset:
approx 530 GB per node
21 tables (biggest one has more than 200 GB / node)

i already tried setting compactionthroughput + streamingthroughput to unlimited for testing purposes ... but that did not change anything.

when checking system resources i cannot see any bottleneck (cpus are pretty idle and we have no iowaits).

when issuing a repair via "nodetool repair -local" on a node, the repair takes longer than a day. is this normal or could we normally expect a faster repair?

i also recognized that initializing of new nodes in the datacenter was really slow (approx 50 mbit/s). also here i expected a much better performance - could those 2 problems be somehow related?

br//
roland
Re: repair performance
It depends a lot ...

- Repairs can be very slow, yes! (And unreliable, due to timeouts, outages, whatever)
- You can use incremental repairs to speed things up for regular repairs
- You can use "reaper" to schedule repairs and run them sliced, automated, failsafe

The time repairs actually take may vary a lot depending on how much data has to be streamed or how inconsistent your cluster is.

50mbit/s is really a bit low! The actual performance depends on so many factors like your CPU, RAM, HD/SSD, concurrency settings, load of the "old nodes" of the cluster. This is a quite individual problem you have to track down individually.

2017-03-17 22:07 GMT+01:00 Roland Otta :
> when issuing a repair via "nodetool repair -local" on a node, the repair
> takes longer than a day. is this normal or could we normally expect a
> faster repair?
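To illustrate the "sliced" part: reaper essentially splits the ring into many small token ranges and repairs them one by one, which you can also do by hand with subrange repairs (token values and keyspace name below are placeholders):

  nodetool repair -full -local -st <start_token> -et <end_token> my_keyspace

Small slices fail fast and can be retried individually instead of re-running a repair that takes a day.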
Re: repair performance
forgot to mention the version we are using:

we are using 3.0.7 - so i guess we should have incremental repairs by default. it also prints out incremental: true when starting a repair:

INFO [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting repair command #7, repairing keyspace xxx with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [ProdDC2], hosts: [], # of ranges: 1758)

3.0.7 is also the reason why we are not using reaper ... as far as i could figure out it's not compatible with 3.0+

On Fri, 2017-03-17 at 22:13 +0100, benjamin roth wrote:
> It depends a lot ...
>
> - Repairs can be very slow, yes! (And unreliable, due to timeouts, outages, whatever)
> - You can use incremental repairs to speed things up for regular repairs
> - You can use "reaper" to schedule repairs and run them sliced, automated, failsafe
Re: repair performance
... maybe i should just try increasing the job threads with --job-threads

shame on me

On Fri, 2017-03-17 at 21:30 +0000, Roland Otta wrote:
> forgot to mention the version we are using:
>
> we are using 3.0.7 - so i guess we should have incremental repairs by default.
> it also prints out incremental: true when starting a repair
>
> 3.0.7 is also the reason why we are not using reaper ... as far as i could
> figure out it's not compatible with 3.0+
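In case anyone else lands here, the flag in question looks like this (keyspace name is a placeholder; as far as I know 4 is the maximum number of job threads):

  nodetool repair -local -j 4 my_keyspace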
Re: repair performance
The fork from thelastpickle is compatible with 3.0+. I'd recommend giving it a try over pure nodetool.

2017-03-17 22:30 GMT+01:00 Roland Otta :
> 3.0.7 is also the reason why we are not using reaper ... as far as i could
> figure out it's not compatible with 3.0+
Does SASI index support IN?
All,

I've been experimenting with Cassandra 3.10 now, with the hope that SASI has improved. Much to my disappointment, it seems it still doesn't support a simple operation like IN. Have others tried the same?

Also, with a small test data set (160K records), the performance is also not better than just doing without the index (using allow filtering). Very confused about what the index really does.

Thanks,
John

NOTICE OF CONFIDENTIALITY: This message may contain information that is considered confidential and which may be prohibited from disclosure under applicable law or by contractual agreement. The information is intended solely for the use of the individual or entity named above. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the information contained in or attached to this message is strictly prohibited. If you have received this email transmission in error, please notify the sender by replying to this email and then delete it from your system.
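For context, the sort of query I was hoping for versus the fallback I'm left with (table and column names below are made up for illustration):

  -- rejected when val is only SASI-indexed (not part of the primary key):
  SELECT * FROM my_table WHERE val IN ('a', 'b', 'c');

  -- fallback: one equality query per value (equality on a SASI-indexed
  -- column is supported), merging the results client-side
  SELECT * FROM my_table WHERE val = 'a';
  SELECT * FROM my_table WHERE val = 'b';
  SELECT * FROM my_table WHERE val = 'c';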
Re: repair performance
did not recognize that so far. thank you for the hint. i will definitely give it a try

On Fri, 2017-03-17 at 22:32 +0100, benjamin roth wrote:
> The fork from thelastpickle is compatible with 3.0+. I'd recommend giving
> it a try over pure nodetool.