[jira] [Commented] (CASSANDRA-9406) Add Option to Not Validate Atoms During Scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549268#comment-14549268 ] Jordan West commented on CASSANDRA-9406: patch here (base is cassandra-2.0): https://github.com/jrwest/cassandra/tree/9406 Add Option to Not Validate Atoms During Scrub - Key: CASSANDRA-9406 URL: https://issues.apache.org/jira/browse/CASSANDRA-9406 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jordan West Assignee: Jordan West Priority: Minor Fix For: 2.0.x In Scrubber, the instantiation of SSTableIdentityIterator hardcodes checkData to true. This should be made configurable when running scrub via JMX or StandaloneScrubber. Since inbound data is not validated, Scrub without this option will throw away data that is not corrupt, but misrepresented (e.g. an int is stored but validator = LongType), while Cassandra and application clients will happily continue to read and write data with this misrepresentation (although some care may need to be taken on the application side). Scrub will throw these rows out, leading to a large amount of data loss. In these applications it is desirable for scrub to check for row/file corruption but not validate the column values (which can result in a large percentage of data being thrown away). This would be made possible by adding such a flag to disable validation in the SSTableIdentityIterator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9406) Add Option to Not Validate Atoms During Scrub
Jordan West created CASSANDRA-9406: -- Summary: Add Option to Not Validate Atoms During Scrub Key: CASSANDRA-9406 URL: https://issues.apache.org/jira/browse/CASSANDRA-9406 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jordan West Priority: Minor Fix For: 2.0.x In Scrubber, the instantiation of SSTableIdentityIterator hardcodes checkData to true. This should be made configurable when running scrub via JMX or StandaloneScrubber. Since inbound data is not validated, Scrub without this option will throw away data that is not corrupt, but misrepresented (e.g. an int is stored but validator = LongType), while Cassandra and application clients will happily continue to read and write data with this misrepresentation (although some care may need to be taken on the application side). Scrub will throw these rows out leading to a large amount of data loss. In these applications it is desirable for scrub to check for row/file corruption but not validate the column values (which can result in a large percentage of data being thrown away). This would be made possible by adding such a flag to disable validation in the SSTableIdentityIterator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-9406) Add Option to Not Validate Atoms During Scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West reassigned CASSANDRA-9406: -- Assignee: Jordan West Add Option to Not Validate Atoms During Scrub - Key: CASSANDRA-9406 URL: https://issues.apache.org/jira/browse/CASSANDRA-9406 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jordan West Assignee: Jordan West Priority: Minor Fix For: 2.0.x In Scrubber, the instantiation of SSTableIdentityIterator hardcodes checkData to true. This should be made configurable when running scrub via JMX or StandaloneScrubber. Since inbound data is not validated, Scrub without this option will throw away data that is not corrupt, but misrepresented (e.g. an int is stored but validator = LongType), while Cassandra and application clients will happily continue to read and write data with this misrepresentation (although some care may need to be taken on the application side). Scrub will throw these rows out leading to a large amount of data loss. In these applications it is desirable for scrub to check for row/file corruption but not validate the column values (which can result in a large percentage of data being thrown away). This would be made possible by adding such a flag to disable validation in the SSTableIdentityIterator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
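The change described in the ticket is small: Scrubber currently passes a hardcoded true for checkData when constructing SSTableIdentityIterator, and the proposal is to let JMX/StandaloneScrubber callers turn it off. A minimal sketch of how that flag might be plumbed, using a hypothetical ScrubOptions holder (the class and method names here are illustrative, not the actual patch):

```java
// Hypothetical sketch only: ScrubOptions and its factory methods are
// illustrative stand-ins, not the actual Cassandra API from the patch.
class ScrubOptions {
    private final boolean checkData; // currently hardcoded to true in Scrubber

    ScrubOptions(boolean checkData) {
        this.checkData = checkData;
    }

    boolean checkData() {
        return checkData;
    }

    // Default behavior matches today's Scrubber: validate column values.
    static ScrubOptions defaults() {
        return new ScrubOptions(true);
    }

    // Proposed mode: still detect row/file corruption, but skip value
    // validation so misrepresented-but-readable columns are not discarded.
    static ScrubOptions withoutValidation() {
        return new ScrubOptions(false);
    }
}
```

Scrubber would then pass `options.checkData()` where it currently passes the literal true.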
[jira] [Commented] (CASSANDRA-10681) make index building pluggable via IndexBuildTask
[ https://issues.apache.org/jira/browse/CASSANDRA-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018508#comment-15018508 ] Jordan West commented on CASSANDRA-10681: - I prefer the version that makes it explicit that some operations may be performed across the entire table data set (indexes/sstables) but if we are not going that route I would vote we use the other patch Pavel posted, which avoids the unnecessary unused instance construction. Also, is there a need to introduce a new `IndexBuildTask` class or can we just use `SecondaryIndexBuilder` as the interface and subclass it? Is there a concern about re-using the existing class but making it abstract? > make index building pluggable via IndexBuildTask > > > Key: CASSANDRA-10681 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10681 > Project: Cassandra > Issue Type: Sub-task > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich >Priority: Minor > Labels: sasi > Fix For: 3.x > > Attachments: 0001-add-table-support-for-multi-table-builds.patch, > 0001-make-index-building-pluggable-via-IndexBuildTask.patch > > > Currently index building assumes one and only one way to build all of the indexes > - through SecondaryIndexBuilder - which merges all of the sstables together, > collates columns etc. This works fine for built-in indexes but not for SASI > since it attaches to every SSTable individually. We need an "IndexBuildTask" > interface (based on CompactionInfo.Holder) to be returned from Index on > demand to give power to SI interface implementers to decide how the build should > work. This might be less effective for CassandraIndex, since this effectively > means that collation will have to be done multiple times on the same data, > but nevertheless is a good compromise for a clean interface to the outside world. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10681) make index building pluggable via IndexBuildTask
[ https://issues.apache.org/jira/browse/CASSANDRA-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018508#comment-15018508 ] Jordan West edited comment on CASSANDRA-10681 at 11/20/15 6:52 PM: --- I prefer the version that makes it explicit that some operations may be performed across the entire table data set (indexes/sstables) but if we are not going that route I would vote we use the other patch Pavel posted, which avoids the unnecessary unused instance construction. Also, is there a need to introduce a new `IndexBuildTask` class or can we just use `SecondaryIndexBuilder` as the interface and subclass it? Is there a concern about re-using the existing class but making it abstract? was (Author: jrwest): I prefer the version that makes it explicit that some operations may be performed across the entire table data set (indexes/sstables) but if we are not going that route I would vote we use the other patch Pavel posted, which avoids the unnecessary unused instance construction. Also, is there a need to introduce a new `IndexBuildTask class` or can we just use `SecondaryIndexBuilder` as the interface and subclass it? Is there a concern about re-using the existing class but making it abstract? > make index building pluggable via IndexBuildTask > > > Key: CASSANDRA-10681 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10681 > Project: Cassandra > Issue Type: Sub-task > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich >Priority: Minor > Labels: sasi > Fix For: 3.x > > Attachments: 0001-add-table-support-for-multi-table-builds.patch, > 0001-make-index-building-pluggable-via-IndexBuildTask.patch > > > Currently index building assumes one and only one way to build all of the indexes > - through SecondaryIndexBuilder - which merges all of the sstables together, > collates columns etc. This works fine for built-in indexes but not for SASI > since it attaches to every SSTable individually. 
We need an "IndexBuildTask" > interface (based on CompactionInfo.Holder) to be returned from Index on > demand to give power to SI interface implementers to decide how the build should > work. This might be less effective for CassandraIndex, since this effectively > means that collation will have to be done multiple times on the same data, > but nevertheless is a good compromise for a clean interface to the outside world. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
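To make the two build strategies under discussion concrete, here is a toy sketch (the interface and class names are illustrative stand-ins, not the actual patch): a built-in index wants a single collated pass that merges all sstables, while a SASI-style index wants one independent pass per sstable.

```java
import java.util.List;

// Illustrative sketch of the pluggable-build idea; these names are made up
// for the example and are not the Cassandra types from the patch.
interface IndexBuildTask {
    int build(List<String> sstables); // returns number of build passes performed
}

// Built-in-index style: one pass that collates all sstables together
// (the existing SecondaryIndexBuilder approach).
class MergedBuildTask implements IndexBuildTask {
    public int build(List<String> sstables) {
        return sstables.isEmpty() ? 0 : 1; // single merged pass over everything
    }
}

// SASI style: the index attaches to every sstable individually,
// so each sstable gets its own build pass.
class PerSSTableBuildTask implements IndexBuildTask {
    public int build(List<String> sstables) {
        return sstables.size(); // one pass per sstable
    }
}
```

The trade-off from the issue description falls out directly: the per-sstable task may collate the same data multiple times, but each index implementation gets to choose its own strategy.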
[jira] [Commented] (CASSANDRA-12073) [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial results
[ https://issues.apache.org/jira/browse/CASSANDRA-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347099#comment-15347099 ] Jordan West commented on CASSANDRA-12073: - [~doanduyhai] the fix you mentioned looks correct. It should only impact the [code]PREFIX[/code] type because of the implementation of `nonMatchingPartial`. Unfortunately this means we may do a bit more work in the index than desired, but this is an acceptable consequence of the more complex query. > [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial > results > --- > > Key: CASSANDRA-12073 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12073 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: Cassandra 3.7 >Reporter: DOAN DuyHai > > {noformat} > cqlsh:music> CREATE TABLE music.albums ( > id uuid PRIMARY KEY, > artist text, > country text, > quality text, > status text, > title text, > year int > ); > cqlsh:music> CREATE CUSTOM INDEX albums_artist_idx ON music.albums (artist) > USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': > 'CONTAINS', 'analyzer_class': > 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', > 'case_sensitive': 'false'}; > cqlsh:music> SELECT * FROM albums WHERE artist like 'lady%' LIMIT 100; > id | artist| country| quality > | status| title | year > --+---++-+---+---+-- > 372bb0ab-3263-41bc-baad-bb520ddfa787 | Lady Gaga |USA | normal > | Official | Red and Blue EP | 2006 > 1a4abbcd-b5de-4c69-a578-31231e01ff09 | Lady Gaga |Unknown | normal > | Promotion |Poker Face | 2008 > 31f4a0dc-9efc-48bf-9f5e-bfc09af42b82 | Lady Gaga |USA | normal > | Official | The Cherrytree Sessions | 2009 > 8ebfaebd-28d0-477d-b735-469661ce6873 | Lady Gaga |Unknown | normal > | Official |Poker Face | 2009 > 98107d82-e0dd-46bc-a273-1577578984c7 | Lady Gaga |USA | normal > | Official | Just Dance: The Remixes | 2008 > a76af0f2-f5c5-4306-974a-e3c17158e6c6 | Lady Gaga | Italy | normal > | 
Official | The Fame | 2008 > 849ee019-8b15-4767-8660-537ab9710459 | Lady Gaga |USA | normal > | Official |Christmas Tree | 2008 > 4bad59ac-913f-43da-9d48-89adc65453d2 | Lady Gaga | Australia | normal > | Official | Eh Eh | 2009 > 80327731-c450-457f-bc12-0a8c21fd9c5d | Lady Gaga |USA | normal > | Official | Just Dance Remixes Part 2 | 2008 > 3ad33659-e932-4d31-a040-acab0e23c3d4 | Lady Gaga |Unknown | normal > | null |Just Dance | 2008 > 9adce7f6-6a1d-49fd-b8bd-8f6fac73558b | Lady Gaga | United Kingdom | normal > | Official |Just Dance | 2009 > (11 rows) > {noformat} > *SASI* says that there are only 11 artists whose name starts with {{lady}}. > However, in the data set, there are: > * Lady Pank > * Lady Saw > * Lady Saw > * Ladyhawke > * Ladytron > * Ladysmith Black Mambazo > * Lady Gaga > * Lady Sovereign > etc ... > By debugging the source code, the issue is in > {{OnDiskIndex.TermIterator::computeNext()}} > {code:java} > for (;;) > { > if (currentBlock == null) > return endOfData(); > if (offset >= 0 && offset < currentBlock.termCount()) > { > DataTerm currentTerm = currentBlock.getTerm(nextOffset()); > if (checkLower && !e.isLowerSatisfiedBy(currentTerm)) > continue; > // flip the flag right on the first bounds match > // to avoid expensive comparisons > checkLower = false; > if (checkUpper && !e.isUpperSatisfiedBy(currentTerm)) > return endOfData(); > return currentTerm; > } > nextBlock(); > } > {code} > So the {{endOfData()}} conditions are: > * currentBlock == null > * checkUpper && !e.isUpperSatisfiedBy(currentTerm) > The problem is that {{e::isUpperSatisfiedBy}} is checking not only whether > the term match but also returns *false* when it's a *partial term* ! > {code:java} > public boolean isUpperSatisfiedBy(OnDiskIndex.DataTerm term) > { > if (!hasUpper()) > return true; > if (nonMatchingPartial(term)) > return false; > int cmp = term.compareTo(validator, upper.value, false); > return cmp
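The effect of {{isUpperSatisfiedBy}} returning false for a non-matching partial term can be reproduced with a toy scan (all names here are illustrative, not the SASI code): treating the failed upper-bound check as end-of-data drops every later matching term, while skipping the partial term and continuing, as in the fix discussed in the comment, keeps them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of the OnDiskIndex.TermIterator.computeNext() loop: stopping on
// the first term that fails the upper-bound check vs. skipping it and
// continuing. Illustrative only; this is not the SASI implementation.
class TermScan {
    static List<String> scan(List<String> terms, Predicate<String> upperOk, boolean stopOnFailure) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (!upperOk.test(t)) {
                if (stopOnFailure) return out; // buggy behavior: endOfData() too early
                continue;                      // fixed behavior: skip the partial, keep scanning
            }
            out.add(t);
        }
        return out;
    }
}
```

With a partial term wedged between two real matches, the early-stop version returns only the first match, which is exactly the "partial results" symptom in the report.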
[jira] [Comment Edited] (CASSANDRA-12073) [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial results
[ https://issues.apache.org/jira/browse/CASSANDRA-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347099#comment-15347099 ] Jordan West edited comment on CASSANDRA-12073 at 6/23/16 8:20 PM: -- [~doanduyhai] the fix you mentioned looks correct. It should only impact the PREFIX type because of the implementation of `nonMatchingPartial`. Unfortunately this means we may do a bit more work in the index than desired, but this is an acceptable consequence of the more complex query. was (Author: jrwest): [~doanduyhai] the fix you mentioned looks correct. it should only impact the [code]PREFIX[/code] type because of the implementation of `nonMatchingPartial`. Unfortunately this means we may do a bit more work in the index than desired but this is an acceptable consequence of the more complex query. > [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial > results > --- > > Key: CASSANDRA-12073 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12073 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: Cassandra 3.7 >Reporter: DOAN DuyHai > > {noformat} > cqlsh:music> CREATE TABLE music.albums ( > id uuid PRIMARY KEY, > artist text, > country text, > quality text, > status text, > title text, > year int > ); > cqlsh:music> CREATE CUSTOM INDEX albums_artist_idx ON music.albums (artist) > USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': > 'CONTAINS', 'analyzer_class': > 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', > 'case_sensitive': 'false'}; > cqlsh:music> SELECT * FROM albums WHERE artist like 'lady%' LIMIT 100; > id | artist| country| quality > | status| title | year > --+---++-+---+---+-- > 372bb0ab-3263-41bc-baad-bb520ddfa787 | Lady Gaga |USA | normal > | Official | Red and Blue EP | 2006 > 1a4abbcd-b5de-4c69-a578-31231e01ff09 | Lady Gaga |Unknown | normal > | Promotion |Poker Face | 2008 > 31f4a0dc-9efc-48bf-9f5e-bfc09af42b82 | Lady Gaga |USA | normal > | 
Official | The Cherrytree Sessions | 2009 > 8ebfaebd-28d0-477d-b735-469661ce6873 | Lady Gaga |Unknown | normal > | Official |Poker Face | 2009 > 98107d82-e0dd-46bc-a273-1577578984c7 | Lady Gaga |USA | normal > | Official | Just Dance: The Remixes | 2008 > a76af0f2-f5c5-4306-974a-e3c17158e6c6 | Lady Gaga | Italy | normal > | Official | The Fame | 2008 > 849ee019-8b15-4767-8660-537ab9710459 | Lady Gaga |USA | normal > | Official |Christmas Tree | 2008 > 4bad59ac-913f-43da-9d48-89adc65453d2 | Lady Gaga | Australia | normal > | Official | Eh Eh | 2009 > 80327731-c450-457f-bc12-0a8c21fd9c5d | Lady Gaga |USA | normal > | Official | Just Dance Remixes Part 2 | 2008 > 3ad33659-e932-4d31-a040-acab0e23c3d4 | Lady Gaga |Unknown | normal > | null |Just Dance | 2008 > 9adce7f6-6a1d-49fd-b8bd-8f6fac73558b | Lady Gaga | United Kingdom | normal > | Official |Just Dance | 2009 > (11 rows) > {noformat} > *SASI* says that there are only 11 artists whose name starts with {{lady}}. > However, in the data set, there are: > * Lady Pank > * Lady Saw > * Lady Saw > * Ladyhawke > * Ladytron > * Ladysmith Black Mambazo > * Lady Gaga > * Lady Sovereign > etc ... > By debugging the source code, the issue is in > {{OnDiskIndex.TermIterator::computeNext()}} > {code:java} > for (;;) > { > if (currentBlock == null) > return endOfData(); > if (offset >= 0 && offset < currentBlock.termCount()) > { > DataTerm currentTerm = currentBlock.getTerm(nextOffset()); > if (checkLower && !e.isLowerSatisfiedBy(currentTerm)) > continue; > // flip the flag right on the first bounds match > // to avoid expensive comparisons > checkLower = false; > if (checkUpper && !e.isUpperSatisfiedBy(currentTerm)) > return endOfData(); > return currentTerm; > } > nextBlock(); > } > {code} > So the {{endOfData()}} conditions are: > * currentBlock == null > * checkUpper && !e.isUpperSatisfiedBy(currentTerm) > The problem is that {{e::isUpperSatisfiedBy}} is checking not only whether > the
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113918#comment-15113918 ] Jordan West commented on CASSANDRA-10661: - bq. Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. The example is correct, but this is not a limitation of SASI, it's a limitation in CQL, and we decided not to further extend the grammar, since we have already had to scale back our grammar changes to later phases (removing OR, grouping, and != support for now). Ideally, CQL would support a `LIKE` operator similar to SQL, and depending on whether the index was created with `PREFIX` or `CONTAINS` we would allow/disallow forms such as `%Jo%` or `_j%`. bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) It does, but so do all queries on numerical data, which, thinking about it, may make the `PREFIX` option confusing for numeric types. SPARSE is intended to improve query performance on numerical data where there are a large number of terms (e.g. timestamps), but a small number of keys per term (e.g. some timeseries data). `SPARSE` should not be used on every numerical column, and for most non-numerical data it is not an ideal setting either. 
For example, in a large data set of first names, the number of names will be small compared to the number of keys, and, given the distribution of first names, using SPARSE will increase the size of the index and at best have zero effect on query performance, but may hurt it. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released a new secondary index engine > (https://github.com/xedin/sasi) built using the SecondaryIndex API; there are > still a couple of things to work out regarding 3.x since it is currently > targeted at the 2.0 release. I want to make this an umbrella issue for all of the > things related to the integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
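The exact-match vs. prefix distinction discussed above can be illustrated in CQL. This is a sketch under the assumption that a SQL-style LIKE operator is available (the `LIKE 'lady%'` form already appears in the CASSANDRA-12073 report); the table and column names here are hypothetical:

```sql
-- Hypothetical CQL, assuming a PREFIX-mode SASI index on first_name.
SELECT * FROM users WHERE first_name LIKE 'Jo%';  -- prefix match: John, Joseph, ...
SELECT * FROM users WHERE first_name = 'J';       -- ideally, exact match only: just "J"
```

With a CONTAINS-mode index, forms such as `'%Jo%'` would also be allowed; with PREFIX they would be rejected.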
[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113918#comment-15113918 ] Jordan West edited comment on CASSANDRA-10661 at 1/23/16 7:42 PM: -- bq. Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. The example is correct, but this is not a limitation of SASI, it's a limitation in CQL, and we decided not to further extend the grammar, since we have already had to scale back our grammar changes to later phases (removing OR, grouping, and != support for now). Ideally, `=` would mean exact match and CQL would support a `LIKE` operator similar to SQL, and depending on whether the index was created with `PREFIX` or `CONTAINS` we would allow/disallow forms such as `%Jo%` or `_j%`. bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) It does, but so do all queries on numerical data, which, thinking about it, may make the `PREFIX` option confusing for numeric types. SPARSE is intended to improve query performance on numerical data where there are a large number of terms (e.g. timestamps), but a small number of keys per term (e.g. some timeseries data). `SPARSE` should not be used on every numerical column, and for most non-numerical data it is not an ideal setting either. 
For example, in a large data set of first names the number of names will be small compared to the number of keys, and given the distribution of first names using SPARSE will increase the size of the index and at best have zero effect on query performance, but may hurt it. was (Author: jrwest): bq. Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. The example is correct, but this is not a limitation of SASI, its a limitation in CQL, and we decided not to further extend the grammar, since we have already had to scale back our grammar changes to later phases (removing OR, grouping, and != support for now). Ideally, CQL would support a `LIKE` operator similar to SQL, and depending on if the index was created with `PREFIX` or `CONTAINS` we would allow/disallow forms such as `%Jo%` or `_j%`. bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) It does, but so are all queries on numerical data, which thinking about it, may make the `PREFIX` option confusing for numeric types. SPARSE is intended to improve query performance on numerical data where there are a large number of terms (e.g. timestamps), but small number of keys per term (e.g. some timeseries data). `SPARSE` should not be used on every numerical column, and for most non-numerical data is not an ideal setting either. 
For example, in a large data set of first names the number of names will be small compared to the number of keys, and given the distribution of first names using SPARSE will increase the size of the index and at best have zero effect on query performance, but may hurt it. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released a new secondary index engine > (https://github.com/xedin/sasi) built using the SecondaryIndex API; there are > still a couple of things to work out regarding 3.x since it is currently > targeted at the 2.0 release. I want to make this an umbrella issue for all of the > things related to the integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
[ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117681#comment-15117681 ] Jordan West commented on CASSANDRA-11067: - bq. other than the fact that is new and experimental and unproven in the real world yet? I think SASI is as proven as, or more proven than, any change in the 3.x releases. It's been in production for longer than any 3.x feature and most of the changes for 3.x were surface-level integration changes as [~xedin] mentioned. bq. The fact that a SASI index needs to be "CUSTOM" and an explicit class name is needed feels a little hokey to me. Agreed, but we decided not to change this to ease the merge and because the sstable format is not currently easily extendable. Also, this is the case for any non-default index class. I think it would be great to see SASI become the default implementation or to have an easier way to specify which implementation to use. > Improve SASI syntax > --- > > Key: CASSANDRA-11067 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11067 > Project: Cassandra > Issue Type: Task > Components: CQL >Reporter: Jonathan Ellis >Assignee: Pavel Yaskevich > Fix For: 3.4 > > > I think everyone agrees that a LIKE operator would be ideal, but that's > probably not in scope for an initial 3.4 release. > Still, I'm uncomfortable with the initial approach of overloading = to mean > "satisfies index expression." The problem is that it will be very difficult > to back out of this behavior once people are using it. > I propose adding a new operator in the interim instead. Call it MATCHES, > maybe. With the exact same behavior that SASI currently exposes, just with a > separate operator rather than being rolled into =. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11449) Add NOT LIKE for PREFIX/CONTAINS Mode SASI Indexes
Jordan West created CASSANDRA-11449: --- Summary: Add NOT LIKE for PREFIX/CONTAINS Mode SASI Indexes Key: CASSANDRA-11449 URL: https://issues.apache.org/jira/browse/CASSANDRA-11449 Project: Cassandra Issue Type: New Feature Components: sasi Reporter: Jordan West Assignee: Pavel Yaskevich Internally, SASI already supports {{NOT LIKE}} but the CQL3 layer and grammar need to be extended to support it. The same rules that apply to {{LIKE}} for {{PREFIX}} and {{CONTAINS}} modes would apply to {{NOT LIKE}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
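As a speculative illustration only: once the CQL3 layer and grammar are extended as proposed, a {{NOT LIKE}} query against a CONTAINS-mode index (such as the albums_artist_idx index from CASSANDRA-12073) might read as follows; the final syntax is whatever the patch lands on.

```sql
-- Speculative sketch; NOT LIKE is not yet exposed through CQL at this point.
SELECT * FROM albums WHERE artist NOT LIKE '%lady%';
```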
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216186#comment-15216186 ] Jordan West commented on CASSANDRA-11383: - bq. Was the conclusion that a SPARSE SASI index would work well even for low cardinality data (as in the original reported case, for period_end_month_int), or was there some application-level change required to adapt to a SASI change as well? {{period_end_month_int}} is still the incorrect use case for {{SPARSE}}. That did not change. {{SPARSE}} is still intended for indexes/terms where there are a large number of terms and a low number of tokens/keys per term (the token trees in the index are sparse). The {{period_end_month_int}} use-case is a dense index: there are few terms and each term has a large number of tokens/keys (the token trees in the index are dense). The merged patch improves memory overhead in either case when building indexes from a large sstable. What was modified is that indexes marked {{SPARSE}} that have more than 5 tokens for any term in the index will fail to build and an exception will be logged. bq. Is it now official that a non-SPARSE SASI index (e.g., PREFIX) can be used for non-TEXT data (int in particular), at least for the case of exact match lookup? {{PREFIX}} mode has always been supported for numeric data and was/continues to be the default mode if none is specified. PREFIX mode should be considered "NOT SPARSE" for numerical data. 
> Avoid index segment stitching in RAM which lead to OOM on big SSTable files > > > Key: CASSANDRA-11383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11383 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: C* 3.4 >Reporter: DOAN DuyHai >Assignee: Jordan West > Labels: sasi > Fix For: 3.5 > > Attachments: CASSANDRA-11383.patch, > SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz, > new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom > > > 13 bare metal machines > - 6 cores CPU (12 HT) > - 64Gb RAM > - 4 SSD in RAID0 > JVM settings: > - G1 GC > - Xms32G, Xmx32G > Data set: > - ≈ 100Gb/per node > - 1.3 Tb cluster-wide > - ≈ 20Gb for all SASI indices > C* settings: > - concurrent_compactors: 1 > - compaction_throughput_mb_per_sec: 256 > - memtable_heap_space_in_mb: 2048 > - memtable_offheap_space_in_mb: 2048 > I created 9 SASI indices > - 8 indices with text field, NonTokenizingAnalyser, PREFIX mode, > case-insensitive > - 1 index with numeric field, SPARSE mode > After a while, the nodes just gone OOM. > I attach log files. You can see a lot of GC happening while index segments > are flush to disk. At some point the node OOM ... > /cc [~xedin] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216337#comment-15216337 ] Jordan West commented on CASSANDRA-11383: - bq. Maybe that leaves one last question as to whether non-SPARSE (PREFIX) mode is considered advisable/recommended for high cardinality column data, where SPARSE mode is nominally a better choice. Maybe that is strictly a matter of whether the prefix/LIKE feature is to be utilized - if so, then PREFIX mode is required, but if not, SPARSE mode sounds like the better choice. But I don't have a handle on the internal index structures to know if that's absolutely the case - that a PREFIX index for SPARSE data would necessarily be larger and/or slower than a SPARSE index for high cardinality data. I would hope so, but it would be good to have that confirmed. {{SPARSE}} is only for numeric data so LIKE queries do not apply. For data that is sparse (every term/column value has less than 5 matching keys), such as indexing the {{created_at}} field in time series data (where there is typically few matching rows/events per {{created_at}} timestamp), it is best to use {{SPARSE}}, always, and especially in cases where range queries are used. {{SPARSE}} is primarily an optimization for range queries on this sort of data. Its biggest effect is visible on large ranges (e.g. spanning multiple days of time series data). The decision process for whether or not to use {{SPARSE}} should be: 1. is the data a numeric type? 2. is it expected that there will be a large (millions or more) number of terms (column values) in the index with each term having a small (5 or less) set of matching tokens (partition keys)? If the answer to both is Yes then use {{SPARSE}}. From the docs (https://github.com/xedin/cassandra/blob/trunk/doc/SASI.md#ondiskindexbuilder): bq. 
The SPARSE mode differs from PREFIX in that for every 64 blocks of terms a TokenTree is built merging all the TokenTrees for each term into a single one. This copy of the data is used for efficient iteration of large ranges of e.g. timestamps. The index "mode" is configurable per column at index creation time. > Avoid index segment stitching in RAM which lead to OOM on big SSTable files > > > Key: CASSANDRA-11383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11383 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: C* 3.4 >Reporter: DOAN DuyHai >Assignee: Jordan West > Labels: sasi > Fix For: 3.5 > > Attachments: CASSANDRA-11383.patch, > SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz, > new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom > > > 13 bare metal machines > - 6 cores CPU (12 HT) > - 64Gb RAM > - 4 SSD in RAID0 > JVM settings: > - G1 GC > - Xms32G, Xmx32G > Data set: > - ≈ 100Gb/per node > - 1.3 Tb cluster-wide > - ≈ 20Gb for all SASI indices > C* settings: > - concurrent_compactors: 1 > - compaction_throughput_mb_per_sec: 256 > - memtable_heap_space_in_mb: 2048 > - memtable_offheap_space_in_mb: 2048 > I created 9 SASI indices > - 8 indices with text field, NonTokenizingAnalyser, PREFIX mode, > case-insensitive > - 1 index with numeric field, SPARSE mode > After a while, the nodes just gone OOM. > I attach log files. You can see a lot of GC happening while index segments > are flush to disk. At some point the node OOM ... > /cc [~xedin] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
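The decision checklist from the comment above can be written out as a toy advisory function. SparseAdvisor is a made-up name, and the thresholds simply mirror the comment's "millions or more" terms and "5 or less" matching keys per term figures; this is not a Cassandra API.

```java
// Toy encoding of the SPARSE decision checklist; illustrative only.
class SparseAdvisor {
    static boolean shouldUseSparse(boolean numericType, long expectedTerms, long maxKeysPerTerm) {
        // 1. SPARSE only applies to numeric data
        // 2. large (millions or more) number of terms, each with a small
        //    (5 or fewer) set of matching partition keys
        return numericType && expectedTerms >= 1_000_000 && maxKeysPerTerm <= 5;
    }
}
```

A sparse timestamp index (many terms, few keys each) passes; text data or a dense index like period_end_month_int does not.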
[jira] [Comment Edited] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216337#comment-15216337 ] Jordan West edited comment on CASSANDRA-11383 at 3/29/16 5:02 PM: -- bq. Maybe that leaves one last question as to whether non-SPARSE (PREFIX) mode is considered advisable/recommended for high cardinality column data, where SPARSE mode is nominally a better choice. Maybe that is strictly a matter of whether the prefix/LIKE feature is to be utilized - if so, than PREFIX mode is required, but if not, SPARSE mode sounds like the better choice. But I don't have a handle on the internal index structures to know if that's absolutely the case - that a PREFIX index for SPARSE data would necessarily be larger and/or slower than a SPARSE index for high cardinality data. I would hope so, but it would be good to have that confirmed. {{SPARSE}} is only for numeric data, so LIKE queries do not apply. For data that is sparse (every term/column value has fewer than 5 matching keys), such as indexing the {{created_at}} field in time series data (where there are typically few matching rows/events per {{created_at}} timestamp), it is best to always use {{SPARSE}}, especially in cases where range queries are used. {{SPARSE}} is primarily an optimization for range queries on this sort of data. Its biggest effect is visible on large ranges (e.g. spanning multiple days of time series data). The decision process for whether or not to use {{SPARSE}} should be: 1. is the data a numeric type? 2. is it expected that there will be a large (millions or more) number of terms (column values) in the index, with each term having a small (5 or fewer) set of matching tokens (partition keys)? 3. will range queries be performed against this index? If the answer to all three questions is Yes then use {{SPARSE}}. From the docs (https://github.com/xedin/cassandra/blob/trunk/doc/SASI.md#ondiskindexbuilder): bq. 
The SPARSE mode differs from PREFIX in that for every 64 blocks of terms a TokenTree is built merging all the TokenTrees for each term into a single one. This copy of the data is used for efficient iteration of large ranges of e.g. timestamps. The index "mode" is configurable per column at index creation time. was (Author: jrwest): bq. Maybe that leaves one last question as to whether non-SPARSE (PREFIX) mode is considered advisable/recommended for high cardinality column data, where SPARSE mode is nominally a better choice. Maybe that is strictly a matter of whether the prefix/LIKE feature is to be utilized - if so, than PREFIX mode is required, but if not, SPARSE mode sounds like the better choice. But I don't have a handle on the internal index structures to know if that's absolutely the case - that a PREFIX index for SPARSE data would necessarily be larger and/or slower than a SPARSE index for high cardinality data. I would hope so, but it would be good to have that confirmed. {{SPARSE}} is only for numeric data so LIKE queries do not apply. For data that is sparse (every term/column value has less than 5 matching keys), such as indexing the {{created_at}} field in time series data (where there is typically few matching rows/events per {{created_at}} timestamp), it is best to use {{SPARSE}}, always, and especially in cases where range queries are used. {{SPARSE}} is primarily an optimization for range queries on this sort of data. Its biggest effect is visible on large ranges (e.g. spanning multiple days of time series data). The decision process for whether or not to use {{SPARSE}} should be: 1. is the data a numeric type? 2. is it expected that there will be a large (millions or more) number of terms (column values) in the index with each term having a small (5 or less) set of matching tokens (partition keys)? If the answer to both is Yes then use {{SPARSE}}. >From the docs >(https://github.com/xedin/cassandra/blob/trunk/doc/SASI.md#ondiskindexbuilder): bq. 
The SPARSE mode differs from PREFIX in that for every 64 blocks of terms a TokenTree is built merging all the TokenTrees for each term into a single one. This copy of the data is used for efficient iteration of large ranges of e.g. timestamps. The index "mode" is configurable per column at index creation time. > Avoid index segment stitching in RAM which lead to OOM on big SSTable files > > > Key: CASSANDRA-11383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11383 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: C* 3.4 >Reporter: DOAN DuyHai >Assignee: Jordan West > Labels: sasi > Fix For: 3.5 > > Attachments: CASSANDRA-11383.patch, >
[jira] [Commented] (CASSANDRA-11434) Support EQ/PREFIX queries in CONTAINS mode without tokenization by augmenting SA metadata per term
[ https://issues.apache.org/jira/browse/CASSANDRA-11434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215073#comment-15215073 ] Jordan West commented on CASSANDRA-11434: - The branch linked below implements the described changes. The test changes reflect the feature changes made. This is a backwards compatible change. It uses an unused (zeroed) byte in the index header to indicate if the index supports the new kind of query. Existing indexes will automatically be upgraded to support marked partials when compacted. PREFIX queries against a CONTAINS column whose indexes have not yet been upgraded will still result in an exception and failed request (but with a different exception than {{InvalidRequestException}}). Once the index is rebuilt (manually or via compaction) the exception will stop being thrown. ||branch||testall||dtest|| |[CASSANDRA-11434|https://github.com/xedin/cassandra/tree/CASSANDRA-11434]|[testall|http://cassci.datastax.com/job/xedin-CASSANDRA-11434-testall/]|[dtest|http://cassci.datastax.com/job/xedin-CASSANDRA-11434-dtest/]| > Support EQ/PREFIX queries in CONTAINS mode without tokenization by augmenting > SA metadata per term > -- > > Key: CASSANDRA-11434 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11434 > Project: Cassandra > Issue Type: Improvement > Components: sasi >Reporter: Pavel Yaskevich >Assignee: Jordan West > Fix For: 3.6 > > > We can support EQ/PREFIX requests to CONTAINS indexes by tracking > "partiality" of the data stored in the OnDiskIndex and IndexMemtable, if we > know exactly if current match represents part of the term or it's original > form it would be trivial to support EQ/PREFIX since PREFIX is subset of > SUFFIX matches. > Since we attach uint16 size to each term stored we can take advantage of sign > bit so size of the index is not impacted at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
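The sign-bit trick described in the ticket can be sketched in isolation. This is a minimal, hypothetical helper (not the actual SASI code): it assumes, as the ticket description states, that each stored term carries a uint16 size whose real values fit in 15 bits, leaving the sign bit free to mark whether an entry represents a partial term or the original form:

```java
public class PartialTermFlag {
    static final int PARTIAL_BIT = 0x8000; // sign bit of the 16-bit size field

    // Pack a 15-bit size plus a "partial term" flag into one short,
    // so the index grows by zero bytes per entry.
    static short encode(int size, boolean isPartial) {
        if (size < 0 || size > Short.MAX_VALUE)
            throw new IllegalArgumentException("size must fit in 15 bits: " + size);
        return (short) (isPartial ? (size | PARTIAL_BIT) : size);
    }

    static boolean isPartial(short encoded) {
        return (encoded & PARTIAL_BIT) != 0;
    }

    static int size(short encoded) {
        return encoded & 0x7FFF;
    }

    public static void main(String[] args) {
        short e = encode(1234, true);
        System.out.println(isPartial(e) + " " + size(e)); // true 1234
        short f = encode(1234, false);
        System.out.println(isPartial(f) + " " + size(f)); // false 1234
    }
}
```

With a flag like this in place, an EQ or PREFIX query can accept a candidate only when its entry is not marked partial, while SUFFIX/CONTAINS matching simply ignores the bit.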
[jira] [Commented] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233209#comment-15233209 ] Jordan West commented on CASSANDRA-11525: - [~doanduyhai] we have tracked down the root cause of the bug and it has affected all versions of SASI since its original inclusion in Cassandra. The issue is that when positions in the -Index.db file are > Integer.MAX_VALUE the positions are factored into a 32-bit and 16-bit value. The 16-bit value was being read as a signed short and for certain positions this resulted in reconstructing an incorrect 64-bit offset from the 32-bit and 16-bit parts. Thankfully, this is a quick, one-line fix (reading the short as unsigned), and is entirely independent of the changes in CASSANDRA-11383 or this ticket. We will include the fix for this with the merge of the changes in this ticket. We are working on final verification using your SSTables before we merge. > StaticTokenTreeBuilder should respect posibility of duplicate tokens > > > Key: CASSANDRA-11525 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11525 > Project: Cassandra > Issue Type: Bug > Components: sasi > Environment: Cassandra 3.5-SNAPSHOT >Reporter: DOAN DuyHai >Assignee: Jordan West > Fix For: 3.5 > > > Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM) > {noformat} > create table if not exists test.resource_bench ( > dsr_id uuid, > rel_seq bigint, > seq bigint, > dsp_code varchar, > model_code varchar, > media_code varchar, > transfer_code varchar, > commercial_offer_code varchar, > territory_code varchar, > period_end_month_int int, > authorized_societies_txt text, > rel_type text, > status text, > dsp_release_code text, > title text, > contributors_name list, > unic_work text, > paying_net_qty bigint, > PRIMARY KEY ((dsr_id, rel_seq), seq) > ) WITH CLUSTERING ORDER BY (seq ASC); > CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench > (period_end_month_int) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH > OPTIONS = {'mode': 'PREFIX'}; > {noformat} > So the index is a {{DENSE}} numerical index. > When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM > test.resource_bench WHERE period_end_month_int = 201401}} using server-side > paging. > I bumped into this stack trace: > {noformat} > WARN [SharedPool-Worker-1] 2016-04-06 00:00:30,825 > AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: -55 > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504) > 
~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] >
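The signed-short bug described in the comment above can be reproduced in isolation. This is a sketch under assumptions: the method names and the exact 32-bit/16-bit layout are illustrative, since the comment only states that positions larger than Integer.MAX_VALUE are factored into a 32-bit and a 16-bit part:

```java
public class OffsetReconstruction {
    // Buggy variant: using the short directly lets Java sign-extend it,
    // corrupting the reconstructed offset whenever the 16-bit part has its
    // high bit set.
    static long reconstructSigned(int high32, short low16) {
        return (((long) high32) << 16) | low16;
    }

    // The one-line fix: mask the short so it is read as an unsigned value.
    static long reconstructUnsigned(int high32, short low16) {
        return (((long) high32) << 16) | (low16 & 0xFFFF);
    }

    public static void main(String[] args) {
        long position = 0x800018000L;             // > Integer.MAX_VALUE, low 16 bits = 0x8000
        int high = (int) (position >>> 16);       // 32-bit part
        short low = (short) (position & 0xFFFF);  // 16-bit part, negative as a signed short
        System.out.println(reconstructUnsigned(high, low) == position); // true
        System.out.println(reconstructSigned(high, low) == position);   // false
    }
}
```

Positions whose 16-bit part has a clear high bit round-trip correctly either way, which is why the corruption only shows up for certain offsets in large files.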
[jira] [Commented] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231597#comment-15231597 ] Jordan West commented on CASSANDRA-11525: - ||branch||testall||dtest|| |[CASSANDRA-11525|https://github.com/xedin/cassandra/tree/CASSANDRA-11525]|[testall|http://cassci.datastax.com/job/xedin-CASSANDRA-11525-testall/]|[dtest|http://cassci.datastax.com/job/xedin-CASSANDRA-11525-dtest/]| [~doanduyhai] can you try this branch and see if it addresses the issue? Also, can you please upload all of the SSTable components (including SSTable index files) so we can test here as well? The issue was caused by an invalid assumption when clustering columns are used: when stitching together multiple index parts, it was possible for the same term, for the same token, to appear in multiple parts, resulting in the union iterator returning an incorrect count. The new approach counts the number of tokens while performing the first iteration. The complexity of the algorithm is unchanged and performance should be similar. 
> StaticTokenTreeBuilder should respect posibility of duplicate tokens > > > Key: CASSANDRA-11525 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11525 > Project: Cassandra > Issue Type: Bug > Components: sasi > Environment: Cassandra 3.5-SNAPSHOT >Reporter: DOAN DuyHai >Assignee: Jordan West > Fix For: 3.5 > > > Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM) > {noformat} > create table if not exists test.resource_bench ( > dsr_id uuid, > rel_seq bigint, > seq bigint, > dsp_code varchar, > model_code varchar, > media_code varchar, > transfer_code varchar, > commercial_offer_code varchar, > territory_code varchar, > period_end_month_int int, > authorized_societies_txt text, > rel_type text, > status text, > dsp_release_code text, > title text, > contributors_name list, > unic_work text, > paying_net_qty bigint, > PRIMARY KEY ((dsr_id, rel_seq), seq) > ) WITH CLUSTERING ORDER BY (seq ASC); > CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench > (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH > OPTIONS = {'mode': 'PREFIX'}; > {noformat} > So the index is a {{DENSE}} numerical index. > When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM > test.resource_bench WHERE period_end_month_int = 201401}} using server-side > paging. 
> I bumped into this stack trace: > {noformat} > WARN [SharedPool-Worker-1] 2016-04-06 00:00:30,825 > AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: -55 > at > org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110) > 
~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at > org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374) > ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT] > at >
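The counting fix described in the comment above — count tokens while performing the first iteration over the merged parts, rather than assuming each token appears in exactly one part — can be sketched with plain sorted arrays. The names and structure are illustrative, not SASI's actual StaticTokenTreeBuilder:

```java
import java.util.*;

public class DistinctTokenCount {
    // k-way merge of sorted token arrays, counting each distinct token once.
    // Summing the part sizes instead (the invalid assumption) over-counts
    // tokens that appear in more than one part.
    static long countDistinct(List<long[]> sortedParts) {
        // priority queue entries: { token value, part index, offset in part }
        PriorityQueue<long[]> pq = new PriorityQueue<>(Comparator.comparingLong(e -> e[0]));
        for (int i = 0; i < sortedParts.size(); i++)
            if (sortedParts.get(i).length > 0)
                pq.add(new long[] { sortedParts.get(i)[0], i, 0 });

        long count = 0;
        Long previous = null;
        while (!pq.isEmpty()) {
            long[] head = pq.poll();
            if (previous == null || head[0] != previous) {
                count++;              // each token counted exactly once
                previous = head[0];
            }
            int part = (int) head[1], next = (int) head[2] + 1;
            if (next < sortedParts.get(part).length)
                pq.add(new long[] { sortedParts.get(part)[next], part, next });
        }
        return count;
    }

    public static void main(String[] args) {
        // token 30 appears in both parts; distinct count is 5, not 6
        List<long[]> parts = List.of(new long[]{10, 30, 50}, new long[]{20, 30, 70});
        System.out.println(countDistinct(parts)); // prints 5
    }
}
```

The merge itself is unchanged from a plain union iteration, which matches the comment's note that the algorithm's complexity stays the same.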
[jira] [Updated] (CASSANDRA-11397) LIKE query on clustering column index returns incorrect results
[ https://issues.apache.org/jira/browse/CASSANDRA-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-11397: Reviewer: Jordan West (was: Pavel Yaskevich) > LIKE query on clustering column index returns incorrect results > --- > > Key: CASSANDRA-11397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11397 > Project: Cassandra > Issue Type: Bug >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe > Labels: sasi > Fix For: 3.5 > > > The way that {{ClusteringIndexFilter}} and {{RowFilter}} are constructed when > a {{LIKE}} restriction on a clustering column is present is incorrect. For > example: > {code} > cqlsh> create table ks.t1 (k text, c1 text, c2 text, c3 text, v text, primary > key (k,c1,c2,c3)); > cqlsh> create custom index on ks.t1(c2) using > 'org.apache.cassandra.index.sasi.SASIIndex'; > cqlsh> select * from ks.t1; > k | c1 | c2 | c3 | v > ---++++- > a | ba | ca | da | val > a | bb | cb | db | val > a | bc | cc | dc | val > (3 rows) > > cqlsh> select * from ks.t1 where c1 = 'ba' and c3 = 'da' and c2 LIKE 'c%' > ALLOW FILTERING; > k | c1 | c2 | c3 | v > ---++++--- > (0 rows) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11397) LIKE query on clustering column index returns incorrect results
[ https://issues.apache.org/jira/browse/CASSANDRA-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208987#comment-15208987 ] Jordan West commented on CASSANDRA-11397: - [~beobal] the patch looks good and i've confirmed your fix locally. One small piece of feedback: since you fixed things to return a {{ClusteringIndexSliceFilter}} I don't think the {{isLike}} check in {{PrimaryKeyRestrictionSet#appendTo}} is necessary anymore (i've removed it locally without issue). > LIKE query on clustering column index returns incorrect results > --- > > Key: CASSANDRA-11397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11397 > Project: Cassandra > Issue Type: Bug >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe > Labels: sasi > Fix For: 3.5 > > > The way that {{ClusteringIndexFilter}} and {{RowFilter}} are constructed when > a {{LIKE}} restriction on a clustering column is present is incorrect. For > example: > {code} > cqlsh> create table ks.t1 (k text, c1 text, c2 text, c3 text, v text, primary > key (k,c1,c2,c3)); > cqlsh> create custom index on ks.t1(c2) using > 'org.apache.cassandra.index.sasi.SASIIndex'; > cqlsh> select * from ks.t1; > k | c1 | c2 | c3 | v > ---++++- > a | ba | ca | da | val > a | bb | cb | db | val > a | bc | cc | dc | val > (3 rows) > > cqlsh> select * from ks.t1 where c1 = 'ba' and c3 = 'da' and c2 LIKE 'c%' > ALLOW FILTERING; > k | c1 | c2 | c3 | v > ---++++--- > (0 rows) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files
[ https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216917#comment-15216917 ] Jordan West commented on CASSANDRA-11383: - The docs look pretty comprehensive. Thanks! I'll make a more detailed pass through them when I get a chance. I think the only thing we would like to clarify, based on the discussion in this ticket, is when to choose {{SPARSE}} over {{PREFIX}} for numerical data. My last comment (https://issues.apache.org/jira/browse/CASSANDRA-11383?focusedCommentId=15216337=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15216337) mentions a way to do that. Otherwise, specific to {{SPARSE}} the only recommendation I have is that the {{SPARSE}} example on the "CREATE CUSTOM INDEX (SASI)" page (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/refCreateSASIIndex.html) uses {{age}}, which typically would not be a good candidate for a {{SPARSE}} index (the answer to question number 2 in my linked comment would be: no, there are not millions of ages with each term having a small number of matching keys). 
> Avoid index segment stitching in RAM which lead to OOM on big SSTable files > > > Key: CASSANDRA-11383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11383 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: C* 3.4 >Reporter: DOAN DuyHai >Assignee: Jordan West > Labels: sasi > Fix For: 3.5 > > Attachments: CASSANDRA-11383.patch, > SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz, > new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom > > > 13 bare metal machines > - 6 cores CPU (12 HT) > - 64Gb RAM > - 4 SSD in RAID0 > JVM settings: > - G1 GC > - Xms32G, Xmx32G > Data set: > - ≈ 100Gb/per node > - 1.3 Tb cluster-wide > - ≈ 20Gb for all SASI indices > C* settings: > - concurrent_compactors: 1 > - compaction_throughput_mb_per_sec: 256 > - memtable_heap_space_in_mb: 2048 > - memtable_offheap_space_in_mb: 2048 > I created 9 SASI indices > - 8 indices with text field, NonTokenizingAnalyser, PREFIX mode, > case-insensitive > - 1 index with numeric field, SPARSE mode > After a while, the nodes just gone OOM. > I attach log files. You can see a lot of GC happening while index segments > are flush to disk. At some point the node OOM ... > /cc [~xedin] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-13869) AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a single leaf and overflow collisions
[ https://issues.apache.org/jira/browse/CASSANDRA-13869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-13869: Attachment: 0001-Fix-AbstractTokenTreeBuilder-serializedSize-when-the.patch attb-serialized-size-bug-test.patch Attached two patches. attb-serialized-size-bug-test.patch is a patch that can be applied to trunk to illustrate the issue with a failing test. The other contains the fix against trunk along with some improved tests. > AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a > single leaf and overflow collisions > --- > > Key: CASSANDRA-13869 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13869 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Jordan West >Assignee: Jordan West >Priority: Minor > Fix For: 3.11.x > > Attachments: > 0001-Fix-AbstractTokenTreeBuilder-serializedSize-when-the.patch, > attb-serialized-size-bug-test.patch > > > In the extremely rare case where a small token tree (< 248 values) has > overflow collisions the size returned by > AbstractTokenTreeBuilder#serializedSize is incorrect because it fails to > account for the overflow collisions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13869) AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a single leaf and overflow collisions
[ https://issues.apache.org/jira/browse/CASSANDRA-13869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-13869: Description: In the extremely rare case where a small token tree (< 248 values) has overflow collisions the size returned by AbstractTokenTreeBuilder#serializedSize is incorrect because it fails to account for the overflow collisions. (was: In the extremely rare case where a small token tree (< 248 values) has overflow collisions* the size returned by AbstractTokenTreeBuilder#serializedSize is incorrect because it fails to account for the overflow collisions. ) > AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a > single leaf and overflow collisions > --- > > Key: CASSANDRA-13869 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13869 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Jordan West >Assignee: Jordan West >Priority: Minor > Fix For: 3.11.x > > > In the extremely rare case where a small token tree (< 248 values) has > overflow collisions the size returned by > AbstractTokenTreeBuilder#serializedSize is incorrect because it fails to > account for the overflow collisions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13869) AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a single leaf and overflow collisions
Jordan West created CASSANDRA-13869: --- Summary: AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a single leaf and overflow collisions Key: CASSANDRA-13869 URL: https://issues.apache.org/jira/browse/CASSANDRA-13869 Project: Cassandra Issue Type: Bug Components: sasi Reporter: Jordan West Assignee: Jordan West Priority: Minor Fix For: 3.11.x In the extremely rare case where a small token tree (< 248 values) has overflow collisions the size returned by AbstractTokenTreeBuilder#serializedSize is incorrect because it fails to account for the overflow collisions. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369601#comment-16369601 ] Jordan West edited comment on CASSANDRA-14055 at 5/14/18 7:42 PM: -- [~lboutros]/[~jasobrown], some updates: I have attached two new patches. One for trunk and one for 3.11. Unfortunately, the test changes in trunk don't work well on 3.11 so we can't have one patch. The primary changes in this patch are to change the order we iterate over the indexes to ensure we retain the newer instance of {{SSTableIndex}} and thus the newer {{SSTableReader}}. I also changed the code to clone the {{oldSSTables}} collection since it's visible outside the {{View}} constructor. ||3.11||Trunk|| |[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-3.11]|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-trunk]| |[utests|https://circleci.com/gh/jrwest/cassandra/tree/14055-jrwest-3.11]|[utests|https://circleci.com/gh/jrwest/cassandra/tree/14055-jrwest-trunk]| NOTE: same utests are failing on [trunk|https://circleci.com/gh/jrwest/cassandra/25] and I'm still working on getting dtests running with my CircleCI setup. Also, I spoke with some colleagues including [~beobal] and [~krummas] about the use of {{sstableRef.globalCount()}} to determine when to delete the SASI index file. I've come to the conclusion that its use at all is wrong because it represents the number of references to that instance, not to the sstable globally. Given index summary redistribution, this isn't a safe assumption. Looking back at the original SASI patches, I am not sure why it got merged this way. The [patches|https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/SSTableIndex.java#L120] used {{sstable.isMarkedCompacted()}} but the [merged code|https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b#diff-4873bb6fcef158ff18d221571ef2ec7cR124] used {{sstableRef.globalCount()}}. 
Fixing this is a larger undertaking, so I propose we split that work into a separate ticket and focus this one on SASI's failure to account for index redistribution in the {{View}}. The work covered by the other ticket would entail either a) deleting the SASI index files as part of {{SSTableTidier}} or by moving {{SSTableIndex}} to use {{Ref}} and implementing a tidier specific to it. was (Author: jrwest): [~lboutros]/[~jasobrown], some updates: I have attached two new patches. One for trunk and one of 3.11. Unfortunately, the test changes in trunk don't work well on 3.11 so we can't have one patch. The primary changes in this patch are to change the order we iterate over the indexes to ensure we retain the newer instance of {{SSTableIndex}} and thus the newer {{SSTableReader}}. I also changed the code to clone the {{oldSSTables}} collection since its visible outside the {{View}} constructor. ||3.11||Trunk|| |[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-3.11]|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-trunk]| |[utests|https://circleci.com/gh/jrwest/cassandra/24]|[utests|https://circleci.com/gh/jrwest/cassandra/26]| NOTE: same utests are failing on [trunk|https://circleci.com/gh/jrwest/cassandra/25] and I'm still working on getting dtests running with my CircleCI setup. Also, I spoke with some colleagues including [~beobal] and [~krummas] about the use of {{sstableRef.globalCount()}} to determine when to delete the SASI index file. I've come to the conclusion that its use at all is wrong because it represents the number of references to the instance, not globally. Given index summary redistribution, this isn't a safe assumption. Looking back at the original SASI patches, I am not sure why it got merged this way. 
The [patches|https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/SSTableIndex.java#L120] used {{sstable.isMarkedCompacted()}} but the [merged code|https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b#diff-4873bb6fcef158ff18d221571ef2ec7cR124] used {{sstableRef.globalCount()}}. Fixing this is a larger undertaking, so I propose we split that work into a separate ticket and focus this one on SASI's failure to account for index redistribution in the {{View}}. The work covered by the other ticket would entail either a) deleting the SASI index files as part of {{SSTableTidier}} or by moving {{SSTableIndex}} to use {{Ref}} and implementing a tidier specific to it. > Index redistribution breaks SASI index > -- > > Key: CASSANDRA-14055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14055 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Ludovic Boutros >Assignee: Jordan West >
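The reference-counting concern raised in the comments above can be illustrated with a toy model. These are hypothetical classes, not Cassandra's actual Ref machinery: the point is only that a per-instance count reaching zero says nothing about other reader instances for the same file, and index summary redistribution can create such a second instance:

```java
import java.util.*;

public class RefCountSketch {
    // Toy stand-in for a reader holding an index file open.
    static class Reader {
        final String file;
        int refs = 1;                 // per-instance count only
        Reader(String file) { this.file = file; }
        void release() { refs--; }
        boolean instanceUnreferenced() { return refs == 0; }
    }

    // Global view: is any instance for this file still referenced?
    // Deleting only when this is true avoids removing a file another
    // instance still depends on.
    static boolean globallyUnreferenced(List<Reader> instances, String file) {
        for (Reader r : instances)
            if (r.file.equals(file) && r.refs > 0)
                return false;
        return true;
    }

    public static void main(String[] args) {
        Reader old = new Reader("mb-1-big-SI.db");
        Reader fresh = new Reader("mb-1-big-SI.db"); // redistribution created a new instance
        old.release();
        // Per-instance signal says "safe to delete", but the file is still in use:
        System.out.println(old.instanceUnreferenced());                                   // true
        System.out.println(globallyUnreferenced(List.of(old, fresh), "mb-1-big-SI.db"));  // false
    }
}
```

This is the same distinction the comment draws between an instance-scoped count and a global condition such as the sstable being marked compacted.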
[jira] [Commented] (CASSANDRA-14417) nodetool import cleanup/fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475273#comment-16475273 ] Jordan West commented on CASSANDRA-14417: - All minor comments: * Deprecated {{CFS#loadNewSSTables()}} no longer needs to be synchronized. It just constructs an {{ImportOptions}} instance and passes it to the synchronized {{loadNewSSTables(ImportOptions).}} * Add a reference (e.g. @see) to {{CFSMBean.importNewSSTables}} from {{SSMBean.loadNewSSTables}} * In Verifier, is it more appropriate to favor {{OutputHandler#output}} over {{OutputHandler#debug}} for the error message when a key is out of range? * Would like to see some tests (including base/empty case, edge cases like wrap around) for {{RangeOwnHelper}} * {{nodetool refresh}}: Is the removal of the deprecation output intentional? > nodetool import cleanup/fixes > - > > Key: CASSANDRA-14417 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14417 > Project: Cassandra > Issue Type: Improvement >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Fix For: 4.x > > > * We shouldn't expose importNewSSTables in both StorageServiceMBean and > CFSMbean > * Allow a quicker token check without doing an extended verify > * Introduce an ImportOptions class to avoid passing in 100 booleans in > importNewSSTables -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14417) nodetool import cleanup/fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477263#comment-16477263 ] Jordan West commented on CASSANDRA-14417: - Looks like the most recent changes moved the second {{if (shouldCountKeys)}} into the upper while loop, which I don't think was intended. branch: [https://github.com/krummas/cassandra/blob/f207720a45c9106cfbdd4e8ab8f34283c58cba52/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L741-L756] vs. trunk: [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L741-L757]
[jira] [Commented] (CASSANDRA-14417) nodetool import cleanup/fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477317#comment-16477317 ] Jordan West commented on CASSANDRA-14417: - +1
[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14449: Attachment: 14449.patch > support cluster backends other than ccm when running dtests > --- > > Key: CASSANDRA-14449 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14449 > Project: Cassandra > Issue Type: Improvement > Components: Testing >Reporter: Jordan West >Assignee: Jordan West >Priority: Trivial > Attachments: 14449.patch > > > While ccm is a great orchestration tool to run Cassandra clusters locally, it > may be desirable to run dtests against clusters running remotely, which may > be orchestrated by some tool other than ccm. > Dtest is heavily tied to CCM, but with a few minor changes it's possible to > support plugging in other backends that maintain a similar (duck-typed) > interface.
[jira] [Created] (CASSANDRA-14449) support cluster backends other than ccm when running dtests
Jordan West created CASSANDRA-14449: --- Summary: support cluster backends other than ccm when running dtests Key: CASSANDRA-14449 URL: https://issues.apache.org/jira/browse/CASSANDRA-14449 Project: Cassandra Issue Type: Improvement Components: Testing Reporter: Jordan West Assignee: Jordan West
[jira] [Commented] (CASSANDRA-14449) support cluster backends other than ccm when running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478227#comment-16478227 ] Jordan West commented on CASSANDRA-14449: - Patch attached and is also available at [https://github.com/jrwest/cassandra-dtest/commit/4d9492f87964ed1a6e981431af8f086c651eb07a] Cassandra branch wired up to use the above changes: [https://github.com/jrwest/cassandra/tree/pluggable-dtest] Test runs with the above changes: [https://circleci.com/gh/jrwest/cassandra/tree/pluggable-dtest]
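The "similar (duck-typed) interface" from the ticket description can be sketched as follows. This is illustrative only — the method names below are hypothetical, not dtest's or ccm's actual API — but it shows why no shared base class is needed: the tests just call a small set of methods, and any backend providing them plugs in.

```python
# Sketch of the duck-typed backend idea: dtests call a small set of
# methods, so any object providing them works -- no shared base class.
# Method names here are hypothetical, not dtest's actual interface.

class CCMBackend:
    """Local cluster orchestrated by ccm."""
    def start(self):
        return "started local ccm cluster"
    def nodelist(self):
        return ["127.0.0.1", "127.0.0.2", "127.0.0.3"]

class RemoteBackend:
    """Pre-provisioned remote cluster; start() just attaches."""
    def __init__(self, hosts):
        self._hosts = hosts
    def start(self):
        return "attached to remote cluster"
    def nodelist(self):
        return list(self._hosts)

def run_dtest(cluster):
    # The test never checks the type -- duck typing keeps it pluggable.
    cluster.start()
    return len(cluster.nodelist())

assert run_dtest(CCMBackend()) == 3
assert run_dtest(RemoteBackend(["10.0.0.1", "10.0.0.2"])) == 2
```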
[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14449: Flags: Patch
[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14449: Attachment: (was: 14449.patch)
[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14449: Attachment: 14449.patch Status: Patch Available (was: Open)
[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14449: Flags: (was: Patch)
[jira] [Comment Edited] (CASSANDRA-14449) support cluster backends other than ccm when running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478227#comment-16478227 ] Jordan West edited comment on CASSANDRA-14449 at 5/16/18 11:18 PM: --- Patch attached and is also available at [https://github.com/jrwest/cassandra-dtest/commit/4d9492f87964ed1a6e981431af8f086c651eb07a] Cassandra branch wired up to use the above changes: [https://github.com/jrwest/cassandra/tree/pluggable-dtest] Test runs with the above changes: [https://circleci.com/gh/jrwest/cassandra/tree/pluggable-dtest]
[jira] [Updated] (CASSANDRA-14443) Improvements for running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14443: Reviewer: Jordan West > Improvements for running dtests > --- > > Key: CASSANDRA-14443 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14443 > Project: Cassandra > Issue Type: Improvement > Components: Testing >Reporter: Kurt Greaves >Assignee: Kurt Greaves >Priority: Major > Labels: dtest > > We currently hardcode a requirement that you need at least 27gb of memory to > run the resource intensive tests. This is rather annoying as there isn't > really a strict hardware requirement and tests can run on smaller machines in > a lot of cases (especially if you mess around with HEAP). > We've already got the command line argument > {{--force-resource-intensive-tests}}, we don't need additional restrictions > in place to stop people who shouldn't be running the tests from running them. > We also don't have a way to run _only_ the resource-intensive dtests or > _only_ the upgrade tests
[jira] [Comment Edited] (CASSANDRA-14443) Improvements for running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487988#comment-16487988 ] Jordan West edited comment on CASSANDRA-14443 at 5/23/18 8:37 PM: -- [~KurtG] these changes look like a good improvement, especially not running the regular test suite twice. A few comments below: * How about exiting instead of logging a warning when insufficient resources are available for running resource intensive tests (conftest.py#L460), unless {{--force-resource-intensive-tests}} is set? Behaviour would be unchanged otherwise, and users who want to force run only the resource intensive tests on insufficient hardware can use {{--force-resource-intensive-tests --resource-intensive-tests-only}}. * run_dtests.py: some things I noticed that may be worth cleaning up while we’re here 1. original_raw_cmd_args is only used in one place now. Consider removing it. 2. comment on line 120 is stale
[jira] [Commented] (CASSANDRA-14443) Improvements for running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487988#comment-16487988 ] Jordan West commented on CASSANDRA-14443: - [~KurtG] these changes look like a good improvement, especially not running the regular test suite twice. A few comments below: * How about exiting instead of logging a warning when insufficient resources are available for running resource intensive tests (conftest.py#L460), unless {{--force-resource-intensive-tests}} is set? Behaviour would be unchanged otherwise, and users who want to force run only the resource intensive tests on insufficient hardware can use {{--force-resource-intensive-tests --resource-intensive-tests-only}}. * run_dtests.py: some things I noticed that may be worth cleaning up while we’re here 1. original_raw_cmd_args is only used in one place now. Consider removing it. 2. comment on line 120 is stale
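The proposed "exit instead of warn" behaviour can be sketched as a small decision function (function and option names are hypothetical, not the actual conftest.py code):

```python
# Sketch of the proposed conftest.py behaviour: refuse to run the
# resource-intensive tests on insufficient hardware unless
# --force-resource-intensive-tests overrides the check.
# Names are hypothetical, not dtest's actual code.

REQUIRED_MEMORY_GB = 27  # the currently hardcoded threshold

def check_resource_intensive(available_gb, force=False):
    """Return 'run' if the tests may proceed, 'exit' if they must not."""
    if force or available_gb >= REQUIRED_MEMORY_GB:
        return "run"
    # Exiting makes the failure explicit instead of a buried warning.
    return "exit"

assert check_resource_intensive(32) == "run"
assert check_resource_intensive(16) == "exit"            # previously: warn
assert check_resource_intensive(16, force=True) == "run"
```

In pytest terms, the "exit" branch would map to something like `pytest.exit(...)` during collection, while the force flag keeps the current escape hatch intact.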
[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14449: Reviewer: Ariel Weisberg
[jira] [Comment Edited] (CASSANDRA-14417) nodetool import cleanup/fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476600#comment-16476600 ] Jordan West edited comment on CASSANDRA-14417 at 5/16/18 3:59 AM: -- * The output when {{extended=false}}, {{checkOwnsTokens=true}} is still debug (because the message was removed and the exception is logged as debug). I see it changed when extended=true. Verifier#L217. * While cleaning things up, should {{ColumnFamilyStore#findBestDiskAndInvalidateCaches}} be refactored to use {{KeyIterator}} as well? * EDIT: Also, just noticed the failing dtest. It seems to be in an area related-ish to these changes, but I am not familiar enough with it yet to know if it's related or just a flaky test. (EDIT: it failed here on a branch without this change https://circleci.com/gh/jrwest/cassandra/86#tests/containers/66)
[jira] [Commented] (CASSANDRA-14417) nodetool import cleanup/fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476600#comment-16476600 ] Jordan West commented on CASSANDRA-14417: - * The output when {{extended=false}}, {{checkOwnsTokens=true}} is still debug (because the message was removed and the exception is logged as debug). I see it changed when extended=true. Verifier#L217. * While cleaning things up, should {{ColumnFamilyStore#findBestDiskAndInvalidateCaches}} be refactored to use {{KeyIterator}} as well?
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508955#comment-16508955 ] Jordan West commented on CASSANDRA-14499: - Nothing would disallow truncate, although if one node is at quota, is dropping all data what is desired? In some use-cases, perhaps. Since deletes temporarily inflate storage use, for a node-level quota I don't think they should be allowed (for a keyspace-level quota that would perhaps be different). The client also can't be expected to know exactly which keys live on the node(s) that are at quota, which makes remediation by delete less viable. The most likely remediations are adding more nodes or truncation. A correct implementation would prevent neither of these. I agree that this could/should live in the management process. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database.
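The quota-resolution rule from the ticket description — an absolute value takes precedence over a percentage when both are set — can be sketched as follows (the parameter names are hypothetical, not actual cassandra.yaml keys):

```python
def effective_quota_bytes(total_disk_bytes, quota_percent=None, quota_bytes=None):
    """Resolve the node-level disk quota per the proposed rule:
    an absolute value takes precedence over a percentage; if neither
    is set, there is no quota (None). Names are hypothetical."""
    if quota_bytes is not None:
        return quota_bytes
    if quota_percent is not None:
        return total_disk_bytes * quota_percent // 100
    return None

TB = 1 << 40
GB = 1 << 30
assert effective_quota_bytes(1 * TB, quota_percent=50) == 512 * GB
# Absolute value wins when both are configured:
assert effective_quota_bytes(1 * TB, quota_percent=50, quota_bytes=700 * GB) == 700 * GB
assert effective_quota_bytes(1 * TB) is None
```

Integer arithmetic avoids floating-point surprises when the quota is compared against byte counts from the storage layer.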
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506754#comment-16506754 ] Jordan West commented on CASSANDRA-14499: - {quote}disabling gosspi alone is insufficient, also need to disable native {quote} Agreed. I hadn't updated the description to reflect it but what I am working on does this as well. {quote}still not sure I buy the argument that it’s wrong to serve reads in this case - it may be true that some table is getting out of sync, but that doesn’t mean every table is, {quote} I agree it depends on the workload for each specific dataset but since we can't know which we have we have to assume it could get really out of sync. {quote}and we already have a mechanism to deal with nodes that can serve reads but not writes (speculating on the read repair). {quote} Even if we speculate we still attempt it. That work will always be for naught and being at quota is likely a prolonged state (the ways out of it take a while). {quote}If you don’t serve reads either, than any GC pause will be guaranteed to impact client request latency as we can’t soeculate around it in the common rf=3 case. {quote} This is true. But thats almost the same as losing a node because its disk has been filled up completely. If we have one unhealthy node we are another unhealthy node away from unavailability in the rf=3/quorum case. That said, I'll consider the reads more over the weekend. Its a valid concern. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. 
This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database.
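The precedence rule in the ticket description (an absolute quota wins over a percentage of total disk) can be sketched as follows. This is an illustrative sketch only, not the proposed patch; the class and method names are hypothetical, and "unset" is modeled as a non-positive value.

```java
// Hypothetical sketch of resolving an effective node-level disk quota from
// two YAML-style settings: a percentage of total disk and an absolute byte
// value. Per the ticket, the absolute value takes precedence when both are set.
public class DiskQuota {
    /**
     * @param totalDiskBytes total usable disk on the node
     * @param quotaPercent   quota as a percentage of total (<= 0 means unset)
     * @param quotaBytes     absolute quota in bytes (<= 0 means unset)
     * @return effective quota in bytes, or -1 when no quota is configured
     */
    public static long effectiveQuotaBytes(long totalDiskBytes, double quotaPercent, long quotaBytes) {
        if (quotaBytes > 0)
            return quotaBytes; // absolute value takes precedence
        if (quotaPercent > 0)
            return (long) (totalDiskBytes * (quotaPercent / 100.0));
        return -1; // no quota configured
    }

    /** A node is over quota only when a quota is configured and live data meets it. */
    public static boolean overQuota(long liveBytes, long effectiveQuota) {
        return effectiveQuota > 0 && liveBytes >= effectiveQuota;
    }
}
```

With a 1000-byte disk, a 50% quota resolves to 500 bytes unless an absolute value such as 300 is also set, in which case 300 wins.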
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506746#comment-16506746 ] Jordan West commented on CASSANDRA-14499: - The other reason the OS level wouldn't work is we are trying to track *live* data, which the OS can't distinguish from dead data. Regarding taking reads, [~jasobrown], [~krummas], and I discussed this some offline. Since the node can only get more and more out of sync while not taking write traffic and can't participate in (read) repair until the amount of storage used is below quota, we thought it better to disable both reads and writes. Less-blocking and speculative read repair makes us more available in this case (as it should). Disabling gossip is a quick route to disabling reads/writes. Is it the best approach to doing so? I'm not 100% sure. My concern is how the operator gets back to a healthy state once a quota is reached on a node. They have a few options: migrate data to a bigger node, compaction catches up and deletes data, the quota is raised so it's not met anymore, node(s) are added to take storage responsibility away from the node, or data is forcefully deleted from the node. We need to ensure we don't prevent those operations from taking place. I've been discussing this with [~jasobrown] offline as well. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. 
When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database.
[jira] [Comment Edited] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506746#comment-16506746 ] Jordan West edited comment on CASSANDRA-14499 at 6/9/18 12:36 AM: -- The other reason the OS level wouldn't work is we are trying to track *live* data, which the OS can't distinguish from dead data. EDIT: also to clarify, the goal here isn't to implement a perfect quota. There will be some room for error where the quota can be exceeded. The goal is to mark the node unhealthy when it reaches this level and to have enough headroom for compaction or other operations to get it to a healthy state. Regarding taking reads, [~jasobrown], [~krummas], and I discussed this some offline. Since the node can only get more and more out of sync while not taking write traffic and can't participate in (read) repair until the amount of storage used is below quota, we thought it better to disable both reads and writes. Less-blocking and speculative read repair makes us more available in this case (as it should). Disabling gossip is a quick route to disabling reads/writes. Is it the best approach to doing so? I'm not 100% sure. My concern is how the operator gets back to a healthy state once a quota is reached on a node. They have a few options: migrate data to a bigger node, compaction catches up and deletes data, the quota is raised so it's not met anymore, node(s) are added to take storage responsibility away from the node, or data is forcefully deleted from the node. We need to ensure we don't prevent those operations from taking place. I've been discussing this with [~jasobrown] offline as well. was (Author: jrwest): The other reason the OS level wouldn't work is we are trying to track *live* data, which the OS can't distinguish from dead data. Regarding taking reads, [~jasobrown], [~krummas], and I discussed this some offline. 
Since the node can only get more and more out of sync while not taking write traffic and can't participate in (read) repair until the amount of storage used is below quota, we thought it better to disable both reads and writes. Less-blocking and speculative read repair makes us more available in this case (as it should). Disabling gossip is a quick route to disabling reads/writes. Is it the best approach to doing so? I'm not 100% sure. My concern is how the operator gets back to a healthy state once a quota is reached on a node. They have a few options: migrate data to a bigger node, compaction catches up and deletes data, the quota is raised so it's not met anymore, node(s) are added to take storage responsibility away from the node, or data is forcefully deleted from the node. We need to ensure we don't prevent those operations from taking place. I've been discussing this with [~jasobrown] offline as well. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. 
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database.
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506533#comment-16506533 ] Jordan West commented on CASSANDRA-14499: - [~jeromatron] I understand those concerns. This would be opt-in for folks who wanted automatic action taken and any such action should take care to not cause the node to flap, for example. One use case where we see this as valuable is QA/perf/test clusters that may not have the full monitoring setup but need to be protected from errant clients filling up disks to a point where worse things happen. The warning system can be accomplished today with monitoring and alerting on the same metrics. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. > The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database. 
[jira] [Created] (CASSANDRA-14529) nodetool import row cache invalidation races with adding sstables to tracker
Jordan West created CASSANDRA-14529: --- Summary: nodetool import row cache invalidation races with adding sstables to tracker Key: CASSANDRA-14529 URL: https://issues.apache.org/jira/browse/CASSANDRA-14529 Project: Cassandra Issue Type: Bug Reporter: Jordan West Assignee: Jordan West CASSANDRA-6719 introduced {{nodetool import}} with row cache invalidation, which [occurs before adding new sstables to the tracker|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SSTableImporter.java#L137-L178]. Stale reads can result when a read is interleaved between the invalidation of a row and the addition of the sstable containing it to the tracker.
[jira] [Updated] (CASSANDRA-14529) nodetool import row cache invalidation races with adding sstables to tracker
[ https://issues.apache.org/jira/browse/CASSANDRA-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14529: Status: Patch Available (was: Open) Made the cache invalidation run after the files are added to the tracker. This is similar to [streaming|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/streaming/CassandraStreamReceiver.java#L207-L210]. There is still a race condition, but the worst case is only invalidation of a cached copy of the newly added data. Branch: [https://github.com/jrwest/cassandra/commits/14529-trunk] Tests: [https://circleci.com/gh/jrwest/cassandra/tree/14529-trunk] > nodetool import row cache invalidation races with adding sstables to tracker > > > Key: CASSANDRA-14529 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14529 > Project: Cassandra > Issue Type: Bug >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > CASSANDRA-6719 introduced {{nodetool import}} with row cache invalidation, > which [occurs before adding new sstables to the > tracker|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SSTableImporter.java#L137-L178]. > Stale reads can result when a read is interleaved between the invalidation > of a row and the addition of the sstable containing it to the tracker.
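The ordering change in the patch can be illustrated with a toy sketch (hypothetical names, not Cassandra's actual SSTableImporter API): once the sstable is visible in the tracker, a racing read that repopulates the cache can at worst cache the new data, which a trailing invalidation then harmlessly drops.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two orderings; the event list stands in for observable
// side effects so the orderings can be compared.
public class ImportSketch {
    static final List<String> events = new ArrayList<>();

    static void addToTracker(String sstable) { events.add("track:" + sstable); }
    static void invalidateRowCache(String sstable) { events.add("invalidate:" + sstable); }

    // Pre-fix order: invalidating first leaves a window where a read can
    // repopulate the cache from the OLD data before the sstable is visible,
    // producing stale reads.
    static void importBuggy(String sstable) {
        invalidateRowCache(sstable);
        addToTracker(sstable);
    }

    // Fixed order: add to the tracker first, then invalidate. The remaining
    // race only ever invalidates a freshly cached copy of the new data.
    static void importFixed(String sstable) {
        addToTracker(sstable);
        invalidateRowCache(sstable);
    }
}
```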
[jira] [Commented] (CASSANDRA-14499) node-level disk quota
[ https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508484#comment-16508484 ] Jordan West commented on CASSANDRA-14499: - Since the goal isn't to strictly enforce the quota (it's ok if it's violated, but once noticed, action should be taken), the code isn't invasive. It's a small amount of new code, with the only change being to schedule the check on optional tasks. That being said, if the concern is complexity, one potential place for this (and I think it may be a better home regardless) is [CASSANDRA-14395|https://issues.apache.org/jira/browse/CASSANDRA-14395]. While this may seem like a small band-aid, and there are cases where multiple nodes can go down at once, it is exactly meant to give some headroom. This headroom makes it considerably easier to get the cluster into a healthy state again. > node-level disk quota > - > > Key: CASSANDRA-14499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 > Project: Cassandra > Issue Type: New Feature >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > > Operators should be able to specify, via YAML, the amount of usable disk > space on a node as a percentage of the total available or as an absolute > value. If both are specified, the absolute value should take precedence. This > allows operators to reserve space available to the database for background > tasks -- primarily compaction. When a node reaches its quota, gossip should > be disabled to prevent it taking further writes (which would increase the > amount of data stored), being involved in reads (which are likely to be more > inconsistent over time), or participating in repair (which may increase the > amount of space used on the machine). The node re-enables gossip when the > amount of data it stores is below the quota. 
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which > reserves some amount of space on each drive that is not usable by the > database.
[jira] [Updated] (CASSANDRA-14207) Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper Reference Counting of SSTableIndex
[ https://issues.apache.org/jira/browse/CASSANDRA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14207: Attachment: 14207-example-test.patch Reproduced In: 3.11.1, 3.11.0 (was: 3.11.0, 3.11.1) Status: Patch Available (was: Open) I've worked up a patch for this that applies to 3.11: [https://github.com/jrwest/cassandra/commits/14207-3.11]. The patch applies cleanly to trunk last I tested. Ran tests on [3.11|https://circleci.com/gh/jrwest/cassandra/tree/14207-3%2E11] and on [trunk|https://circleci.com/gh/jrwest/cassandra/tree/14207-trunk]. Also attached is a test that I don't think is worth merging (it's too contrived) but is illustrative of the scenario that causes a double release to occur. [~ifesdjeen] would you be able to take a look? > Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper > Reference Counting of SSTableIndex > > > Key: CASSANDRA-14207 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14207 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > Attachments: 14207-example-test.patch, > sasi-invalid-reference-count.rtf > > > A race between e.g. Index Redistribution and Compaction can cause the compare > and swap of a new {{sasi.conf.View}} in {{sasi.conf.DataTracker#update}} to > fail, leading to recreation of the view and improper reference counting of an > {{SSTableIndex}}. This is because the side-effects (decrementing the > reference count via {{SStableIndex#release}}) occur regardless of whether the view > is promoted to be the active view. > Code: > https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78 > > Attached logs and debug output show a case where index redistribution and > compaction race. 
This case was generated using the test provided in > https://issues.apache.org/jira/browse/CASSANDRA-14055
[jira] [Updated] (CASSANDRA-14207) Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper Reference Counting of SSTableIndex
[ https://issues.apache.org/jira/browse/CASSANDRA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14207: Description: A race between e.g. index redistribution and compaction (or memtable flushes and compaction) can cause the compare and swap of a new {{sasi.conf.View}} in {{sasi.conf.DataTracker#update}} to fail, leading to recreation of the view and improper reference counting of an {{SSTableIndex}}. This is because the side-effects (decrementing the reference count via {{SStableIndex#release}}) occur regardless of whether the view is promoted to be the active view. Code: [https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78] Attached logs and debug output show a case where index redistribution and compaction race. This case was generated using the test provided in https://issues.apache.org/jira/browse/CASSANDRA-14055 was: A race between e.g. Index Redistribution and Compaction can cause the compare and swap of a new {{sasi.conf.View}} in {{sasi.conf.DataTracker#update}} to fail, leading to recreation of the view and improper reference counting of an {{SSTableIndex}}. This is because the side-effects (decrementing the reference count via {{SStableIndex#release}}) occur regardless of whether the view is promoted to be the active view. Code: https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78 Attached logs and debug output show a case where index redistribution and compaction race. 
This case was generated using the test provided in https://issues.apache.org/jira/browse/CASSANDRA-14055 > Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper > Reference Counting of SSTableIndex > > > Key: CASSANDRA-14207 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14207 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > Attachments: 14207-example-test.patch, > sasi-invalid-reference-count.rtf > > > A race between e.g. index redistribution and compaction (or memtable flushes > and compaction) can cause the compare and swap of a new {{sasi.conf.View}} in > {{sasi.conf.DataTracker#update}} to fail, leading to recreation of the view > and improper reference counting of an {{SSTableIndex}}. This is because the > side-effects (decrementing the reference count via {{SStableIndex#release}}) > occur regardless of whether the view is promoted to be the active view. > Code: > [https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78] > > Attached logs and debug output show a case where index redistribution and > compaction race. This case was generated using the test provided in > https://issues.apache.org/jira/browse/CASSANDRA-14055
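The bug class described in this ticket, side effects performed inside a compare-and-swap retry loop regardless of whether the swap won, can be sketched with `AtomicReference`. This is an illustrative sketch, not SASI's actual `DataTracker`; the `Index`/`view` names are hypothetical. The key point is that the release happens only after the CAS succeeds, so a losing attempt retries without having decremented any reference count.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class CasSketch {
    // Stand-in for a reference-counted index attached to the active view.
    static class Index {
        final AtomicInteger refs = new AtomicInteger(1);
        void release() { refs.decrementAndGet(); }
    }

    static final AtomicReference<Index> view = new AtomicReference<>(new Index());

    // Correct pattern: attempt the swap first, and perform the side effect
    // (releasing the replaced index) only once our candidate has actually
    // become the active view. A failed compareAndSet loops and retries with
    // no side effects, so nothing is ever double-released.
    static void update(Index replacement) {
        while (true) {
            Index current = view.get();
            if (view.compareAndSet(current, replacement)) {
                current.release(); // exactly one winner releases the old index
                return;
            }
            // Lost the race: retry against the new current view.
        }
    }
}
```

The reported bug is the inverse of this pattern: the release-style side effects ran on every attempt, so a failed CAS still decremented reference counts before retrying.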
[jira] [Commented] (CASSANDRA-14207) Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper Reference Counting of SSTableIndex
[ https://issues.apache.org/jira/browse/CASSANDRA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494717#comment-16494717 ] Jordan West commented on CASSANDRA-14207: - [~jjirsa] generally I agree, but the changes move the code that has the potential to regress outside the code covered by the test, and the test exercises one potential interleaving based on how another part of the code is currently written. I think it serves better as an illustration only, in this specific case. > Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper > Reference Counting of SSTableIndex > > > Key: CASSANDRA-14207 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14207 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Jordan West >Assignee: Jordan West >Priority: Major > Attachments: 14207-example-test.patch, > sasi-invalid-reference-count.rtf > > > A race between e.g. index redistribution and compaction (or memtable flushes > and compaction) can cause the compare and swap of a new {{sasi.conf.View}} in > {{sasi.conf.DataTracker#update}} to fail, leading to recreation of the view > and improper reference counting of an {{SSTableIndex}}. This is because the > side-effects (decrementing the reference count via {{SStableIndex#release}}) > occur regardless of whether the view is promoted to be the active view. > Code: > [https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78] > > Attached logs and debug output show a case where index redistribution and > compaction race. This case was generated using the test provided in > https://issues.apache.org/jira/browse/CASSANDRA-14055
[jira] [Created] (CASSANDRA-14479) Secondary Indexes Can "Leak" Records If Insert/Partition Delete Occur Between Flushes
Jordan West created CASSANDRA-14479: --- Summary: Secondary Indexes Can "Leak" Records If Insert/Partition Delete Occur Between Flushes Key: CASSANDRA-14479 URL: https://issues.apache.org/jira/browse/CASSANDRA-14479 Project: Cassandra Issue Type: Bug Components: Secondary Indexes Reporter: Jordan West Attachments: 2i-leak-test.patch When an insert of an indexed column is followed rapidly (within the same memtable) by a delete of an entire partition, the index table for the column will continue to store the record for the inserted value and no tombstone will ever be written. This occurs because the index isn't updated after the delete but before the flush. The value is lost after flush, so subsequent compactions can't issue a delete for the primary key in the index column. The attached test reproduces the described issue. The test fails to assert that the index cfs is empty. The subsequent assertion that there are no live sstables would also fail. Looking on disk with sstabledump after running this test shows the value remaining. Originally reported on the mailing list by Roman Bielik: Create table with LeveledCompactionStrategy; 'tombstone_compaction_interval': 60; gc_grace_seconds=60 There are two indexed columns for comparison: column1, column2 Insert keys \{1..x} with random values in column1 & column2 Delete \{key:column2} (but not column1) Delete \{key} Repeat n-times from the inserts Wait 1 minute nodetool flush nodetool compact (sometimes compact nodetool cfstats What I observe is that the data table is empty, column2 index table is also empty and column1 index table has non-zero (leaked) "space used" and "estimated rows".
[jira] [Commented] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
[ https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497633#comment-16497633 ] Jordan West commented on CASSANDRA-14468: - Can take a look next week > "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16 > - > > Key: CASSANDRA-14468 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14468 > Project: Cassandra > Issue Type: Bug >Reporter: Wade Simmons >Priority: Major > > I am attempting to upgrade from Cassandra 2.2.10 to 3.0.16. I am getting this > error: > {code} > org.apache.cassandra.exceptions.ConfigurationException: Unable to parse > targets for index idx_foo ("666f6f") > at > org.apache.cassandra.index.internal.CassandraIndex.parseTarget(CassandraIndex.java:800) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata(CassandraIndex.java:747) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:645) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:251) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) > [apache-cassandra-3.0.16.jar:3.0.16] > {code} > It looks like this might be related to CASSANDRA-14104 that was just added to > 3.0.16
[jira] [Commented] (CASSANDRA-14442) Let nodetool import take a list of directories
[ https://issues.apache.org/jira/browse/CASSANDRA-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497552#comment-16497552 ] Jordan West commented on CASSANDRA-14442: - [~krummas] LGTM overall. One thing I wanted to check: is it ok that callers of importNewSSTables can now run concurrently with callers of CFS#runWithCompactionsDisabled (callers like truncate and clearUnsafe)? Some minor things: * Remove the whitespace only change in Tracker * Rename first argument of CFS#importNewSSTables to srcPaths * Consider moving {{SSTableImporter#moveAndOpenSSTable}} to be a static method on SSTable, maybe {{renameAndOpen}} (it may be useful for future uses/tests and isn’t specific to {{SSTableImporter}}) * Thanks for adding the new dtests. Should they be marked since 4.0? > Let nodetool import take a list of directories > -- > > Key: CASSANDRA-14442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14442 > Project: Cassandra > Issue Type: Improvement >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Fix For: 4.x > > > It should be possible to load sstables from several input directories when > running nodetool import. Directories that failed to import should be output.
[jira] [Updated] (CASSANDRA-14451) Infinity ms Commit Log Sync
[ https://issues.apache.org/jira/browse/CASSANDRA-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14451: Reviewer: Jordan West > Infinity ms Commit Log Sync > --- > > Key: CASSANDRA-14451 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14451 > Project: Cassandra > Issue Type: Bug > Environment: 3.11.2 - 2 DC >Reporter: Harry Hough >Assignee: Jason Brown >Priority: Minor > Fix For: 3.0.x, 3.11.x, 4.0.x > > > It's giving commit log sync warnings where there were apparently zero syncs > and therefore gives "Infinityms" as the average duration > {code:java} > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:11:14,294 > NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with > average duration of Infinityms, 1 have exceeded the configured commit > interval by an average of 74.40ms > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:16:57,844 > NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with > average duration of Infinityms, 1 have exceeded the configured commit > interval by an average of 198.69ms > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:24:46,325 > NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with > average duration of Infinityms, 1 have exceeded the configured commit > interval by an average of 264.11ms > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:29:46,393 > NoSpamLogger.java:94 - Out of 32 commit log syncs over the past 268.84s with, > average duration of 17.56ms, 1 have exceeded the configured commit interval > by an average of 173.66ms{code}
[jira] [Commented] (CASSANDRA-14451) Infinity ms Commit Log Sync
[ https://issues.apache.org/jira/browse/CASSANDRA-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501213#comment-16501213 ] Jordan West commented on CASSANDRA-14451: - [~jasobrown] change LGTM. A few questions and minor comments: * Are the ArchiveCommitLog dtest failures expected on the 3.0 branch? * The “sleep any time we have left” comment would be more appropriate above the assignment of {{wakeUpAt}}. * Mark {{maybeLogFlushLag}} and {{getTotalSyncDuration}} as {{@VisibleForTesting}} * Just wanted to check that the change in behavior of updating {{totalSyncDuration}} is intentional. It makes sense to me that we only increment it if a sync actually occurs but that wasn’t the case before. * Is there a reason you opted for the “excessTimeToFlush” approach in 3.0 but the “maxFlushTimestamp” approach on 3.11 and trunk? The only difference I see is the unit of time. > Infinity ms Commit Log Sync > --- > > Key: CASSANDRA-14451 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14451 > Project: Cassandra > Issue Type: Bug > Environment: 3.11.2 - 2 DC >Reporter: Harry Hough >Assignee: Jason Brown >Priority: Minor > Fix For: 3.0.x, 3.11.x, 4.0.x > > > It's giving commit log sync warnings where there were apparently zero syncs > and therefore gives "Infinityms" as the average duration > {code:java} > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:11:14,294 > NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with > average duration of Infinityms, 1 have exceeded the configured commit > interval by an average of 74.40ms > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:16:57,844 > NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with > average duration of Infinityms, 1 have exceeded the configured commit > interval by an average of 198.69ms > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:24:46,325 > NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with > average duration of 
Infinityms, 1 have exceeded the configured commit > interval by an average of 264.11ms > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:29:46,393 > NoSpamLogger.java:94 - Out of 32 commit log syncs over the past 268.84s with, > average duration of 17.56ms, 1 have exceeded the configured commit interval > by an average of 173.66ms{code}
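The "Infinityms" in the warnings above is what Java's floating-point division produces when a positive total is divided by a zero sync count. A minimal sketch of the kind of guard the fix needs (hypothetical names, not the actual AbstractCommitLogService code):

```java
// Sketch of guarding an average-duration computation against a zero sync
// count, which otherwise yields Double.POSITIVE_INFINITY in the log message.
public class SyncStats {
    /** Returns the average sync duration in ms, or -1 when no syncs occurred. */
    static double averageSyncMillis(double totalSyncMillis, int syncCount) {
        return syncCount > 0 ? totalSyncMillis / syncCount : -1;
    }

    /** Builds the warning text, rewording it when there is no average to report. */
    static String report(double totalSyncMillis, int syncCount, int lagged) {
        double avg = averageSyncMillis(totalSyncMillis, syncCount);
        if (avg < 0)
            return lagged + " commit log syncs exceeded the configured interval (no completed syncs to average)";
        return String.format("Out of %d syncs, average duration %.2fms, %d exceeded the interval",
                             syncCount, avg, lagged);
    }
}
```

Without the guard, `100.0 / 0` evaluates to `Double.POSITIVE_INFINITY`, which `String.format` renders as "Infinity", matching the reported log lines.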
[jira] [Commented] (CASSANDRA-14451) Infinity ms Commit Log Sync
[ https://issues.apache.org/jira/browse/CASSANDRA-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501923#comment-16501923 ] Jordan West commented on CASSANDRA-14451: - {quote} I left it where it was previously located, but can move it to the more logical spot. {quote} I don't find it very useful where it is now. Would vote to move it or remove it (the code is pretty clear). {quote}I wanted to keep the logic as close to the original as possible, since 3.0 is far along in it's age. I suppose it doesn't matter that much, though, and can change if you think it's worthwhile. wdyt? {quote} From the review perspective it was just a second implementation to check for correctness and it seems like either implementation could be used. Would vote for them to be the same but fine as is if you prefer. Otherwise, +1 > Infinity ms Commit Log Sync > --- > > Key: CASSANDRA-14451 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14451 > Project: Cassandra > Issue Type: Bug > Environment: 3.11.2 - 2 DC >Reporter: Harry Hough >Assignee: Jason Brown >Priority: Minor > Fix For: 3.0.x, 3.11.x, 4.0.x > > > It's giving commit log sync warnings where there were apparently zero syncs > and therefore gives "Infinityms" as the average duration > {code:java} > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:11:14,294 > NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with > average duration of Infinityms, 1 have exceeded the configured commit > interval by an average of 74.40ms > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:16:57,844 > NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with > average duration of Infinityms, 1 have exceeded the configured commit > interval by an average of 198.69ms > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:24:46,325 > NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with > average duration of Infinityms, 1 have exceeded the configured commit > interval by an 
average of 264.11ms > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:29:46,393 > NoSpamLogger.java:94 - Out of 32 commit log syncs over the past 268.84s with, > average duration of 17.56ms, 1 have exceeded the configured commit interval > by an average of 173.66ms{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14499) node-level disk quota
Jordan West created CASSANDRA-14499: --- Summary: node-level disk quota Key: CASSANDRA-14499 URL: https://issues.apache.org/jira/browse/CASSANDRA-14499 Project: Cassandra Issue Type: New Feature Reporter: Jordan West Assignee: Jordan West Operators should be able to specify, via YAML, the amount of usable disk space on a node as a percentage of the total available or as an absolute value. If both are specified, the absolute value should take precedence. This allows operators to reserve space available to the database for background tasks -- primarily compaction. When a node reaches its quota, gossip should be disabled to prevent it taking further writes (which would increase the amount of data stored), being involved in reads (which are likely to be more inconsistent over time), or participating in repair (which may increase the amount of space used on the machine). The node re-enables gossip when the amount of data it stores is below the quota. The proposed option differs from {{min_free_space_per_drive_in_mb}}, which reserves some amount of space on each drive that is not usable by the database. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
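The precedence rule proposed above can be sketched in a few lines. This is a hypothetical resolution function, not code from any patch; the setting names are invented since the ticket does not fix the YAML keys:

```java
public class DiskQuotaSketch {
    // Illustrative only: resolve the usable-space quota from a percentage
    // of total disk and an optional absolute value (0 = unset).
    static long resolveQuotaBytes(long totalDiskBytes, double quotaPercent, long quotaAbsoluteBytes) {
        if (quotaAbsoluteBytes > 0)   // per the proposal, absolute takes precedence
            return quotaAbsoluteBytes;
        return (long) (totalDiskBytes * (quotaPercent / 100.0));
    }

    public static void main(String[] args) {
        long oneTb = 1_000_000_000_000L;
        // Percentage only: 80% of 1 TB.
        System.out.println(resolveQuotaBytes(oneTb, 80.0, 0));
        // Both specified: the absolute 500 GB wins.
        System.out.println(resolveQuotaBytes(oneTb, 80.0, 500_000_000_000L));
    }
}
```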
[jira] [Commented] (CASSANDRA-14442) Let nodetool import take a list of directories
[ https://issues.apache.org/jira/browse/CASSANDRA-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498265#comment-16498265 ] Jordan West commented on CASSANDRA-14442: - LGTM. I'm +1 as is but one minor suggestion if you feel like including it: The live SSTable check could be replaced by the following. It's a little more succinct and less work (since we do the "contains" check in the iteration instead of afterwards): {code:java} boolean isLive = cfs.getLiveSSTables().stream().filter(r -> r.descriptor.equals(newDescriptor) || r.descriptor.equals(oldDescriptor)).findAny().isPresent(); if (isLive) { String message = String.format("Can't move and open a file that is already in use in the table %s -> %s", oldDescriptor, newDescriptor); logger.error(message); throw new RuntimeException(message); } {code} > Let nodetool import take a list of directories > -- > > Key: CASSANDRA-14442 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14442 > Project: Cassandra > Issue Type: Improvement >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Major > Fix For: 4.x > > > It should be possible to load sstables from several input directories when > running nodetool import. Directories that failed to import should be output.
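For what it's worth, the {{filter(...).findAny().isPresent()}} pair in the suggestion above can be collapsed into {{anyMatch}}, which short-circuits the same way. A self-contained sketch, with descriptors modeled as plain strings since the real {{Descriptor}} class is not reproduced here:

```java
import java.util.List;

public class LiveSSTableCheckSketch {
    public static void main(String[] args) {
        // Stand-ins for the descriptors of live sstables and the pair being
        // moved/opened; values are purely illustrative.
        List<String> liveDescriptors = List.of("md-1-big", "md-2-big");
        String oldDescriptor = "md-2-big";
        String newDescriptor = "md-3-big";

        // anyMatch replaces filter(...).findAny().isPresent() one-for-one.
        boolean isLive = liveDescriptors.stream()
            .anyMatch(d -> d.equals(newDescriptor) || d.equals(oldDescriptor));

        System.out.println(isLive); // prints "true": md-2-big is already live
    }
}
```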
[jira] [Comment Edited] (CASSANDRA-14443) Improvements for running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530727#comment-16530727 ] Jordan West edited comment on CASSANDRA-14443 at 7/3/18 2:30 AM: - Thanks for the updates [~KurtG]. Some comments from a second pass: * conftest.pyL434 should be {{if not}} instead of {{if sufficient_…}}, otherwise the logging occurs in the wrong case * Looking at the docs you added, conftest.pyL479 should be {{if not upgrade and not upgrade_only}}, similar to L445 for resource intensive flags * Is it intentional to use {{print}} instead of the loggers in conftest.py? conftest.pyL492 may also be verbose. Similarly, run_dtests.py was changed to use loggers instead of {{print}} on L73. {quote}TBH I think that the whole resource check can go, and this kind of information is more suitable for the documentation. We don't put resource limits on Cassandra and I don't think we should do it for the dtests. {quote} Logging is useful because it reduces the amount of time to find the issue and fix it. Failing fast, with an option to override, speeds that up more but I'm not strongly for it, if you prefer to leave it as is. was (Author: jrwest): Thanks for the updates [~KurtG]. Some comments from a second pass: * conftest.pyL434 should be {{if not}} instead of {{if sufficient_…}}, otherwise the logging occurs in the wrong case * Looking at the docs you added, conftest.pyL479 should be {{if not upgrade and not upgrade_only}}, similar to L445 for resource intensive flags * Is it intentional to use {{print}} instead of the loggers in conftest.py? conftest.pyL492 may also be verbose. Similarly, run_dtests.py was changed to use loggers instead of {{print}} on L73. {quote}TBH I think that the whole resource check can go, and this kind of information is more suitable for the documentation. We don't put resource limits on Cassandra and I don't think we should do it for the dtests. 
{quote} Logging is useful because it reduces the amount of time to find the issue and fix it. Failing fast, with an option to override, speeds that up more but I'm not strongly for it, if you prefer to leave it as is. > Improvements for running dtests > --- > > Key: CASSANDRA-14443 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14443 > Project: Cassandra > Issue Type: Improvement > Components: Testing >Reporter: Kurt Greaves >Assignee: Kurt Greaves >Priority: Major > Labels: dtest > > We currently hardcode a requirement that you need at least 27gb of memory to > run the resource intensive tests. This is rather annoying as there isn't > really a strict hardware requirement and tests can run on smaller machines in > a lot of cases (especially if you mess around with HEAP). > We've already got the command line argument > {{--force-resource-intensive-tests}}, we don't need additional restrictions > in place to stop people who shouldn't be running the tests from running them. > We also don't have a way to run _only_ the resource-intensive dtests or > _only_ the upgrade tests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14443) Improvements for running dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530727#comment-16530727 ] Jordan West commented on CASSANDRA-14443: - Thanks for the updates [~KurtG]. Some comments from a second pass: * conftest.pyL434 should be {{if not}} instead of {{if sufficient_…}}, otherwise the logging occurs in the wrong case * Looking at the docs you added, conftest.pyL479 should be {{if not upgrade and not upgrade_only}}, similar to L445 for resource intensive flags * Is it intentional to use {{print}} instead of the loggers in conftest.py? conftest.pyL492 may also be verbose. Similarly, run_dtests.py was changed to use loggers instead of {{print}} on L73. {quote}TBH I think that the whole resource check can go, and this kind of information is more suitable for the documentation. We don't put resource limits on Cassandra and I don't think we should do it for the dtests. {quote} Logging is useful because it reduces the amount of time to find the issue and fix it. Failing fast, with an option to override, speeds that up more but I'm not strongly for it, if you prefer to leave it as is. > Improvements for running dtests > --- > > Key: CASSANDRA-14443 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14443 > Project: Cassandra > Issue Type: Improvement > Components: Testing >Reporter: Kurt Greaves >Assignee: Kurt Greaves >Priority: Major > Labels: dtest > > We currently hardcode a requirement that you need at least 27gb of memory to > run the resource intensive tests. This is rather annoying as there isn't > really a strict hardware requirement and tests can run on smaller machines in > a lot of cases (especially if you mess around with HEAP). > We've already got the command line argument > {{--force-resource-intensive-tests}}, we don't need additional restrictions > in place to stop people who shouldn't be running the tests from running them. 
> We also don't have a way to run _only_ the resource-intensive dtests or > _only_ the upgrade tests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14055: Reviewer: Jordan West (was: Alex Petrov) > Index redistribution breaks SASI index > -- > > Key: CASSANDRA-14055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14055 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Ludovic Boutros >Assignee: Ludovic Boutros >Priority: Major > Labels: patch > Fix For: 3.11.2 > > Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, > CASSANDRA-14055.patch > > > During index redistribution process, a new view is created. > During this creation, old indexes should be released. > But, new indexes are "attached" to the same SSTable as the old indexes. > This leads to the deletion of the last SASI index file and breaks the index. > The issue is in this function : > [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347781#comment-16347781 ] Jordan West commented on CASSANDRA-14055: - Hi [~lboutros], One of the original authors of SASI here. I've been taking a look at this issue and your patch. Using the provided test against the {{cassandra-3.11}} branch (fc3357a00e2b6e56d399f07c5b81a82780c1e143), I see three different failure cases – two related directly to this issue and one tangentially related. More details on those below. With respect to this issue in particular, the three scenarios cause the test to fail because {{IndexSummaryManager}} ends up creating a new {{View}} where {{oldSSTables}} and {{newIndexes}} have overlapping values. This occurs because the {{IndexSummaryManager}} may "update" (re-open) an {{SSTableReader}} for an index already in the view. I believe this is unique to {{IndexSummaryManager}} and I am able to make your tests pass* without your patch by ensuring that there is no overlap between {{oldSStables}} and {{newIndexes}} (favoring {{newIndexes}}). Your patch looks to do this as well, though the approach is a bit different. One thing I am curious about in your patch is the {{keepFile}} changes to {{SSTableIndex#release}}. Generally, this concerns me because it seems to be working around improper reference counting rather than correcting the reference counting itself. Also, while using the provided test, I am unable to hit a case where the condition {{obsolete.get() || sstableRef.globalCount() == 0}} is true. I see the file missing in the {{View}} but not on disk itself. Could you elaborate a bit more on the need for this change and your use of the {{keepFile}} flag? The three failure scenarios I see using the provided test are: h5. 8 keys returned - sequential case In this scenario, at the time when the query that fails runs, the {{View}} is missing the most recently flushed sstable. 
As mentioned previously, this is because the intersection of {{oldSSTables}} and {{newIndexes}} is non-empty. This can be fixed* by ensuring nothing in {{newIndexes}} is in {{oldSSTables}}. I call this the sequential case because the compaction that occurs during the test completes before the index summary redistribution begins to create a new {{View}}. This is also addressed by your patch. h5. 8 keys returned - race case This scenario is similar to the previous one but has the additional issue of triggering improper {{SSTableIndex}} reference counting. From the perspective of the provided test, the failure scenario is the same and the fix* is as well. The issue occurs because of a race between compaction and index redistribution's creation of new {{View}} instances. This causes redistribution to create two {{View}} instances, the first of which is thrown away due to a failed compare and swap. The problem is the side-effects (calling {{SSTableIndex#release}}) have occurred already inside the creation of the garbage {{View}}, causing the reference count for the index to drop below 0. I see this issue as a separate one from this ticket and will file a separate JIRA. It is not fixed by the previously mentioned change and while I haven't checked in detail, I don't think the provided patch addresses this either. h5. 0 keys returned This scenario is similar to the first but there are three threads involved in the race: the compaction, the flushing of the last memtable, and the index redistribution. In this case, the end result is an empty {{View}}, which leads to no keys being returned since the system thinks there are no indexes to search. This is fixed* by what I mentioned previously and occurs because index redistribution re-opens both sstables in the original {{View}} instead of just one. It is also addressed by your patch. 
I am curious if you see any other failure scenarios besides these three and, in particular, if you can elaborate on and provide examples of the issues you see regarding the files being missing on disk and the need for the {{keepFile}} change. \* While this fix makes the provided test pass I am still verifying it's correct from the reference counting perspective. > Index redistribution breaks SASI index > -- > > Key: CASSANDRA-14055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14055 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Ludovic Boutros >Assignee: Ludovic Boutros >Priority: Major > Labels: patch > Fix For: 3.11.2 > > Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, > CASSANDRA-14055.patch > > > During index redistribution process, a new view is created. > During this creation, old indexes should be released. > But, new indexes are "attached" to the
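The fix discussed in the comment above amounts to never releasing an index whose sstable also appears in {{newIndexes}}, i.e. keeping the "removed" set disjoint from the "added" set. A toy version of that set discipline, with descriptors modeled as plain strings (names are illustrative, not the actual {{View}} code):

```java
import java.util.HashSet;
import java.util.Set;

public class ViewOverlapSketch {
    public static void main(String[] args) {
        // Index summary redistribution re-opened both sstables, so the sets
        // fully overlap: the case the old View construction mishandled.
        Set<String> oldSSTables = Set.of("sstable-5", "sstable-6");
        Set<String> newIndexes  = Set.of("sstable-5", "sstable-6");

        // Only release readers NOT superseded by a re-opened counterpart,
        // favoring newIndexes, so the intersection never triggers a release.
        Set<String> toRelease = new HashSet<>(oldSSTables);
        toRelease.removeAll(newIndexes);

        System.out.println(toRelease.isEmpty()); // prints "true": nothing wrongly released
    }
}
```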
[jira] [Comment Edited] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347781#comment-16347781 ] Jordan West edited comment on CASSANDRA-14055 at 1/31/18 11:39 PM: --- Hi [~lboutros], One of the original authors of SASI here. I've been taking a look at this issue and your patch. Using the provided test against the {{cassandra-3.11}} branch (fc3357a00e2b6e56d399f07c5b81a82780c1e143), I see three different failure cases – two related directly to this issue and one tangentially related. More details on those below. With respect to this issue in particular, the three scenarios cause the test to fail because {{IndexSummaryManager}} ends up creating a new {{View}} where {{oldSSTables}} and {{newIndexes}} have overlapping values. This occurs because the {{IndexSummaryManager}} may "update" (re-open) an {{SSTableReader}} for an index already in the view. I believe this is unique to {{IndexSummaryManager}} and I am able to make your tests pass* without your patch by ensuring that there is no overlap between {{oldSStables}} and {{newIndexes}} (favoring {{newIndexes}}). Your patch looks to do this as well, though the approach is a bit different. One thing I am curious about in your patch is the {{keepFile}} changes to {{SSTableIndex#release}}. Generally, this concerns me because it seems to be working around improper reference counting rather than correcting the reference counting itself. Also, while using the provided test, I am unable to hit a case where the condition {{obsolete.get() || sstableRef.globalCount() == 0}} is true. I see the file missing in the {{View}} but not on disk itself. Could you elaborate a bit more on the need for this change and your use of the {{keepFile}} flag? The three failure scenarios I see using the provided test are: h5. 8 keys returned - sequential case In this scenario, at the time when the query that fails runs, the {{View}} is missing the most recently flushed sstable. 
As mentioned previously, this is because the intersection of {{oldSSTables}} and {{newIndexes}} is non-empty. This can be fixed* by ensuring nothing in {{newIndexes}} is in {{oldSSTables}}. I call this the sequential case because the compaction that occurs during the test completes before the index summary redistribution begins to create a new {{View}}. This is also addressed by your patch. h5. 8 keys returned - race case This scenario is similar to the previous one but has the additional issue of triggering improper {{SSTableIndex}} reference counting. From the perspective of the provided test, the failure scenario is the same and the fix* is as well. The issue occurs because of a race between compaction and index redistribution's creation of new {{View}} instances. This causes redistribution to create two {{View}} instances, the first of which is thrown away due to a failed compare and swap. The problem is the side-effects (calling {{SSTableIndex#release}}) have occurred already inside the creation of the garbage {{View}}, causing the reference count for the index to drop below 0. I see this issue as a separate one from this ticket and have filed [CASSANDRA-14207|https://issues.apache.org/jira/browse/CASSANDRA-14207]. It is not fixed by the previously mentioned change and while I haven't checked in detail, I don't think the provided patch addresses this either. h5. 0 keys returned This scenario is similar to the first but there are three threads involved in the race: the compaction, the flushing of the last memtable, and the index redistribution. In this case, the end result is an empty {{View}}, which leads to no keys being returned since the system thinks there are no indexes to search. This is fixed* by what I mentioned previously and occurs because index redistribution re-opens both sstables in the original {{View}} instead of just one. It is also addressed by your patch. 
I am curious if you see any other failure scenarios besides these three and, in particular, if you can elaborate on and provide examples of the issues you see regarding the files being missing on disk and the need for the {{keepFile}} change. \* While this fix makes the provided test pass I am still verifying it's correct from the reference counting perspective. was (Author: jrwest): Hi [~lboutros], One of the original authors of SASI here. I've been taking a look at this issue and your patch. Using the provided test against the {{cassandra-3.11}} branch (fc3357a00e2b6e56d399f07c5b81a82780c1e143), I see three different failure cases – two related directly to this issue and one tangentially related. More details on those below. With respect to this issue in particular, the three scenarios cause the test to fail because {{IndexSummaryManager}} ends up creating a new {{View}} where {{oldSSTables}} and {{newIndexes}} have overlapping values. This
[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14055: Attachment: CASSANDRA-14055-jrwest.patch > Index redistribution breaks SASI index > -- > > Key: CASSANDRA-14055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14055 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Ludovic Boutros >Assignee: Ludovic Boutros >Priority: Major > Labels: patch > Fix For: 3.11.2 > > Attachments: CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, > CASSANDRA-14055.patch, CASSANDRA-14055.patch > > > During index redistribution process, a new view is created. > During this creation, old indexes should be released. > But, new indexes are "attached" to the same SSTable as the old indexes. > This leads to the deletion of the last SASI index file and breaks the index. > The issue is in this function : > [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14207) Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper Reference Counting of SSTableIndex
Jordan West created CASSANDRA-14207: --- Summary: Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper Reference Counting of SSTableIndex Key: CASSANDRA-14207 URL: https://issues.apache.org/jira/browse/CASSANDRA-14207 Project: Cassandra Issue Type: Bug Components: sasi Reporter: Jordan West Assignee: Jordan West Attachments: sasi-invalid-reference-count.rtf A race between e.g. Index Redistribution and Compaction can cause the compare and swap of a new {{sasi.conf.View}} in {{sasi.conf.DataTracker#update}} to fail, leading to recreation of the view and improper reference counting of an {{SSTableIndex}}. This is because the side-effects (decrementing the reference count via {{SStableIndex#release}}) occur regardless of whether the view is promoted to be the active view. Code: https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78 Attached logs and debug output show a case where index redistribution and compaction race. This case was generated using the test provided in https://issues.apache.org/jira/browse/CASSANDRA-14055
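The general pattern behind this bug: in a compare-and-swap retry loop, a failed swap discards the speculatively built value, so any side effects performed while building it (like decrementing reference counts) have already happened for a view that never became active. A hedged miniature of the safe shape, not the actual Cassandra code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class CasSideEffectSketch {
    static final AtomicReference<List<String>> view = new AtomicReference<>(List.of());

    static void update(String added) {
        List<String> current;
        List<String> next;
        do {
            current = view.get();
            next = new ArrayList<>(current);
            next.add(added);
            // No release()-style side effects in here: if the CAS below
            // fails, this candidate view is thrown away and rebuilt, and
            // any side effects would have run for a view that never won.
        } while (!view.compareAndSet(current, next));
        // Side effects belong here, after the swap is known to have succeeded.
    }

    public static void main(String[] args) {
        update("sstable-1");
        update("sstable-2");
        System.out.println(view.get()); // prints "[sstable-1, sstable-2]"
    }
}
```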
[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14055: Attachment: (was: CASSANDRA-14055-jrwest.patch) > Index redistribution breaks SASI index > -- > > Key: CASSANDRA-14055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14055 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Ludovic Boutros >Assignee: Ludovic Boutros >Priority: Major > Labels: patch > Fix For: 3.11.2 > > Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, > CASSANDRA-14055.patch > > > During index redistribution process, a new view is created. > During this creation, old indexes should be released. > But, new indexes are "attached" to the same SSTable as the old indexes. > This leads to the deletion of the last SASI index file and breaks the index. > The issue is in this function : > [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358877#comment-16358877 ] Jordan West commented on CASSANDRA-14055: - Hi [~lboutros], My apologies for the delay. I am waiting on internal review of my version of the patch so you can take a look. I believe this patch accomplishes the same thing but with fewer changes and doesn't affect the reference counting in SSTableIndex. I hope to have it posted for your review and testing early next week. > Index redistribution breaks SASI index > -- > > Key: CASSANDRA-14055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14055 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Ludovic Boutros >Assignee: Ludovic Boutros >Priority: Major > Labels: patch > Fix For: 3.11.2 > > Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, > CASSANDRA-14055.patch > > > During index redistribution process, a new view is created. > During this creation, old indexes should be released. > But, new indexes are "attached" to the same SSTable as the old indexes. > This leads to the deletion of the last SASI index file and breaks the index. > The issue is in this function : > [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]
[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14055: Attachment: CASSANDRA-14055-jrwest.patch > Index redistribution breaks SASI index > -- > > Key: CASSANDRA-14055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14055 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Ludovic Boutros >Assignee: Ludovic Boutros >Priority: Major > Labels: patch > Fix For: 3.11.x > > Attachments: CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, > CASSANDRA-14055.patch, CASSANDRA-14055.patch > > > During index redistribution process, a new view is created. > During this creation, old indexes should be released. > But, new indexes are "attached" to the same SSTable as the old indexes. > This leads to the deletion of the last SASI index file and breaks the index. > The issue is in this function : > [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361610#comment-16361610 ] Jordan West commented on CASSANDRA-14055: - Hi [~lboutros], Attached my patch for your review and testing. If you could verify this does the right thing in your environments that would be especially helpful since I have been unable to replicate the deleted file issue – I only see the sstables removed from the SASI View. The gist of the patch is to ensure the intersection of {{oldSStables}} and {{newIndexes}} is always empty. Your patch was doing the same by not checking {{oldSSTables}} in the second for-loop, but this approach doesn't require the changes to {{SSTableIndex#release}}. I re-used your test but removed the version that runs with the data entirely in-memory since that won't be affected by index redistribution. > Index redistribution breaks SASI index > -- > > Key: CASSANDRA-14055 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14055 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Ludovic Boutros >Assignee: Ludovic Boutros >Priority: Major > Labels: patch > Fix For: 3.11.x > > Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, > CASSANDRA-14055.patch > > > During index redistribution process, a new view is created. > During this creation, old indexes should be released. > But, new indexes are "attached" to the same SSTable as the old indexes. > This leads to the deletion of the last SASI index file and breaks the index. > The issue is in this function : > [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364440#comment-16364440 ] Jordan West edited comment on CASSANDRA-14055 at 2/14/18 5:22 PM:
--
[~jasobrown], sorry for the trunk issues. The way {{TableMetadata}} is accessed/stored was changed, and the test will need to be modified as a result. I will post a separate patch for trunk.

[~lboutros], in my testing the primary issue I saw was that files were removed from the SASI {{View}} that shouldn't be. The test writes 5 sstables (with sequence numbers 1-4 & 6), and during the test a compaction typically happens (generating an sstable with generation 5 from sstables 1-4). The final SASI {{View}} when the queries are performed should contain either (1-4, 6) or (5, 6)*. The test fails by returning 8 keys instead of 10 when the SASI {{View}} ends up containing only sstable 5, or by returning 0 keys instead of 10 when the SASI {{View}} ends up empty.

The issue occurs when index redistribution completes. Depending on the interleaving* of events (the memtable flush, compaction, and redistribution), redistribution re-opens sstable 6, and sometimes also re-opens sstable 5. This results in an {{SSTableListChangedNotification}}, which in turn results in the creation of a new {{View}}, where {{added=[6]}} (or {{added=[5,6]}}) and {{removed=[6]}} (or {{removed=[5,6]}}). The SASI {{View}} was written assuming these two sets were disjoint, which is why any reader in {{oldSSTables}} caused the index to be closed. This is incorrect in both cases because sstables 5 and 6 are indeed the active data files (5 contains keys 0-8, and 6 contains keys 9 & 10).

Regarding the ref counting: we want to maintain one reference to sstables 5 & 6 via their {{SSTableIndex}} instance, but we've created a second reference, and one needs to be closed. This is ensured by the {{newView.containsKey(sstable.descriptor)}} part of the conditional (so we are still calling {{#release()}} on one instance). As I write this, however, I realize we want to keep a reference to the *newer* index, which references the newer SSTable instance, and my patch does the opposite, keeping the old instance. I will post an updated patch along with my trunk patch after internal review; the gist is to change the order in which we iterate over the old view and new indexes so as to favor new index instances.

NOTE: I've ignored https://issues.apache.org/jira/browse/CASSANDRA-14207 above.

*I've found a few other interleavings by using another machine, but the general issue is the same.
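The reference-counting point above can be illustrated with a toy sketch. This is not Cassandra's {{Ref}} machinery; the classes below are hypothetical stand-ins showing why releasing the superseded index instance is safe: the retained instance still holds a reference to the shared file, so the file-level count only reaches zero when the last index is released.

```java
public class RefSketch {
    // Shared per-sstable reference count standing in for the reader's global count.
    static final class FileRef {
        int count = 0;
        boolean deleted = false;
        FileRef acquire() { count++; return this; }
        void release() { if (--count == 0) deleted = true; }
    }

    // Minimal stand-in for an SSTableIndex holding a reference to its sstable.
    static final class Index {
        final FileRef file;
        Index(FileRef f) { this.file = f.acquire(); }
        void release() { file.release(); }
    }

    public static void main(String[] args) {
        FileRef sstable6 = new FileRef();
        Index oldIdx = new Index(sstable6); // pre-redistribution instance
        Index newIdx = new Index(sstable6); // instance re-opened by redistribution

        // Releasing the superseded old instance must not delete the file:
        // the retained new instance still holds a reference.
        oldIdx.release();
        if (sstable6.deleted) throw new AssertionError("file deleted too early");

        // Only when the last index goes away does the count hit zero.
        newIdx.release();
        if (!sstable6.deleted) throw new AssertionError("file leaked");
        System.out.println("ok");
    }
}
```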
[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364690#comment-16364690 ] Jordan West commented on CASSANDRA-14055:
-
[~lboutros], the order in which we release them shouldn't matter for file deletion: as long as one index is open, one sstable is open, and therefore the global reference count for the table is > 0. But if we keep the older reference, we are leaking it (and using the old metadata) until the {{SSTableIndex}} is released, which is wrong. Either way, it sounds like we agree on the fix :). I will post a new patch once it is approved internally.
[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369601#comment-16369601 ] Jordan West commented on CASSANDRA-14055:
-
[~lboutros]/[~jasobrown], some updates: I have attached two new patches, one for trunk and one for 3.11. Unfortunately, the test changes in trunk don't work well on 3.11, so we can't have one patch. The primary change in this patch is to reverse the order in which we iterate over the indexes, to ensure we retain the newer instance of {{SSTableIndex}} and thus the newer {{SSTableReader}}. I also changed the code to clone the {{oldSSTables}} collection, since it's visible outside the {{View}} constructor.

||3.11||Trunk||
|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-3.11]|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-trunk]|
|[utests|https://circleci.com/gh/jrwest/cassandra/24]|[utests|https://circleci.com/gh/jrwest/cassandra/26]|

NOTE: the same utests are failing on [trunk|https://circleci.com/gh/jrwest/cassandra/25], and I'm still working on getting dtests running with my CircleCI setup.

Also, I spoke with some colleagues, including [~beobal] and [~krummas], about the use of {{sstableRef.globalCount()}} to determine when to delete the SASI index file. I've come to the conclusion that its use is wrong, because it counts references to a specific {{SSTableReader}} instance, not references to the file globally. Given index summary redistribution, this isn't a safe assumption. Looking back at the original SASI patches, I am not sure why it got merged this way: the [patches|https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/SSTableIndex.java#L120] used {{sstable.isMarkedCompacted()}}, but the [merged code|https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b#diff-4873bb6fcef158ff18d221571ef2ec7cR124] used {{sstableRef.globalCount()}}.

Fixing this is a larger undertaking, so I propose we split that work into a separate ticket and focus this one on SASI's failure to account for index redistribution in the {{View}}. The other ticket would entail either (a) deleting the SASI index files as part of {{SSTableTidier}}, or (b) moving {{SSTableIndex}} to use {{Ref}} and implementing a tidier specific to it.
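Why an instance-local count is the wrong deletion signal can be shown with a toy example. The types below are simplified, hypothetical stand-ins (the real classes are {{SSTableReader}} and {{org.apache.cassandra.utils.concurrent.Ref}}): after index summary redistribution there can be two reader instances for the same on-disk file, each with its own count, so one instance's count reaching zero says nothing about whether the file is still live.

```java
public class GlobalCountSketch {
    // Toy reader: each instance carries only its own reference count.
    static final class Reader { int instanceRefs = 1; }

    public static void main(String[] args) {
        Reader before = new Reader(); // instance held by the old index
        Reader after  = new Reader(); // instance re-opened by redistribution
        before.instanceRefs--;        // old index released its reference

        // Deciding by the *instance* count says "delete" -- wrong, the file is live.
        boolean deleteByInstanceCount = before.instanceRefs == 0;
        // Deciding across all instances of the same file says "keep".
        boolean fileStillLive = before.instanceRefs + after.instanceRefs > 0;

        if (!deleteByInstanceCount) throw new AssertionError();
        if (!fileStillLive) throw new AssertionError();
        System.out.println("ok");
    }
}
```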
[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14055:
Attachment: 14055-jrwest-trunk.patch
            14055-jrwest-3.11.patch
[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373192#comment-16373192 ] Jordan West commented on CASSANDRA-14055:
-
[~lboutros], it's in [~jasobrown]'s queue for one more review; hopefully next week.
[jira] [Created] (CASSANDRA-14248) SSTableIndex should not use Ref#globalCount() to determine when to delete index file
Jordan West created CASSANDRA-14248:
---
Summary: SSTableIndex should not use Ref#globalCount() to determine when to delete index file
Key: CASSANDRA-14248
URL: https://issues.apache.org/jira/browse/CASSANDRA-14248
Project: Cassandra
Issue Type: Bug
Components: sasi
Reporter: Jordan West
Assignee: Jordan West
Fix For: 3.11.x

{{SSTableIndex}} instances maintain a {{Ref}} to the underlying {{SSTableReader}} instance. When determining whether or not to delete the file after the last {{SSTableIndex}} reference is released, the implementation uses {{sstableRef.globalCount()}}: [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/SSTableIndex.java#L135]. This is incorrect because {{sstableRef.globalCount()}} returns the number of references to the specific instance of {{SSTableReader}}; in cases like index summary redistribution, there can be more than one instance of {{SSTableReader}}. Further, since the reader is shared across multiple indexes, not all indexes see the count go to 0. This can lead to cases where the {{SSTableIndex}} file is incorrectly deleted, or not deleted when it should be. A more correct implementation would either:
* Tie into the existing {{SSTableTidier}}. SASI indexes are already SSTable components but are not cleaned up by the {{SSTableTidier}}, because they are not found by the current cleanup implementation.
* Revamp {{SSTableIndex}} reference counting to use {{Ref}} and implement a new tidier.
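Option (a) from the ticket can be sketched as follows. This is a runnable toy under loose assumptions: the {{SSTableTidier}} and component names are borrowed for illustration only, and the logic is reduced to "register the SASI component with the sstable-level cleanup so the index file goes away exactly when the data file does".

```java
import java.util.*;

public class TidierSketch {
    interface Tidy { void tidy(); }

    // Toy sstable: a list of on-disk components plus cleanup hooks that run
    // when the sstable itself is tidied (i.e. its data file is deleted).
    static final class SSTable {
        final List<String> components =
            new ArrayList<>(List.of("Data.db", "Index.db", "SI_idx.db"));
        final List<Tidy> tidiers = new ArrayList<>();
        void runTidiers() { tidiers.forEach(Tidy::tidy); }
    }

    public static void main(String[] args) {
        SSTable sstable = new SSTable();
        // register the SASI component with the sstable-level cleanup, instead of
        // deciding deletion from a per-index reference count
        sstable.tidiers.add(() -> sstable.components.remove("SI_idx.db"));
        sstable.runTidiers();
        if (sstable.components.contains("SI_idx.db")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The design appeal is that the deletion decision lives in exactly one place, tied to the lifetime of the file itself rather than to any one reader instance.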
[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371687#comment-16371687 ] Jordan West commented on CASSANDRA-14055:
-
[~lboutros] Great! Thanks for taking a look. I've created https://issues.apache.org/jira/browse/CASSANDRA-14248.
[jira] [Assigned] (CASSANDRA-11990) Address rows rather than partitions in SASI
[ https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West reassigned CASSANDRA-11990:
---
Assignee: Jordan West (was: Alex Petrov)

> Address rows rather than partitions in SASI
> -------------------------------------------
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL, sasi
> Reporter: Alex Petrov
> Assignee: Jordan West
> Priority: Major
> Fix For: 4.x
>
> Attachments: perf.pdf, size_comparison.png
>
> Currently, a lookup in the SASI index returns the key position of the partition. After the partition lookup, the rows are iterated and the operators are applied in order to filter out ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different partitioners, collections support, primary key indexing etc.),
[jira] [Commented] (CASSANDRA-14614) CircleCI config has dtests enabled but not the correct resources settings
[ https://issues.apache.org/jira/browse/CASSANDRA-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562462#comment-16562462 ] Jordan West commented on CASSANDRA-14614:
-
I'll have a patch up later this afternoon if no one else wants to take this; I just need to run to a few meetings first.

> CircleCI config has dtests enabled but not the correct resources settings
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-14614
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14614
> Project: Cassandra
> Issue Type: Bug
> Components: Testing
> Reporter: Jordan West
> Assignee: Jordan West
> Priority: Trivial
>
> The commit for -CASSANDRA-9608- enabled the {{with_dtests_jobs}} configuration in {{.circleci/config.yml}} but not the necessary env var settings. We should revert this, unless we planned to start running dtests with the correct resources on every master commit, in which case we should fix the resources.
> (cc [~snazy] [~jasobrown])
[jira] [Created] (CASSANDRA-14614) CircleCI config has dtests enabled but not the correct resources settings
Jordan West created CASSANDRA-14614:
---
Summary: CircleCI config has dtests enabled but not the correct resources settings
Key: CASSANDRA-14614
URL: https://issues.apache.org/jira/browse/CASSANDRA-14614
Project: Cassandra
Issue Type: Bug
Components: Testing
Reporter: Jordan West
Assignee: Jordan West

The commit for -CASSANDRA-9608- enabled the {{with_dtests_jobs}} configuration in {{.circleci/config.yml}} but not the necessary env var settings. We should revert this, unless we planned to start running dtests with the correct resources on every master commit, in which case we should fix the resources.

(cc [~snazy] [~jasobrown])
[jira] [Commented] (CASSANDRA-14614) CircleCI config has dtests enabled but not the correct resources settings
[ https://issues.apache.org/jira/browse/CASSANDRA-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562673#comment-16562673 ] Jordan West commented on CASSANDRA-14614:
-
[branch|https://github.com/jrwest/cassandra/tree/14614-trunk] | [tests|https://circleci.com/gh/jrwest/cassandra/tree/14614-trunk]
[jira] [Commented] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
[ https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558722#comment-16558722 ] Jordan West commented on CASSANDRA-14468:
-
[~iamaleksey] Reading the code again, I *think* it should be safe to drop as well, for the reasons you list. The {{ColumnIdentifier}} in the {{ColumnDefinition}}/{{ColumnMetadata}} will be different (by reference) from the ones returned by {{Literal#prepare}}, but since they are structurally equal that should be OK. Otherwise, it's hard to separate out its initial intention, since it was committed as part of CASSANDRA-8099.

> "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-14468
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14468
> Project: Cassandra
> Issue Type: Bug
> Reporter: Wade Simmons
> Priority: Major
> Attachments: data.tar.gz
>
> I am attempting to upgrade from Cassandra 2.2.10 to 3.0.16. I am getting this error:
> {code}
> org.apache.cassandra.exceptions.ConfigurationException: Unable to parse targets for index idx_foo ("666f6f")
> at org.apache.cassandra.index.internal.CassandraIndex.parseTarget(CassandraIndex.java:800) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata(CassandraIndex.java:747) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:645) ~[apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:251) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) [apache-cassandra-3.0.16.jar:3.0.16]
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) [apache-cassandra-3.0.16.jar:3.0.16]
> {code}
> It looks like this might be related to CASSANDRA-14104, which was just added to 3.0.16.
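The reference-versus-structural equality point in the comment above can be shown with a toy stand-in (the real {{ColumnIdentifier}} lives in {{org.apache.cassandra.cql3}}; the class below is illustrative only): two instances produced by different code paths differ by reference but compare equal, which is what matters when matching index targets.

```java
import java.util.Objects;

public class IdentifierSketch {
    // Toy identifier with value-based equals/hashCode, like ColumnIdentifier.
    static final class Id {
        final String name;
        Id(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).name.equals(name);
        }
        @Override public int hashCode() { return Objects.hash(name); }
    }

    public static void main(String[] args) {
        Id fromMetadata = new Id("foo"); // e.g. from ColumnDefinition/ColumnMetadata
        Id fromPrepare  = new Id("foo"); // e.g. from Literal#prepare
        if (fromMetadata == fromPrepare)
            throw new AssertionError("expected distinct references");
        if (!fromMetadata.equals(fromPrepare))
            throw new AssertionError("expected structural equality");
        System.out.println("ok");
    }
}
```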
[jira] [Updated] (CASSANDRA-14542) Deselect no_offheap_memtables dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14542:
Reviewer: Jordan West

> Deselect no_offheap_memtables dtests
> ------------------------------------
>
> Key: CASSANDRA-14542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14542
> Project: Cassandra
> Issue Type: Improvement
> Components: Testing
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Labels: dtest
>
> After the large rework of dtests in CASSANDRA-14134, one task left undone was to enable running dtests with offheap memtables. That was resolved in CASSANDRA-14056. However, there are a few tests explicitly marked as "no_offheap_memtables", and we should respect that marking when running the dtests with offheap memtables enabled.
[jira] [Updated] (CASSANDRA-14197) SSTable upgrade should be automatic
[ https://issues.apache.org/jira/browse/CASSANDRA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14197:
Reviewers: Ariel Weisberg (was: Ariel Weisberg, Jordan West)

> SSTable upgrade should be automatic
> -----------------------------------
>
> Key: CASSANDRA-14197
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14197
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Major
> Fix For: 4.0
>
> Upgradesstables should run automatically on node upgrade
[jira] [Updated] (CASSANDRA-14197) SSTable upgrade should be automatic
[ https://issues.apache.org/jira/browse/CASSANDRA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14197:
Reviewers: Ariel Weisberg, Jordan West
Reviewer: (was: Ariel Weisberg)
[jira] [Assigned] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
[ https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West reassigned CASSANDRA-14468:
---
Assignee: Jordan West
[jira] [Commented] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
[ https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558848#comment-16558848 ] Jordan West commented on CASSANDRA-14468:
-
Assigned to myself.
[jira] [Updated] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
[ https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14468: Reproduced In: 3.0.16, 3.0.15, 3.0.12, 3.0.10 (was: 3.0.10, 3.0.12, 3.0.15, 3.0.16) Status: Patch Available (was: Open) I would like to add a dtest for this but wanted to push up the patch to get review started. ||trunk||3.0|| |[branch|https://github.com/jrwest/cassandra/tree/14468-trunk]|[branch|https://github.com/jrwest/cassandra/tree/14468-3.0]| |[tests|https://circleci.com/gh/jrwest/cassandra/tree/14468-trunk]|[tests|https://circleci.com/gh/jrwest/cassandra/tree/14468-3.0]| > "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16 > - > > Key: CASSANDRA-14468 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14468 > Project: Cassandra > Issue Type: Bug >Reporter: Wade Simmons >Assignee: Jordan West >Priority: Major > Attachments: data.tar.gz > > > I am attempting to upgrade from Cassandra 2.2.10 to 3.0.16. 
I am getting this > error: > {code} > org.apache.cassandra.exceptions.ConfigurationException: Unable to parse > targets for index idx_foo ("666f6f") > at > org.apache.cassandra.index.internal.CassandraIndex.parseTarget(CassandraIndex.java:800) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata(CassandraIndex.java:747) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:645) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:251) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) > [apache-cassandra-3.0.16.jar:3.0.16] > {code} > It looks like this might be related to CASSANDRA-14104 that was just added to > 3.0.16 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
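For context on the error above: the target string in the quotes, "666f6f", is the hex encoding of the bytes of the indexed column's name ("foo", matching the index name idx_foo), which `CassandraIndex.parseTarget` fails to resolve against the upgraded schema. The round-trip below is an illustrative sketch of that encoding, not Cassandra's actual parsing code:

```python
# Illustrative only: hex encoding/decoding of an index target name.
# "666f6f" in the exception message is the hex form of the bytes "foo".
def hex_to_name(hex_target: str) -> str:
    """Decode a hex-encoded index target back to a UTF-8 column name."""
    return bytes.fromhex(hex_target).decode("utf-8")

def name_to_hex(name: str) -> str:
    """Encode a column name as the hex string seen in the error output."""
    return name.encode("utf-8").hex()
```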
[jira] [Commented] (CASSANDRA-14542) Deselect no_offheap_memtables dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558478#comment-16558478 ] Jordan West commented on CASSANDRA-14542: - +1 > Deselect no_offheap_memtables dtests > > > Key: CASSANDRA-14542 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14542 > Project: Cassandra > Issue Type: Improvement > Components: Testing >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Minor > Labels: dtest > > After the large rework of dtests in CASSANDRA-14134, one task left undone was > to enable running dtests with offheap memtables. That was resolved in > CASSANDRA-14056. However, there are a few tests explicitly marked as > "no_offheap_memtables", and we should respect that marking when running the > dtests with offheap memtables enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
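Since the dtests are pytest-based, deselecting marked tests can be done with a marker expression (`pytest -m "not no_offheap_memtables"`) or a collection hook. The sketch below mimics the selection logic with plain data; it is a hypothetical illustration, not the actual dtest `conftest.py`:

```python
# Hypothetical sketch of marker-based deselection: when offheap memtables
# are enabled, tests carrying the "no_offheap_memtables" marker are split
# out of the run, mirroring pytest's collection-modification mechanism.
def deselect_no_offheap(test_markers, offheap_enabled):
    """test_markers: mapping of test name -> set of marker names.
    Returns (selected, deselected) as sorted lists of test names."""
    if not offheap_enabled:
        return sorted(test_markers), []
    selected = sorted(n for n, m in test_markers.items()
                      if "no_offheap_memtables" not in m)
    deselected = sorted(n for n, m in test_markers.items()
                        if "no_offheap_memtables" in m)
    return selected, deselected
```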
[jira] [Comment Edited] (CASSANDRA-14636) Revert 4.0 GC alg back to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576589#comment-16576589 ] Jordan West edited comment on CASSANDRA-14636 at 8/10/18 5:10 PM: -- +1. My understanding is {{UseParNewGC}} is implicit in Java 10+. Otherwise it might be worth adding a note about that specific change. was (Author: jrwest): +1. My understanding is {{UseParNewGC}} is implicit in Java 10+, otherwise it might be worth adding a note about that specific change. > Revert 4.0 GC alg back to CMS > - > > Key: CASSANDRA-14636 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14636 > Project: Cassandra > Issue Type: Bug >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Major > Fix For: 4.x > > > CASSANDRA-9608 accidentally swapped the default GC algorithm from CMS to G1. > Until further community consensus is achieved about swapping the default alg, > we should switch back to CMS. > As reported by [~rustyrazorblade] on the [dev@ > ML|https://lists.apache.org/thread.html/0b30f9c84457033583e9a3e0828adc603e01f1ca03ce0816098883cc@%3Cdev.cassandra.apache.org%3E] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14636) Revert 4.0 GC alg back to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576589#comment-16576589 ] Jordan West commented on CASSANDRA-14636: - +1. My understanding is {{UseParNewGC}} is implicit in Java 10+, otherwise it might be worth adding a note about that specific change. > Revert 4.0 GC alg back to CMS > - > > Key: CASSANDRA-14636 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14636 > Project: Cassandra > Issue Type: Bug >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Major > Fix For: 4.x > > > CASSANDRA-9608 accidentally swapped the default GC algorithm from CMS to G1. > Until further community consensus is achieved about swapping the default alg, > we should switch back to CMS. > As reported by [~rustyrazorblade] on the [dev@ > ML|https://lists.apache.org/thread.html/0b30f9c84457033583e9a3e0828adc603e01f1ca03ce0816098883cc@%3Cdev.cassandra.apache.org%3E] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
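On the {{UseParNewGC}} point: that flag was deprecated in JDK 9 (where CMS implies the parallel young collector) and removed in JDK 10, so passing it to a newer JVM aborts startup with "Unrecognized VM option". Startup scripts therefore typically gate such flags on the detected Java version; the following is a hypothetical sketch of that filtering, not Cassandra's actual `cassandra-env`/`jvm.options` handling:

```python
# Hypothetical sketch: drop JVM flags that no longer exist on newer JDKs.
# -XX:+UseParNewGC was removed in JDK 10; a removal version of 999 means
# "still supported" for the purposes of this illustration.
REMOVED_IN = {"-XX:+UseParNewGC": 10}

def filter_gc_flags(flags, java_major):
    """Keep only flags whose removal version is later than this JDK."""
    return [f for f in flags if REMOVED_IN.get(f, 999) > java_major]
```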
[jira] [Created] (CASSANDRA-14627) CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8
Jordan West created CASSANDRA-14627: --- Summary: CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8 Key: CASSANDRA-14627 URL: https://issues.apache.org/jira/browse/CASSANDRA-14627 Project: Cassandra Issue Type: Bug Components: Testing Reporter: Jordan West CASSANDRA-9608 added a couple hard-coded options to workspace.xml that are not supported in Java 8: https://github.com/apache/cassandra/commit/6ba2fb9395226491872b41312d978a169f36fcdb#diff-59e65c5abf01f83a11989765ada76841. {code} Unrecognized option: --add-exports Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. {code} To reproduce: 1. Update to the most recent trunk 2. rm -rf .idea && ant generate-idea-files 3. Re-open the project in IntelliJ (using Java 8) and run Cassandra or a test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14627) CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8
[ https://issues.apache.org/jira/browse/CASSANDRA-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573483#comment-16573483 ] Jordan West commented on CASSANDRA-14627: - This is slightly different from CASSANDRA-14613 in that {{ide/workspace.xml}} is broken instead of {{ide/idea-iml-file.xml}} but I'm happy to dupe this to it. I do think a short term fix for this is warranted: at a minimum, breaking Java 11 in the IDE instead of Java 8. > CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8 > - > > Key: CASSANDRA-14627 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14627 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Jordan West >Priority: Critical > > CASSANDRA-9608 added a couple hard-coded options to workspace.xml that are > not supported in Java 8: > https://github.com/apache/cassandra/commit/6ba2fb9395226491872b41312d978a169f36fcdb#diff-59e65c5abf01f83a11989765ada76841. > > {code} > Unrecognized option: --add-exports > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > {code} > To reproduce: > 1. Update to the most recent trunk > 2. rm -rf .idea && ant generate-idea-files > 3. Re-open the project in IntelliJ (using Java 8) and run Cassandra or a > test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14627) CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8
[ https://issues.apache.org/jira/browse/CASSANDRA-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573486#comment-16573486 ] Jordan West commented on CASSANDRA-14627: - I should also note, the local workaround, in the meantime, is to manually delete the Java 11 arguments from {{ide/workspace.xml}} or from the specific IntelliJ configurations being used. > CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8 > - > > Key: CASSANDRA-14627 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14627 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Jordan West >Priority: Critical > > CASSANDRA-9608 added a couple hard-coded options to workspace.xml that are > not supported in Java 8: > https://github.com/apache/cassandra/commit/6ba2fb9395226491872b41312d978a169f36fcdb#diff-59e65c5abf01f83a11989765ada76841. > > {code} > Unrecognized option: --add-exports > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > {code} > To reproduce: > 1. Update to the most recent trunk > 2. rm -rf .idea && ant generate-idea-files > 3. Re-open the project in IntelliJ (using Java 8) and run Cassandra or a > test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
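The underlying incompatibility is that `--add-exports` is a module-system option introduced with JPMS in JDK 9, so a Java 8 JVM rejects it outright ("Unrecognized option"). A version-gated fix, in the spirit of the manual workaround above, might look like this hypothetical sketch (the module/package names are illustrative, not taken from `ide/workspace.xml`):

```python
# Hypothetical sketch: include module-system options only when the JVM
# understands them. "--add-exports" exists from JDK 9 onward; on Java 8
# it causes "Unrecognized option" and the JVM fails to start.
def jvm_args(java_major, module_args):
    """Return module_args on JDK 9+, nothing on earlier JDKs."""
    return list(module_args) if java_major >= 9 else []
```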
[jira] [Commented] (CASSANDRA-14644) CircleCI Builds should optionally run in-tree tests other than test/unit
[ https://issues.apache.org/jira/browse/CASSANDRA-14644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581636#comment-16581636 ] Jordan West commented on CASSANDRA-14644: - [~vinaykumarcse] done. Thanks! I will be happy to review. > CircleCI Builds should optionally run in-tree tests other than test/unit > -- > > Key: CASSANDRA-14644 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14644 > Project: Cassandra > Issue Type: Bug > Components: Testing > Reporter: Jordan West > Assignee: Vinay Chella > Priority: Critical > > Currently, CircleCI is hardcoded to search for tests in the test/unit > directory only: > https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L166. > This means tests under `test-compression` and `test-long` are not run. Like > dtests, there should be a simple way to modify the config to run these as > well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
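One way to avoid hardcoding a single test directory is to keep a mapping from ant test targets to the source directories they exercise and parameterize the CI job over it. The mapping below is a hypothetical sketch with illustrative names, not the contents of `.circleci/config.yml` or `build.xml`:

```python
# Hypothetical sketch: map each ant test target to the directory whose
# tests it runs, so a CI job can iterate over targets instead of
# hardcoding test/unit. Names are illustrative assumptions.
SUITES = {
    "test":             "test/unit",
    "test-long":        "test/long",
    "test-burn":        "test/burn",
    "test-compression": "test/unit",  # unit tests re-run with compression enabled
}

def dirs_for(targets):
    """Return the unique source directories the given ant targets cover."""
    return sorted({SUITES[t] for t in targets})
```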