[jira] [Commented] (CASSANDRA-9406) Add Option to Not Validate Atoms During Scrub

2015-05-18 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549268#comment-14549268
 ] 

Jordan West commented on CASSANDRA-9406:


patch here (base is cassandra-2.0): 
https://github.com/jrwest/cassandra/tree/9406

 Add Option to Not Validate Atoms During Scrub
 -

 Key: CASSANDRA-9406
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9406
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Jordan West
Assignee: Jordan West
Priority: Minor
 Fix For: 2.0.x


 In Scrubber, the instantiation of SSTableIdentityIterator hardcodes checkData 
 to true. This should be made configurable when running scrub via JMX or 
 StandaloneScrubber.
 Since inbound data is not validated, Scrub without this option will throw 
 away data that is not corrupt but is misrepresented (e.g. an int is stored 
 but the validator is LongType), while Cassandra and application clients will 
 happily continue to read and write data with this misrepresentation (although 
 some care may need to be taken on the application side). Scrub will throw 
 these rows out, leading to a large amount of data loss. 
 In these applications it is desirable for scrub to check for row/file 
 corruption but not validate the column values (which can result in a large 
 percentage of data being thrown away). This would be made possible by adding 
 a flag to disable validation in the SSTableIdentityIterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9406) Add Option to Not Validate Atoms During Scrub

2015-05-15 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-9406:
--

 Summary: Add Option to Not Validate Atoms During Scrub
 Key: CASSANDRA-9406
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9406
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Jordan West
Priority: Minor
 Fix For: 2.0.x


In Scrubber, the instantiation of SSTableIdentityIterator hardcodes checkData 
to true. This should be made configurable when running scrub via JMX or 
StandaloneScrubber.

Since inbound data is not validated, Scrub without this option will throw away 
data that is not corrupt but is misrepresented (e.g. an int is stored but the 
validator is LongType), while Cassandra and application clients will happily 
continue to read and write data with this misrepresentation (although some care 
may need to be taken on the application side). Scrub will throw these rows out, 
leading to a large amount of data loss. 

In these applications it is desirable for scrub to check for row/file 
corruption but not validate the column values (which can result in a large 
percentage of data being thrown away). This would be made possible by adding 
a flag to disable validation in the SSTableIdentityIterator.
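
To make the failure mode concrete, here is a minimal, self-contained sketch (a
simplification with illustrative names, not Cassandra's actual validation
code): LongType validation requires an 8-byte (or empty) value, so a 4-byte int
stored under a LongType validator fails validation even though its bytes are
not corrupt.

{code:java}
import java.nio.ByteBuffer;

// Simplified stand-in for validation with a LongType validator; names are
// illustrative only, not Cassandra's actual code.
public class MisrepresentedValueExample
{
    static void validateAsLong(ByteBuffer bytes)
    {
        // LongType accepts exactly 8 bytes (or an empty value)
        if (bytes.remaining() != 8 && bytes.remaining() != 0)
            throw new IllegalStateException("expected 8 or 0 byte long, got " + bytes.remaining());
    }

    public static void main(String[] args)
    {
        ByteBuffer storedInt = ByteBuffer.allocate(4).putInt(0, 42); // an int, not a long
        try
        {
            validateAsLong(storedInt); // with checkData == true, scrub drops this row
        }
        catch (IllegalStateException e)
        {
            System.out.println("scrub would discard: " + e.getMessage());
        }
    }
}
{code}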



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-9406) Add Option to Not Validate Atoms During Scrub

2015-05-15 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West reassigned CASSANDRA-9406:
--

Assignee: Jordan West

 Add Option to Not Validate Atoms During Scrub
 -

 Key: CASSANDRA-9406
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9406
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Jordan West
Assignee: Jordan West
Priority: Minor
 Fix For: 2.0.x


 In Scrubber, the instantiation of SSTableIdentityIterator hardcodes checkData 
 to true. This should be made configurable when running scrub via JMX or 
 StandaloneScrubber.
 Since inbound data is not validated, Scrub without this option will throw 
 away data that is not corrupt but is misrepresented (e.g. an int is stored 
 but the validator is LongType), while Cassandra and application clients will 
 happily continue to read and write data with this misrepresentation (although 
 some care may need to be taken on the application side). Scrub will throw 
 these rows out, leading to a large amount of data loss. 
 In these applications it is desirable for scrub to check for row/file 
 corruption but not validate the column values (which can result in a large 
 percentage of data being thrown away). This would be made possible by adding 
 a flag to disable validation in the SSTableIdentityIterator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10681) make index building pluggable via IndexBuildTask

2015-11-20 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018508#comment-15018508
 ] 

Jordan West commented on CASSANDRA-10681:
-

I prefer the version that makes it explicit that some operations may be 
performed across the entire table data set (indexes/sstables) but if we are not 
going that route I would vote we use the other patch Pavel posted, which avoids 
the unnecessary unused instance construction.

Also, is there a need to introduce a new `IndexBuildTask` class or can we just 
use `SecondaryIndexBuilder` as the interface and subclass it? Is there a 
concern about re-using the existing class but making it abstract?
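
For reference, a rough sketch of the shape such a pluggable build task could
take; all names and signatures here are hypothetical, inferred only from the
ticket description quoted below, not the committed API.

{code:java}
import java.util.Collection;

// Placeholder standing in for Cassandra internals.
interface SSTable {}

// Hypothetical task interface, per the description: modeled on
// CompactionInfo.Holder and returned by the Index on demand.
interface IndexBuildTask extends Runnable
{
    String progressDescription(); // observable like a compaction
}

interface Index
{
    // A built-in index could return one merge-and-collate task over all
    // sstables; SASI could instead build against each sstable individually.
    IndexBuildTask getBuildTask(Collection<SSTable> sstables);
}
{code}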

> make index building pluggable via IndexBuildTask
> 
>
> Key: CASSANDRA-10681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10681
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>Priority: Minor
>  Labels: sasi
> Fix For: 3.x
>
> Attachments: 0001-add-table-support-for-multi-table-builds.patch, 
> 0001-make-index-building-pluggable-via-IndexBuildTask.patch
>
>
> Currently index building assumes one and only one way to build all of the 
> indexes - through SecondaryIndexBuilder - which merges all of the sstables 
> together, collates columns, etc. This works fine for built-in indexes but not 
> for SASI, since it attaches to every SSTable individually. We need an 
> "IndexBuildTask" interface (based on CompactionInfo.Holder) to be returned 
> from Index on demand, to give SI interface implementers the power to decide 
> how the build should work. This might be less efficient for CassandraIndex, 
> since it effectively means that collation will have to be done multiple times 
> on the same data, but it is nevertheless a good compromise for a clean 
> interface to the outside world.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10681) make index building pluggable via IndexBuildTask

2015-11-20 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018508#comment-15018508
 ] 

Jordan West edited comment on CASSANDRA-10681 at 11/20/15 6:52 PM:
---

I prefer the version that makes it explicit that some operations may be 
performed across the entire table data set (indexes/sstables) but if we are not 
going that route I would vote we use the other patch Pavel posted, which avoids 
the unnecessary unused instance construction.

Also,  is there a need to introduce a new `IndexBuildTask` class or can we just 
use `SecondaryIndexBuilder` as the interface and subclass it? Is there a 
concern about re-using the existing class but making it abstract?


was (Author: jrwest):
I prefer the version that makes it explicit that some operations may be 
performed across the entire table data set (indexes/sstables) but if we are not 
going that route I would vote we use the other patch Pavel posted, which avoids 
the unnecessary unused instance construction.

Also,  is there a need to introduce a new `IndexBuildTask class` or can we just 
use `SecondaryIndexBuilder` as the interface and subclass it? Is there a 
concern about re-using the existing class but making it abstract?

> make index building pluggable via IndexBuildTask
> 
>
> Key: CASSANDRA-10681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10681
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>Priority: Minor
>  Labels: sasi
> Fix For: 3.x
>
> Attachments: 0001-add-table-support-for-multi-table-builds.patch, 
> 0001-make-index-building-pluggable-via-IndexBuildTask.patch
>
>
> Currently index building assumes one and only one way to build all of the 
> indexes - through SecondaryIndexBuilder - which merges all of the sstables 
> together, collates columns, etc. This works fine for built-in indexes but not 
> for SASI, since it attaches to every SSTable individually. We need an 
> "IndexBuildTask" interface (based on CompactionInfo.Holder) to be returned 
> from Index on demand, to give SI interface implementers the power to decide 
> how the build should work. This might be less efficient for CassandraIndex, 
> since it effectively means that collation will have to be done multiple times 
> on the same data, but it is nevertheless a good compromise for a clean 
> interface to the outside world.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12073) [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial results

2016-06-23 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347099#comment-15347099
 ] 

Jordan West commented on CASSANDRA-12073:
-

[~doanduyhai] the fix you mentioned looks correct. it should only impact the 
[code]PREFIX[/code] type because of the implementation of `nonMatchingPartial`. 
Unfortunately this means we may do a bit more work in the index than desired 
but this is an acceptable consequence of the more complex query. 
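
A minimal, self-contained model of the control flow at issue (an
assumption-based simplification of {{OnDiskIndex.TermIterator::computeNext()}},
quoted in the description below): if the upper-bound check rejects a term
merely because it is a partial, returning {{endOfData()}} silently drops every
remaining matching term.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class PartialTermDemo
{
    // A term plus a flag saying whether it is a partial (e.g. "lady" indexed
    // from "ladysmith" in CONTAINS mode) rather than an original term.
    record Term(String value, boolean partial) {}

    // Buggy shape: treating a partial as "upper bound not satisfied" ends
    // iteration, losing every remaining match.
    static List<String> buggy(List<Term> terms, String prefix)
    {
        List<String> out = new ArrayList<>();
        for (Term t : terms)
        {
            if (!t.value().startsWith(prefix))
                continue;
            if (t.partial())
                return out; // endOfData(): remaining matches are lost
            out.add(t.value());
        }
        return out;
    }

    // Fixed shape: skip the partial and keep iterating (a bit more work).
    static List<String> fixed(List<Term> terms, String prefix)
    {
        List<String> out = new ArrayList<>();
        for (Term t : terms)
            if (t.value().startsWith(prefix) && !t.partial())
                out.add(t.value());
        return out;
    }

    public static void main(String[] args)
    {
        List<Term> terms = List.of(new Term("lady", true), // partial of "ladysmith"
                                   new Term("lady gaga", false),
                                   new Term("ladytron", false));
        System.out.println(buggy(terms, "lady")); // []
        System.out.println(fixed(terms, "lady")); // [lady gaga, ladytron]
    }
}
{code}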

> [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial 
> results
> ---
>
> Key: CASSANDRA-12073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12073
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>
> {noformat}
> cqlsh:music> CREATE TABLE music.albums (
> id uuid PRIMARY KEY,
> artist text,
> country text,
> quality text,
> status text,
> title text,
> year int
> );
> cqlsh:music> CREATE CUSTOM INDEX albums_artist_idx ON music.albums (artist) 
> USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 
> 'CONTAINS', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
> 'case_sensitive': 'false'};
> cqlsh:music> SELECT * FROM albums WHERE artist like 'lady%'  LIMIT 100;
>  id   | artist| country| quality 
> | status| title | year
> --+---++-+---+---+--
>  372bb0ab-3263-41bc-baad-bb520ddfa787 | Lady Gaga |USA |  normal 
> |  Official |   Red and Blue EP | 2006
>  1a4abbcd-b5de-4c69-a578-31231e01ff09 | Lady Gaga |Unknown |  normal 
> | Promotion |Poker Face | 2008
>  31f4a0dc-9efc-48bf-9f5e-bfc09af42b82 | Lady Gaga |USA |  normal 
> |  Official |   The Cherrytree Sessions | 2009
>  8ebfaebd-28d0-477d-b735-469661ce6873 | Lady Gaga |Unknown |  normal 
> |  Official |Poker Face | 2009
>  98107d82-e0dd-46bc-a273-1577578984c7 | Lady Gaga |USA |  normal 
> |  Official |   Just Dance: The Remixes | 2008
>  a76af0f2-f5c5-4306-974a-e3c17158e6c6 | Lady Gaga |  Italy |  normal 
> |  Official |  The Fame | 2008
>  849ee019-8b15-4767-8660-537ab9710459 | Lady Gaga |USA |  normal 
> |  Official |Christmas Tree | 2008
>  4bad59ac-913f-43da-9d48-89adc65453d2 | Lady Gaga |  Australia |  normal 
> |  Official | Eh Eh | 2009
>  80327731-c450-457f-bc12-0a8c21fd9c5d | Lady Gaga |USA |  normal 
> |  Official | Just Dance Remixes Part 2 | 2008
>  3ad33659-e932-4d31-a040-acab0e23c3d4 | Lady Gaga |Unknown |  normal 
> |  null |Just Dance | 2008
>  9adce7f6-6a1d-49fd-b8bd-8f6fac73558b | Lady Gaga | United Kingdom |  normal 
> |  Official |Just Dance | 2009
> (11 rows)
> {noformat}
> *SASI* says that there are only 11 artists whose name starts with {{lady}}.
> However, in the data set, there are:
> * Lady Pank
> * Lady Saw
> * Lady Saw
> * Ladyhawke
> * Ladytron
> * Ladysmith Black Mambazo
> * Lady Gaga
> * Lady Sovereign
> etc ...
> By debugging the source code, the issue is in 
> {{OnDiskIndex.TermIterator::computeNext()}}
> {code:java}
> for (;;)
> {
> if (currentBlock == null)
> return endOfData();
> if (offset >= 0 && offset < currentBlock.termCount())
> {
> DataTerm currentTerm = currentBlock.getTerm(nextOffset());
> if (checkLower && !e.isLowerSatisfiedBy(currentTerm))
> continue;
> // flip the flag right on the first bounds match
> // to avoid expensive comparisons
> checkLower = false;
> if (checkUpper && !e.isUpperSatisfiedBy(currentTerm))
> return endOfData();
> return currentTerm;
> }
> nextBlock();
> }
> {code}
>  So the {{endOfData()}} conditions are:
> * currentBlock == null
> * checkUpper && !e.isUpperSatisfiedBy(currentTerm)
> The problem is that {{e::isUpperSatisfiedBy}} is checking not only whether 
> the term match but also returns *false* when it's a *partial term* !
> {code:java}
> public boolean isUpperSatisfiedBy(OnDiskIndex.DataTerm term)
> {
> if (!hasUpper())
> return true;
> if (nonMatchingPartial(term))
> return false;
> int cmp = term.compareTo(validator, upper.value, false);
> return cmp 

[jira] [Comment Edited] (CASSANDRA-12073) [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial results

2016-06-23 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347099#comment-15347099
 ] 

Jordan West edited comment on CASSANDRA-12073 at 6/23/16 8:20 PM:
--

[~doanduyhai] the fix you mentioned looks correct. it should only impact the 
PREFIX type because of the implementation of `nonMatchingPartial`. 
Unfortunately this means we may do a bit more work in the index than desired 
but this is an acceptable consequence of the more complex query. 


was (Author: jrwest):
[~doanduyhai] the fix you mentioned looks correct. it should only impact the 
[code]PREFIX[/code] type because of the implementation of `nonMatchingPartial`. 
Unfortunately this means we may do a bit more work in the index than desired 
but this is an acceptable consequence of the more complex query. 

> [SASI] PREFIX search on CONTAINS/NonTokenizer mode returns only partial 
> results
> ---
>
> Key: CASSANDRA-12073
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12073
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Cassandra 3.7
>Reporter: DOAN DuyHai
>
> {noformat}
> cqlsh:music> CREATE TABLE music.albums (
> id uuid PRIMARY KEY,
> artist text,
> country text,
> quality text,
> status text,
> title text,
> year int
> );
> cqlsh:music> CREATE CUSTOM INDEX albums_artist_idx ON music.albums (artist) 
> USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 
> 'CONTAINS', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
> 'case_sensitive': 'false'};
> cqlsh:music> SELECT * FROM albums WHERE artist like 'lady%'  LIMIT 100;
>  id   | artist| country| quality 
> | status| title | year
> --+---++-+---+---+--
>  372bb0ab-3263-41bc-baad-bb520ddfa787 | Lady Gaga |USA |  normal 
> |  Official |   Red and Blue EP | 2006
>  1a4abbcd-b5de-4c69-a578-31231e01ff09 | Lady Gaga |Unknown |  normal 
> | Promotion |Poker Face | 2008
>  31f4a0dc-9efc-48bf-9f5e-bfc09af42b82 | Lady Gaga |USA |  normal 
> |  Official |   The Cherrytree Sessions | 2009
>  8ebfaebd-28d0-477d-b735-469661ce6873 | Lady Gaga |Unknown |  normal 
> |  Official |Poker Face | 2009
>  98107d82-e0dd-46bc-a273-1577578984c7 | Lady Gaga |USA |  normal 
> |  Official |   Just Dance: The Remixes | 2008
>  a76af0f2-f5c5-4306-974a-e3c17158e6c6 | Lady Gaga |  Italy |  normal 
> |  Official |  The Fame | 2008
>  849ee019-8b15-4767-8660-537ab9710459 | Lady Gaga |USA |  normal 
> |  Official |Christmas Tree | 2008
>  4bad59ac-913f-43da-9d48-89adc65453d2 | Lady Gaga |  Australia |  normal 
> |  Official | Eh Eh | 2009
>  80327731-c450-457f-bc12-0a8c21fd9c5d | Lady Gaga |USA |  normal 
> |  Official | Just Dance Remixes Part 2 | 2008
>  3ad33659-e932-4d31-a040-acab0e23c3d4 | Lady Gaga |Unknown |  normal 
> |  null |Just Dance | 2008
>  9adce7f6-6a1d-49fd-b8bd-8f6fac73558b | Lady Gaga | United Kingdom |  normal 
> |  Official |Just Dance | 2009
> (11 rows)
> {noformat}
> *SASI* says that there are only 11 artists whose name starts with {{lady}}.
> However, in the data set, there are:
> * Lady Pank
> * Lady Saw
> * Lady Saw
> * Ladyhawke
> * Ladytron
> * Ladysmith Black Mambazo
> * Lady Gaga
> * Lady Sovereign
> etc ...
> By debugging the source code, the issue is in 
> {{OnDiskIndex.TermIterator::computeNext()}}
> {code:java}
> for (;;)
> {
> if (currentBlock == null)
> return endOfData();
> if (offset >= 0 && offset < currentBlock.termCount())
> {
> DataTerm currentTerm = currentBlock.getTerm(nextOffset());
> if (checkLower && !e.isLowerSatisfiedBy(currentTerm))
> continue;
> // flip the flag right on the first bounds match
> // to avoid expensive comparisons
> checkLower = false;
> if (checkUpper && !e.isUpperSatisfiedBy(currentTerm))
> return endOfData();
> return currentTerm;
> }
> nextBlock();
> }
> {code}
>  So the {{endOfData()}} conditions are:
> * currentBlock == null
> * checkUpper && !e.isUpperSatisfiedBy(currentTerm)
> The problem is that {{e::isUpperSatisfiedBy}} is checking not only whether 
> the 

[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113918#comment-15113918
 ] 

Jordan West commented on CASSANDRA-10661:
-

bq. Is there also a way to query a SASI-indexed column by exact value? I mean, 
it seems as if by enabling prefix or contains, that it will always query by 
prefix or contains. For example, if I want to query for full first name, like 
where their full first name really is "J" and not get "John" and "James" as 
well, while at other times I am indeed looking for names starting with a prefix 
of "Jo" for "John", "Joseph", etc.

The example is correct, but this is not a limitation of SASI; it's a limitation 
of CQL, and we decided not to further extend the grammar, since we have already 
had to defer our grammar changes to later phases (removing OR, grouping, 
and != support for now). Ideally, CQL would support a `LIKE` operator similar 
to SQL's, and depending on whether the index was created with `PREFIX` or 
`CONTAINS` we would allow/disallow forms such as `%Jo%` or `_j%`. 

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which 
case, would I be better off with a SPARSE index for first_name_full, or would a 
traditional Cassandra non-custom index work fine (or even better.)

It does, but the same is true of all queries on numerical data, which, thinking 
about it, may make the `PREFIX` option confusing for numeric types. SPARSE is 
intended to improve query performance on numerical data where there is a large 
number of terms (e.g. timestamps) but a small number of keys per term (e.g. 
some time series data). `SPARSE` should not be used on every numerical column, 
and for most non-numerical data it is not an ideal setting either. For example, 
in a large data set of first names the number of names will be small compared 
to the number of keys, and given the distribution of first names, using SPARSE 
will increase the size of the index and at best have zero effect on query 
performance, but may hurt it.
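
As a concrete sketch of that last point about `LIKE` forms (hypothetical, since
no `LIKE` operator existed in CQL at the time): allowing or disallowing a
pattern could reduce to checking for a leading wildcard.

{code:java}
// Hypothetical helper, not Cassandra code: which LIKE patterns an index mode
// could serve. A leading wildcard needs CONTAINS-style matching; a pattern
// anchored at the start only needs PREFIX.
enum Mode { PREFIX, CONTAINS }

final class LikeSupport
{
    static boolean supports(Mode mode, String pattern)
    {
        boolean anchoredAtStart = !pattern.startsWith("%") && !pattern.startsWith("_");
        return anchoredAtStart || mode == Mode.CONTAINS;
        // supports(Mode.PREFIX,   "Jo%")  -> true
        // supports(Mode.PREFIX,   "%Jo%") -> false
        // supports(Mode.CONTAINS, "%Jo%") -> true
    }
}
{code}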





 

> Integrate SASI to Cassandra
> ---
>
> Key: CASSANDRA-10661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>  Labels: sasi
> Fix For: 3.x
>
>
> We have recently released a new secondary index engine 
> (https://github.com/xedin/sasi) built using the SecondaryIndex API; there are 
> still a couple of things to work out regarding 3.x since it's currently 
> targeted at the 2.0 release. I want to make this an umbrella issue for all of 
> the things related to the integration of SASI, which are also tracked in 
> [sasi_issues|https://github.com/xedin/sasi/issues], into the mainline 
> Cassandra 3.x release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113918#comment-15113918
 ] 

Jordan West edited comment on CASSANDRA-10661 at 1/23/16 7:42 PM:
--

bq. Is there also a way to query a SASI-indexed column by exact value? I mean, 
it seems as if by enabling prefix or contains, that it will always query by 
prefix or contains. For example, if I want to query for full first name, like 
where their full first name really is "J" and not get "John" and "James" as 
well, while at other times I am indeed looking for names starting with a prefix 
of "Jo" for "John", "Joseph", etc.

The example is correct, but this is not a limitation of SASI; it's a limitation 
of CQL, and we decided not to further extend the grammar, since we have already 
had to defer our grammar changes to later phases (removing OR, grouping, 
and != support for now). Ideally, `=` would mean exact match and CQL would 
support a `LIKE` operator similar to SQL's, and depending on whether the index 
was created with `PREFIX` or `CONTAINS` we would allow/disallow forms such as 
`%Jo%` or `_j%`. 

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which 
case, would I be better off with a SPARSE index for first_name_full, or would a 
traditional Cassandra non-custom index work fine (or even better.)

It does, but the same is true of all queries on numerical data, which, thinking 
about it, may make the `PREFIX` option confusing for numeric types. SPARSE is 
intended to improve query performance on numerical data where there is a large 
number of terms (e.g. timestamps) but a small number of keys per term (e.g. 
some time series data). `SPARSE` should not be used on every numerical column, 
and for most non-numerical data it is not an ideal setting either. For example, 
in a large data set of first names the number of names will be small compared 
to the number of keys, and given the distribution of first names, using SPARSE 
will increase the size of the index and at best have zero effect on query 
performance, but may hurt it.





 


was (Author: jrwest):
bq. Is there also a way to query a SASI-indexed column by exact value? I mean, 
it seems as if by enabling prefix or contains, that it will always query by 
prefix or contains. For example, if I want to query for full first name, like 
where their full first name really is "J" and not get "John" and "James" as 
well, while at other times I am indeed looking for names starting with a prefix 
of "Jo" for "John", "Joseph", etc.

The example is correct, but this is not a limitation of SASI; it's a limitation 
of CQL, and we decided not to further extend the grammar, since we have already 
had to defer our grammar changes to later phases (removing OR, grouping, 
and != support for now). Ideally, CQL would support a `LIKE` operator similar 
to SQL's, and depending on whether the index was created with `PREFIX` or 
`CONTAINS` we would allow/disallow forms such as `%Jo%` or `_j%`. 

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which 
case, would I be better off with a SPARSE index for first_name_full, or would a 
traditional Cassandra non-custom index work fine (or even better.)

It does, but the same is true of all queries on numerical data, which, thinking 
about it, may make the `PREFIX` option confusing for numeric types. SPARSE is 
intended to improve query performance on numerical data where there is a large 
number of terms (e.g. timestamps) but a small number of keys per term (e.g. 
some time series data). `SPARSE` should not be used on every numerical column, 
and for most non-numerical data it is not an ideal setting either. For example, 
in a large data set of first names the number of names will be small compared 
to the number of keys, and given the distribution of first names, using SPARSE 
will increase the size of the index and at best have zero effect on query 
performance, but may hurt it.





 

> Integrate SASI to Cassandra
> ---
>
> Key: CASSANDRA-10661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>  Labels: sasi
> Fix For: 3.x
>
>
> We have recently released a new secondary index engine 
> (https://github.com/xedin/sasi) built using the SecondaryIndex API; there are 
> still a couple of things to work out regarding 3.x since it's currently 
> targeted at the 2.0 release. I want to make this an umbrella issue for all of 
> the things related to the integration of SASI, which are also tracked in 
> [sasi_issues|https://github.com/xedin/sasi/issues], into the mainline 
> Cassandra 3.x release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11067) Improve SASI syntax

2016-01-26 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117681#comment-15117681
 ] 

Jordan West commented on CASSANDRA-11067:
-

bq. other than the fact that is new and experimental and unproven in the real 
world yet?

I think SASI is as proven as, or more proven than, any change in the 3.x 
releases. It's been in production longer than any 3.x feature, and most of the 
changes for 3.x were surface-level integration changes, as [~xedin] mentioned.

bq. The fact that a SASI index needs to be "CUSTOM" and an explicit class name 
is needed feels a little hokey to me.

Agreed, but we decided not to change this to ease the merge, and because the 
sstable format is not currently easy to extend. Also, this is the case for 
any non-default index class. I think it would be great for SASI to become the 
default implementation, or to have an easier way to specify which 
implementation to use.

> Improve SASI syntax
> ---
>
> Key: CASSANDRA-11067
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11067
> Project: Cassandra
>  Issue Type: Task
>  Components: CQL
>Reporter: Jonathan Ellis
>Assignee: Pavel Yaskevich
> Fix For: 3.4
>
>
> I think everyone agrees that a LIKE operator would be ideal, but that's 
> probably not in scope for an initial 3.4 release.
> Still, I'm uncomfortable with the initial approach of overloading = to mean 
> "satisfies index expression."  The problem is that it will be very difficult 
> to back out of this behavior once people are using it.
> I propose adding a new operator in the interim instead.  Call it MATCHES, 
> maybe.  With the exact same behavior that SASI currently exposes, just with a 
> separate operator rather than being rolled into =.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11449) Add NOT LIKE for PREFIX/CONTAINS Mode SASI Indexes

2016-03-28 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-11449:
---

 Summary: Add NOT LIKE for PREFIX/CONTAINS Mode SASI Indexes
 Key: CASSANDRA-11449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11449
 Project: Cassandra
  Issue Type: New Feature
  Components: sasi
Reporter: Jordan West
Assignee: Pavel Yaskevich


Internally, SASI already supports {{NOT LIKE}} but the CQL3 layer and grammar 
need to be extended to support it. The same rules that apply to {{LIKE}} for 
{{PREFIX}} and {{CONTAINS}} modes would apply to {{NOT LIKE}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files

2016-03-29 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216186#comment-15216186
 ] 

Jordan West commented on CASSANDRA-11383:
-

bq.  Was the conclusion that a SPARSE SASI index would work well even for low 
cardinality data (as in the original reported case, for period_end_month_int), 
or was there some application-level change required to adapt to a SASI change 
as well?

{{period_end_month_int}} is still the incorrect use case for {{SPARSE}}. That 
did not change. {{SPARSE}} is still intended for indexes/terms where there are 
a large number of terms and a low number of tokens/keys per term (the token 
trees in the index are sparse). The {{period_end_month_int}} use-case is a 
dense index: there are few terms and each term has a large number of 
tokens/keys (the token trees in the index are dense). The merged patch improves 
memory overhead in either case when building indexes from a large sstable. 

What was modified is that indexes marked {{SPARSE}} that have more than 5 
tokens for any term in the index will fail to build and an exception will be 
logged. 

bq.  Is it now official that a non-SPARSE SASI index (e.g., PREFIX) can be used 
for non-TEXT data (int in particular), at least for the case of exact match 
lookup?

{{PREFIX}} mode has always been supported for numeric data and was, and 
continues to be, the default mode if none is specified. PREFIX mode should be 
considered "NOT SPARSE" for numerical data. 

> Avoid index segment stitching in RAM which lead to OOM on big SSTable files 
> 
>
> Key: CASSANDRA-11383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: C* 3.4
>Reporter: DOAN DuyHai
>Assignee: Jordan West
>  Labels: sasi
> Fix For: 3.5
>
> Attachments: CASSANDRA-11383.patch, 
> SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz, 
> new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom
>
>
> 13 bare metal machines
> - 6 cores CPU (12 HT)
> - 64Gb RAM
> - 4 SSD in RAID0
>  JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
> Data set:
>  - ≈ 100Gb/per node
>  - 1.3 Tb cluster-wide
>  - ≈ 20Gb for all SASI indices
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
> I created 9 SASI indices
>  - 8 indices with text field, NonTokenizingAnalyser,  PREFIX mode, 
> case-insensitive
>  - 1 index with numeric field, SPARSE mode
>  After a while, the nodes just gone OOM.
>  I attach log files. You can see a lot of GC happening while index segments 
> are flush to disk. At some point the node OOM ...
> /cc [~xedin]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files

2016-03-29 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216337#comment-15216337
 ] 

Jordan West commented on CASSANDRA-11383:
-

bq. Maybe that leaves one last question as to whether non-SPARSE (PREFIX) mode 
is considered advisable/recommended for high cardinality column data, where 
SPARSE mode is nominally a better choice. Maybe that is strictly a matter of 
whether the prefix/LIKE feature is to be utilized - if so, than PREFIX mode is 
required, but if not, SPARSE mode sounds like the better choice. But I don't 
have a handle on the internal index structures to know if that's absolutely the 
case - that a PREFIX index for SPARSE data would necessarily be larger and/or 
slower than a SPARSE index for high cardinality data. I would hope so, but it 
would be good to have that confirmed.

{{SPARSE}} is only for numeric data so LIKE queries do not apply. For data that 
is sparse (every term/column value has less than 5 matching keys), such as 
indexing the {{created_at}} field in time series data (where there is typically 
few matching rows/events per {{created_at}} timestamp), it is best to use 
{{SPARSE}}, always, and especially in cases where range queries are used. 
{{SPARSE}} is primarily an optimization for range queries on this sort of data. 
Its biggest effect is visible on large ranges (e.g. spanning multiple days of 
time series data). 

The decision process for whether or not to use {{SPARSE}} should be:
1. is the data a numeric type?
2. is it expected that there will be a large (millions or more) number of terms 
(column values) in the index with each term having a small (5 or less) set of 
matching tokens (partition keys)?

If the answer to both is Yes then use {{SPARSE}}.

From the docs 
(https://github.com/xedin/cassandra/blob/trunk/doc/SASI.md#ondiskindexbuilder):

bq. The SPARSE mode differs from PREFIX in that for every 64 blocks of terms a 
TokenTree is built merging all the TokenTrees for each term into a single one. 
This copy of the data is used for efficient iteration of large ranges of e.g. 
timestamps. The index "mode" is configurable per column at index creation time.
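
The checklist above (extended with a third question about range queries in the
edited comment below) boils down to a trivial predicate; this encoding is an
illustration of the guidance, not an official heuristic:

{code:java}
// Illustrative only: the SPARSE decision checklist as a single predicate.
// Thresholds ("millions or more", "5 or less") come from the comment above.
final class SparseAdvisor
{
    static boolean useSparse(boolean numericType,
                             long expectedTermCount,
                             long maxKeysPerTerm,
                             boolean rangeQueriesExpected)
    {
        return numericType                     // 1. numeric data?
            && expectedTermCount >= 1_000_000  // 2. millions or more terms,
            && maxKeysPerTerm <= 5             //    each with few matching keys?
            && rangeQueriesExpected;           // 3. range queries planned? (added in the edit)
    }
}
{code}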


> Avoid index segment stitching in RAM which lead to OOM on big SSTable files 
> 
>
> Key: CASSANDRA-11383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: C* 3.4
>Reporter: DOAN DuyHai
>Assignee: Jordan West
>  Labels: sasi
> Fix For: 3.5
>
> Attachments: CASSANDRA-11383.patch, 
> SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz, 
> new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom
>
>
> 13 bare metal machines
> - 6 cores CPU (12 HT)
> - 64Gb RAM
> - 4 SSD in RAID0
>  JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
> Data set:
>  - ≈ 100Gb/per node
>  - 1.3 Tb cluster-wide
>  - ≈ 20Gb for all SASI indices
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
> I created 9 SASI indices
>  - 8 indices with text field, NonTokenizingAnalyser,  PREFIX mode, 
> case-insensitive
>  - 1 index with numeric field, SPARSE mode
>  After a while, the nodes just gone OOM.
>  I attach log files. You can see a lot of GC happening while index segments 
> are flush to disk. At some point the node OOM ...
> /cc [~xedin]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files

2016-03-29 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216337#comment-15216337
 ] 

Jordan West edited comment on CASSANDRA-11383 at 3/29/16 5:02 PM:
--

bq. Maybe that leaves one last question as to whether non-SPARSE (PREFIX) mode 
is considered advisable/recommended for high cardinality column data, where 
SPARSE mode is nominally a better choice. Maybe that is strictly a matter of 
whether the prefix/LIKE feature is to be utilized - if so, than PREFIX mode is 
required, but if not, SPARSE mode sounds like the better choice. But I don't 
have a handle on the internal index structures to know if that's absolutely the 
case - that a PREFIX index for SPARSE data would necessarily be larger and/or 
slower than a SPARSE index for high cardinality data. I would hope so, but it 
would be good to have that confirmed.

{{SPARSE}} is only for numeric data so LIKE queries do not apply. For data that 
is sparse (every term/column value has less than 5 matching keys), such as 
indexing the {{created_at}} field in time series data (where there is typically 
few matching rows/events per {{created_at}} timestamp), it is best to use 
{{SPARSE}}, always, and especially in cases where range queries are used. 
{{SPARSE}} is primarily an optimization for range queries on this sort of data. 
Its biggest effect is visible on large ranges (e.g. spanning multiple days of 
time series data). 

The decision process for whether or not to use {{SPARSE}} should be:
1. is the data a numeric type?
2. is it expected that there will be a large (millions or more) number of terms 
(column values) in the index with each term having a small (5 or less) set of 
matching tokens (partition keys)?
3. will range queries be performed against this index?

If the answer to all three questions is Yes then use {{SPARSE}}.

From the docs 
(https://github.com/xedin/cassandra/blob/trunk/doc/SASI.md#ondiskindexbuilder):

bq. The SPARSE mode differs from PREFIX in that for every 64 blocks of terms a 
TokenTree is built merging all the TokenTrees for each term into a single one. 
This copy of the data is used for efficient iteration of large ranges of e.g. 
timestamps. The index "mode" is configurable per column at index creation time.



was (Author: jrwest):
bq. Maybe that leaves one last question as to whether non-SPARSE (PREFIX) mode 
is considered advisable/recommended for high cardinality column data, where 
SPARSE mode is nominally a better choice. Maybe that is strictly a matter of 
whether the prefix/LIKE feature is to be utilized - if so, than PREFIX mode is 
required, but if not, SPARSE mode sounds like the better choice. But I don't 
have a handle on the internal index structures to know if that's absolutely the 
case - that a PREFIX index for SPARSE data would necessarily be larger and/or 
slower than a SPARSE index for high cardinality data. I would hope so, but it 
would be good to have that confirmed.

{{SPARSE}} is only for numeric data so LIKE queries do not apply. For data that 
is sparse (every term/column value has less than 5 matching keys), such as 
indexing the {{created_at}} field in time series data (where there is typically 
few matching rows/events per {{created_at}} timestamp), it is best to use 
{{SPARSE}}, always, and especially in cases where range queries are used. 
{{SPARSE}} is primarily an optimization for range queries on this sort of data. 
Its biggest effect is visible on large ranges (e.g. spanning multiple days of 
time series data). 

The decision process for whether or not to use {{SPARSE}} should be:
1. is the data a numeric type?
2. is it expected that there will be a large (millions or more) number of terms 
(column values) in the index with each term having a small (5 or less) set of 
matching tokens (partition keys)?

If the answer to both is Yes then use {{SPARSE}}.

From the docs 
(https://github.com/xedin/cassandra/blob/trunk/doc/SASI.md#ondiskindexbuilder):

bq. The SPARSE mode differs from PREFIX in that for every 64 blocks of terms a 
TokenTree is built merging all the TokenTrees for each term into a single one. 
This copy of the data is used for efficient iteration of large ranges of e.g. 
timestamps. The index "mode" is configurable per column at index creation time.


> Avoid index segment stitching in RAM which lead to OOM on big SSTable files 
> 
>
> Key: CASSANDRA-11383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: C* 3.4
>Reporter: DOAN DuyHai
>Assignee: Jordan West
>  Labels: sasi
> Fix For: 3.5
>
> Attachments: CASSANDRA-11383.patch, 
> 

[jira] [Commented] (CASSANDRA-11434) Support EQ/PREFIX queries in CONTAINS mode without tokenization by augmenting SA metadata per term

2016-03-28 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215073#comment-15215073
 ] 

Jordan West commented on CASSANDRA-11434:
-

The branch linked below implements the described changes, and the test changes 
reflect the feature changes made. This is a backwards-compatible change: it 
uses a previously unused (zeroed) byte in the index header to indicate whether 
the index supports the new kind of query. Existing indexes will automatically 
be upgraded to support marked partials when compacted. PREFIX queries against a 
CONTAINS column whose indexes have not yet been upgraded will still result in 
an exception and a failed request (but with a different exception than 
{{InvalidRequestException}}). Once the index is rebuilt (manually or via 
compaction) the exception will stop being thrown. 

||branch||testall||dtest||
|[CASSANDRA-11434|https://github.com/xedin/cassandra/tree/CASSANDRA-11434]|[testall|http://cassci.datastax.com/job/xedin-CASSANDRA-11434-testall/]|[dtest|http://cassci.datastax.com/job/xedin-CASSANDRA-11434-dtest/]|
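
The ticket description below explains the underlying trick: the uint16 size
attached to each term never uses its sign bit, so that bit can mark partial
terms at no cost to index size. A self-contained sketch of the idea
(simplified, not the actual on-disk format code):

{code:java}
final class PartialTermSize
{
    // Pack a 15-bit term size plus an "is partial" flag into one short.
    static short pack(int size, boolean partial)
    {
        if (size < 0 || size > 0x7FFF)
            throw new IllegalArgumentException("size must fit in 15 bits: " + size);
        return (short) (partial ? (size | 0x8000) : size);
    }

    static int size(short packed)        { return packed & 0x7FFF; }
    static boolean partial(short packed) { return (packed & 0x8000) != 0; }

    public static void main(String[] args)
    {
        short p = pack(1234, true);
        System.out.println(size(p) + " " + partial(p)); // 1234 true
    }
}
{code}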

> Support EQ/PREFIX queries in CONTAINS mode without tokenization by augmenting 
> SA metadata per term
> --
>
> Key: CASSANDRA-11434
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11434
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Pavel Yaskevich
>Assignee: Jordan West
> Fix For: 3.6
>
>
> We can support EQ/PREFIX requests to CONTAINS indexes by tracking 
> "partiality" of the data stored in the OnDiskIndex and IndexMemtable, if we 
> know exactly if current match represents part of the term or it's original 
> form it would be trivial to support EQ/PREFIX since PREFIX is subset of 
> SUFFIX matches.
> Since we attach uint16 size to each term stored we can take advantage of sign 
> bit so size of the index is not impacted at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-08 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233209#comment-15233209
 ] 

Jordan West commented on CASSANDRA-11525:
-

[~doanduyhai] we have tracked down the root cause of the bug and it has 
affected all versions of SASI since its original inclusion in Cassandra. The 
issue is that when positions in the -Index.db file are > Integer.MAX_VALUE the 
positions are factored into a 32-bit and 16-bit value. The 16-bit value was 
being read as a signed short and for certain positions this resulted in 
reconstructing an incorrect 64-bit offset from the 32-bit and 16-bit parts. 
Thankfully, this is a quick, one-line fix (reading the short as unsigned), and 
is entirely independent of the changes in CASSANDRA-11383 or this ticket. We 
will include the fix for this with the merge of the changes in this ticket. We 
are working on final verification using your SSTables before we merge. 
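
A self-contained illustration of the signed-short failure mode just described
(simplified; the exact bit split SASI uses may differ):

{code:java}
public class OffsetReadBug
{
    public static void main(String[] args)
    {
        long position = (1L << 31) + 0xABCD; // a position > Integer.MAX_VALUE
        // Factor the position into a 32-bit and a 16-bit part.
        int hi = (int) (position >>> 16);
        short lo = (short) position;

        long wrong = ((long) hi << 16) + lo;            // sign-extends lo: off by 65536
        long right = ((long) hi << 16) + (lo & 0xFFFF); // the one-line fix: read unsigned

        System.out.println(position + " != " + wrong + ", == " + right);
    }
}
{code}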

> StaticTokenTreeBuilder should respect posibility of duplicate tokens
> 
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>Assignee: Jordan West
> Fix For: 3.5
>
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging.
> I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   

[jira] [Commented] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-07 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231597#comment-15231597
 ] 

Jordan West commented on CASSANDRA-11525:
-

||branch||testall||dtest||
|[CASSANDRA-11525|https://github.com/xedin/cassandra/tree/CASSANDRA-11525]|[testall|http://cassci.datastax.com/job/xedin-CASSANDRA-11525-testall/]|[dtest|http://cassci.datastax.com/job/xedin-CASSANDRA-11525-dtest/]|

[~doanduyhai] can you try this branch and see if it addresses the issue? Also, 
can you please upload all of the SSTable components (including SSTable index 
files) so we can test here as well?

The issue was caused by an invalid assumption when clustering columns are used: 
when stitching together multiple index parts it was possible for the same term, 
for the same token, to appear in multiple parts, resulting in the union 
iterator returning an incorrect count. The new approach counts the number of 
tokens while performing the first iteration. The complexity of the algorithm 
has not changed and it should be similar in performance. 
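
A tiny, self-contained model of the miscount (illustrative, not the SASI code):
summing the sizes of the merged parts over-counts tokens shared between parts,
while counting during the merged iteration does not.

{code:java}
import java.util.stream.LongStream;

public class UnionCountDemo
{
    public static void main(String[] args)
    {
        long[] partA = {1, 5, 9};  // tokens for a term in one index part
        long[] partB = {5, 9, 12}; // same term in another part: 5 and 9 repeat

        long assumedDistinct = partA.length + partB.length; // 6: the bad assumption
        long counted = LongStream.concat(LongStream.of(partA), LongStream.of(partB))
                                 .distinct()
                                 .count();                  // 4: count while iterating

        System.out.println(assumedDistinct + " vs " + counted);
    }
}
{code}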

> StaticTokenTreeBuilder should respect posibility of duplicate tokens
> 
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>Assignee: Jordan West
> Fix For: 3.5
>
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging.
> I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> 

[jira] [Updated] (CASSANDRA-11397) LIKE query on clustering column index returns incorrect results

2016-03-22 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-11397:

Reviewer: Jordan West  (was: Pavel Yaskevich)

> LIKE query on clustering column index returns incorrect results
> ---
>
> Key: CASSANDRA-11397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11397
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>  Labels: sasi
> Fix For: 3.5
>
>
> The way that {{ClusteringIndexFilter}} and {{RowFilter}} are constructed when 
> a {{LIKE}} restriction on a clustering column is present is incorrect. For 
> example:
> {code}
> cqlsh> create table ks.t1 (k text, c1 text, c2 text, c3 text, v text, primary 
> key (k,c1,c2,c3));
> cqlsh> create custom index on ks.t1(c2) using 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> cqlsh> select * from ks.t1;
>  k | c1 | c2 | c3 | v
> ---++++-
>  a | ba | ca | da | val
>  a | bb | cb | db | val
>  a | bc | cc | dc | val
> (3 rows)
>  
> cqlsh> select * from ks.t1 where c1 = 'ba' and c3 = 'da' and c2 LIKE 'c%' 
> ALLOW FILTERING;
>  k | c1 | c2 | c3 | v
> ---++++---
> (0 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11397) LIKE query on clustering column index returns incorrect results

2016-03-23 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208987#comment-15208987
 ] 

Jordan West commented on CASSANDRA-11397:
-

[~beobal] the patch looks good and I've confirmed your fix locally. One small 
piece of feedback: since you fixed things to return a 
{{ClusteringIndexSliceFilter}}, I don't think the {{isLike}} check in 
{{PrimaryKeyRestrictionSet#appendTo}} is necessary anymore (I've removed it 
locally without issue). 

> LIKE query on clustering column index returns incorrect results
> ---
>
> Key: CASSANDRA-11397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11397
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>  Labels: sasi
> Fix For: 3.5
>
>
> The way that {{ClusteringIndexFilter}} and {{RowFilter}} are constructed when 
> a {{LIKE}} restriction on a clustering column is present is incorrect. For 
> example:
> {code}
> cqlsh> create table ks.t1 (k text, c1 text, c2 text, c3 text, v text, primary 
> key (k,c1,c2,c3));
> cqlsh> create custom index on ks.t1(c2) using 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> cqlsh> select * from ks.t1;
>  k | c1 | c2 | c3 | v
> ---++++-
>  a | ba | ca | da | val
>  a | bb | cb | db | val
>  a | bc | cc | dc | val
> (3 rows)
>  
> cqlsh> select * from ks.t1 where c1 = 'ba' and c3 = 'da' and c2 LIKE 'c%' 
> ALLOW FILTERING;
>  k | c1 | c2 | c3 | v
> ---++++---
> (0 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11383) Avoid index segment stitching in RAM which lead to OOM on big SSTable files

2016-03-29 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216917#comment-15216917
 ] 

Jordan West commented on CASSANDRA-11383:
-

The docs look pretty comprehensive. Thanks! I'll make a more detailed pass 
through them when I get a chance. I think the only thing we would like to 
clarify, based on the discussion in this ticket, is when to choose {{SPARSE}} 
over {{PREFIX}} for numerical data. My last comment 
(https://issues.apache.org/jira/browse/CASSANDRA-11383?focusedCommentId=15216337&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15216337)
 mentions a way to do that. 

Otherwise, specific to {{SPARSE}} the only recommendation I have is that the 
{{SPARSE}} example on the "CREATE CUSTOM INDEX (SASI)" page 
(https://docs.datastax.com/en/cql/3.3/cql/cql_reference/refCreateSASIIndex.html)
 uses {{age}}, which typically would not be a good candidate for a {{SPARSE}} 
index (the answer to question number 2 in my linked comment would be: no, there 
are not millions of ages with each term having a small number of matching 
keys). 

> Avoid index segment stitching in RAM which lead to OOM on big SSTable files 
> 
>
> Key: CASSANDRA-11383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: C* 3.4
>Reporter: DOAN DuyHai
>Assignee: Jordan West
>  Labels: sasi
> Fix For: 3.5
>
> Attachments: CASSANDRA-11383.patch, 
> SASI_Index_build_LCS_1G_Max_SSTable_Size_logs.tar.gz, 
> new_system_log_CMS_8GB_OOM.log, system.log_sasi_build_oom
>
>
> 13 bare metal machines
> - 6 cores CPU (12 HT)
> - 64Gb RAM
> - 4 SSD in RAID0
>  JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
> Data set:
>  - ≈ 100Gb/per node
>  - 1.3 Tb cluster-wide
>  - ≈ 20Gb for all SASI indices
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
> I created 9 SASI indices
>  - 8 indices with text field, NonTokenizingAnalyser,  PREFIX mode, 
> case-insensitive
>  - 1 index with numeric field, SPARSE mode
>  After a while, the nodes just went OOM.
>  I attach log files. You can see a lot of GC happening while index segments 
> are flushed to disk. At some point the node OOMs ...
> /cc [~xedin]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-13869) AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a single leaf and overflow collisions

2017-09-15 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-13869:

Attachment: 0001-Fix-AbstractTokenTreeBuilder-serializedSize-when-the.patch
attb-serialized-size-bug-test.patch

Attached two patches. attb-serialized-size-bug-test.patch is a patch that can 
be applied to trunk to illustrate the issue with a failing test. The other is 
the fix against trunk along with some improved testing. 
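
For readers following along, a minimal sketch of the accounting involved 
(hypothetical names, not the actual patch): the overflow collision block 
contributes bytes that the single-leaf path previously dropped from the total.

{code}
// Hypothetical sketch of the fix's accounting, not the real implementation.
long serializedSize(long leafBlockBytes, int overflowCollisions)
{
    long size = leafBlockBytes;                      // the single leaf block
    size += (long) overflowCollisions * Long.BYTES;  // term the old code omitted
    return size;
}
{code}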

> AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a 
> single leaf and overflow collisions
> ---
>
> Key: CASSANDRA-13869
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13869
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Minor
> Fix For: 3.11.x
>
> Attachments: 
> 0001-Fix-AbstractTokenTreeBuilder-serializedSize-when-the.patch, 
> attb-serialized-size-bug-test.patch
>
>
> In the extremely rare case where a small token tree (< 248 values) has 
> overflow collisions the size returned by 
> AbstractTokenTreeBuilder#serializedSize is incorrect because it fails to 
> account for the overflow collisions. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13869) AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a single leaf and overflow collisions

2017-09-14 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-13869:

Description: In the extremely rare case where a small token tree (< 248 
values) has overflow collisions the size returned by 
AbstractTokenTreeBuilder#serializedSize is incorrect because it fails to 
account for the overflow collisions.   (was: In the extremely rare case where a 
small token tree (< 248 values) has overflow collisions* the size returned by 
AbstractTokenTreeBuilder#serializedSize is incorrect because it fails to 
account for the overflow collisions. )

> AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a 
> single leaf and overflow collisions
> ---
>
> Key: CASSANDRA-13869
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13869
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Minor
> Fix For: 3.11.x
>
>
> In the extremely rare case where a small token tree (< 248 values) has 
> overflow collisions the size returned by 
> AbstractTokenTreeBuilder#serializedSize is incorrect because it fails to 
> account for the overflow collisions. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13869) AbstractTokenTreeBuilder#serializedSize returns wrong value when there is a single leaf and overflow collisions

2017-09-14 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-13869:
---

 Summary: AbstractTokenTreeBuilder#serializedSize returns wrong 
value when there is a single leaf and overflow collisions
 Key: CASSANDRA-13869
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13869
 Project: Cassandra
  Issue Type: Bug
  Components: sasi
Reporter: Jordan West
Assignee: Jordan West
Priority: Minor
 Fix For: 3.11.x


In the extremely rare case where a small token tree (< 248 values) has overflow 
collisions* the size returned by AbstractTokenTreeBuilder#serializedSize is 
incorrect because it fails to account for the overflow collisions. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-05-14 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369601#comment-16369601
 ] 

Jordan West edited comment on CASSANDRA-14055 at 5/14/18 7:42 PM:
--

[~lboutros]/[~jasobrown], some updates:

 
 I have attached two new patches, one for trunk and one for 3.11. Unfortunately, 
the test changes in trunk don't work well on 3.11, so we can't have one patch. 
The primary change in this patch is to the order we iterate over the indexes, 
to ensure we retain the newer instance of {{SSTableIndex}} and thus the newer 
{{SSTableReader}}. I also changed the code to clone the {{oldSSTables}} 
collection since it's visible outside the {{View}} constructor. 
||3.11||Trunk||
|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-3.11]|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-trunk]|
|[utests|https://circleci.com/gh/jrwest/cassandra/tree/14055-jrwest-3.11]|[utests|https://circleci.com/gh/jrwest/cassandra/tree/14055-jrwest-trunk]|

NOTE: same utests are failing on 
[trunk|https://circleci.com/gh/jrwest/cassandra/25] and I'm still working on 
getting dtests running with my CircleCI setup. 

 

Also, I spoke with some colleagues including [~beobal] and [~krummas] about the 
use of {{sstableRef.globalCount()}} to determine when to delete the SASI index 
file. I've come to the conclusion that its use here is wrong because it counts 
references to a single in-memory instance, not to the sstable globally. Given 
index summary redistribution, this isn't a safe assumption. Looking back at the 
original SASI patches, I am not sure why it got merged this way. The 
[patches|https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/SSTableIndex.java#L120]
 used {{sstable.isMarkedCompacted()}} but the [merged 
code|https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b#diff-4873bb6fcef158ff18d221571ef2ec7cR124]
 used {{sstableRef.globalCount()}}. Fixing this is a larger undertaking, so I 
propose we split that work into a separate ticket and focus this one on SASI's 
failure to account for index redistribution in the {{View}}. The work covered 
by the other ticket would entail either a) deleting the SASI index files as 
part of {{SSTableTidier}} or b) moving {{SSTableIndex}} to use {{Ref}} and 
implementing a tidier specific to it.
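
To illustrate the distinction (a sketch only, not the proposed fix):

{code}
// Sketch only. Ref#globalCount() counts references to one in-memory
// SSTableReader instance; index summary redistribution replaces that
// instance while the sstable (and its SASI index) lives on, so a count
// of zero says nothing about whether the index file can be deleted.
boolean indexFileDeletable(SSTableReader sstable)
{
    return sstable.isMarkedCompacted();           // tied to the sstable's lifecycle
    // rather than: sstableRef.globalCount() == 0 // tied to one instance
}
{code}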


was (Author: jrwest):
[~lboutros]/[~jasobrown], some updates:

 
 I have attached two new patches. One for trunk and one of 3.11. Unfortunately, 
the test changes in trunk don't work well on 3.11 so we can't have one patch. 
The primary changes in this patch are to change the order we iterate over the 
indexes to ensure we retain the newer instance of {{SSTableIndex}} and thus the 
newer {{SSTableReader}}. I also changed the code to clone the {{oldSSTables}} 
collection since its visible outside the {{View}} constructor. 
||3.11||Trunk||
|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-3.11]|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-trunk]|
|[utests|https://circleci.com/gh/jrwest/cassandra/24]|[utests|https://circleci.com/gh/jrwest/cassandra/26]|

NOTE: same utests are failing on 
[trunk|https://circleci.com/gh/jrwest/cassandra/25] and I'm still working on 
getting dtests running with my CircleCI setup. 

 

Also, I spoke with some colleagues including [~beobal] and [~krummas] about the 
use of {{sstableRef.globalCount()}} to determine when to delete the SASI index 
file. I've come to the conclusion that its use at all is wrong because it 
represents the number of references to the instance, not globally. Given index 
summary redistribution, this isn't a safe assumption. Looking back at the 
original SASI patches, I am not sure why it got merged this way. The 
[patches|https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/SSTableIndex.java#L120]
 used {{sstable.isMarkedCompacted()}} but the [merged 
code|https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b#diff-4873bb6fcef158ff18d221571ef2ec7cR124]
 used {{sstableRef.globalCount()}}. Fixing this is a larger undertaking, so I 
propose we split that work into a separate ticket and focus this one on SASI's 
failure to account for index redistribution in the {{View}}. The work covered 
by the other ticket would entail either a) deleting the SASI index files as 
part of {{SSTableTidier}} or by moving {{SSTableIndex}} to use {{Ref}} and 
implementing a tidier specific to it.

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Jordan West
>

[jira] [Commented] (CASSANDRA-14417) nodetool import cleanup/fixes

2018-05-14 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475273#comment-16475273
 ] 

Jordan West commented on CASSANDRA-14417:
-

All minor comments:
 * Deprecated {{CFS#loadNewSSTables()}} no longer needs to be synchronized. It 
just constructs an {{ImportOptions}} instance and passes it to the synchronized 
{{loadNewSSTables(ImportOptions)}}.
 * Add a reference (e.g. @see) to {{CFSMBean.importNewSSTables}} from 
{{SSMBean.loadNewSSTables}}.
 * In Verifier, is it more appropriate to favor {{OutputHandler#output}} over 
{{OutputHandler#debug}} for the error message when a key is out of range?
 * Would like to see some tests (including the base/empty case and edge cases 
like wrap-around) for {{RangeOwnHelper}}.
 * {{nodetool refresh}}: Is the removal of the deprecation output intentional?

> nodetool import cleanup/fixes
> -
>
> Key: CASSANDRA-14417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14417
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> * We shouldn't expose importNewSSTables in both StorageServiceMBean and 
> CFSMbean
> * Allow a quicker token check without doing an extended verify
> * Introduce an ImportOptions class to avoid passing in 100 booleans in 
> importNewSSTables



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14417) nodetool import cleanup/fixes

2018-05-16 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477263#comment-16477263
 ] 

Jordan West commented on CASSANDRA-14417:
-

Looks like the most recent changes moved the second {{if (shouldCountKeys)}} 
into the upper while loop, which I don't think was intended.

branch: 
[https://github.com/krummas/cassandra/blob/f207720a45c9106cfbdd4e8ab8f34283c58cba52/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L741-L756]
 

vs. 

Trunk: 
[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L741-L757]
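
Schematically, the concern is the following (illustrative Java only, not the 
actual method body):

{code}
// Schematic of the review comment, not the real ColumnFamilyStore code.
while (keys.hasNext())
{
    DecoratedKey key = keys.next();
    if (shouldCountKeys)
        count++;               // per-key work: belongs inside the loop
}
if (shouldCountKeys)
    recordKeyCount(count);     // summary work: belongs here, after the loop,
                               // but appears to have been moved inside it
{code}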

> nodetool import cleanup/fixes
> -
>
> Key: CASSANDRA-14417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14417
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> * We shouldn't expose importNewSSTables in both StorageServiceMBean and 
> CFSMbean
> * Allow a quicker token check without doing an extended verify
> * Introduce an ImportOptions class to avoid passing in 100 booleans in 
> importNewSSTables



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14417) nodetool import cleanup/fixes

2018-05-16 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477317#comment-16477317
 ] 

Jordan West commented on CASSANDRA-14417:
-

+1

> nodetool import cleanup/fixes
> -
>
> Key: CASSANDRA-14417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14417
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> * We shouldn't expose importNewSSTables in both StorageServiceMBean and 
> CFSMbean
> * Allow a quicker token check without doing an extended verify
> * Introduce an ImportOptions class to avoid passing in 100 booleans in 
> importNewSSTables



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests

2018-05-16 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14449:

Attachment: 14449.patch

> support cluster backends other than ccm when running dtests
> ---
>
> Key: CASSANDRA-14449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14449
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
> Attachments: 14449.patch
>
>
> While ccm is a great orchestration tool to run Cassandra clusters locally, it 
> may be desirable to run dtests against clusters running remotely, which may 
> be orchestrated by some tool other than ccm. 
> Dtest is heavily tied to CCM, but with a few minor changes it's possible to 
> support plugging in other backends that maintain a similar (duck-typed) 
> interface. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14449) support cluster backends other than ccm when running dtests

2018-05-16 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-14449:
---

 Summary: support cluster backends other than ccm when running 
dtests
 Key: CASSANDRA-14449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14449
 Project: Cassandra
  Issue Type: Improvement
  Components: Testing
Reporter: Jordan West
Assignee: Jordan West


While ccm is a great orchestration tool to run Cassandra clusters locally, it 
may be desirable to run dtests against clusters running remotely, which may be 
orchestrated by some tool other than ccm. 

Dtest is heavily tied to CCM, but with a few minor changes it's possible to 
support plugging in other backends that maintain a similar (duck-typed) 
interface. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14449) support cluster backends other than ccm when running dtests

2018-05-16 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478227#comment-16478227
 ] 

Jordan West commented on CASSANDRA-14449:
-

Patch attached and is also available at 
[https://github.com/jrwest/cassandra-dtest/commit/4d9492f87964ed1a6e981431af8f086c651eb07a.]
 

 

Cassandra branch wired up to use the above changes: 
[https://github.com/jrwest/cassandra/tree/pluggable-dtest]

Test runs with the above changes: 
https://circleci.com/gh/jrwest/cassandra/tree/pluggable-dtest

> support cluster backends other than ccm when running dtests
> ---
>
> Key: CASSANDRA-14449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14449
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
> Attachments: 14449.patch
>
>
> While ccm is a great orchestration tool to run Cassandra clusters locally, it 
> may be desirable to run dtests against clusters running remotely, which may 
> be orchestrated by some tool other than ccm. 
> Dtest is heavily tied to CCM, but with a few minor changes it's possible to 
> support plugging in other backends that maintain a similar (duck-typed) 
> interface. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests

2018-05-16 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14449:

Flags: Patch

> support cluster backends other than ccm when running dtests
> ---
>
> Key: CASSANDRA-14449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14449
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
> Attachments: 14449.patch
>
>
> While ccm is a great orchestration tool to run Cassandra clusters locally, it 
> may be desirable to run dtests against clusters running remotely, which may 
> be orchestrated by some tool other than ccm. 
> Dtest is heavily tied to CCM, but with a few minor changes it's possible to 
> support plugging in other backends that maintain a similar (duck-typed) 
> interface. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests

2018-05-16 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14449:

Attachment: (was: 14449.patch)

> support cluster backends other than ccm when running dtests
> ---
>
> Key: CASSANDRA-14449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14449
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
> Attachments: 14449.patch
>
>
> While ccm is a great orchestration tool to run Cassandra clusters locally, it 
> may be desirable to run dtests against clusters running remotely, which may 
> be orchestrated by some tool other than ccm. 
> Dtest is heavily tied to CCM, but with a few minor changes it's possible to 
> support plugging in other backends that maintain a similar (duck-typed) 
> interface. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests

2018-05-16 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14449:

Attachment: 14449.patch
Status: Patch Available  (was: Open)

> support cluster backends other than ccm when running dtests
> ---
>
> Key: CASSANDRA-14449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14449
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
> Attachments: 14449.patch
>
>
> While ccm is a great orchestration tool to run Cassandra clusters locally, it 
> may be desirable to run dtests against clusters running remotely, which may 
> be orchestrated by some tool other than ccm. 
> Dtest is heavily tied to CCM, but with a few minor changes it's possible to 
> support plugging in other backends that maintain a similar (duck-typed) 
> interface. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests

2018-05-16 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14449:

Flags:   (was: Patch)

> support cluster backends other than ccm when running dtests
> ---
>
> Key: CASSANDRA-14449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14449
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
> Attachments: 14449.patch
>
>
> While ccm is a great orchestration tool to run Cassandra clusters locally, it 
> may be desirable to run dtests against clusters running remotely, which may 
> be orchestrated by some tool other than ccm. 
> Dtest is heavily tied to CCM, but with a few minor changes it's possible to 
> support plugging in other backends that maintain a similar (duck-typed) 
> interface. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14449) support cluster backends other than ccm when running dtests

2018-05-16 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478227#comment-16478227
 ] 

Jordan West edited comment on CASSANDRA-14449 at 5/16/18 11:18 PM:
---

Patch attached and is also available at 
[https://github.com/jrwest/cassandra-dtest/commit/4d9492f87964ed1a6e981431af8f086c651eb07a|https://github.com/jrwest/cassandra-dtest/commit/4d9492f87964ed1a6e981431af8f086c651eb07a]

 

Cassandra branch wired up to use the above changes: 
[https://github.com/jrwest/cassandra/tree/pluggable-dtest]

Test runs with the above changes: 
[https://circleci.com/gh/jrwest/cassandra/tree/pluggable-dtest]


was (Author: jrwest):
Patch attached and is also available at 
[https://github.com/jrwest/cassandra-dtest/commit/4d9492f87964ed1a6e981431af8f086c651eb07a.]
 

 

Cassandra branched wired up to use the above changes: 
[https://github.com/jrwest/cassandra/tree/pluggable-dtest]

Test runs with the above changes: 
https://circleci.com/gh/jrwest/cassandra/tree/pluggable-dtest

> support cluster backends other than ccm when running dtests
> ---
>
> Key: CASSANDRA-14449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14449
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
> Attachments: 14449.patch
>
>
> While ccm is a great orchestration tool to run Cassandra clusters locally, it 
> may be desirable to run dtests against clusters running remotely, which may 
> be orchestrated by some tool other than ccm. 
> Dtest is heavily tied to CCM, but with a few minor changes it's possible to 
> support plugging in other backends that maintain a similar (duck-typed) 
> interface. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14443) Improvements for running dtests

2018-05-23 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14443:

Reviewer: Jordan West

> Improvements for running dtests
> ---
>
> Key: CASSANDRA-14443
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14443
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Kurt Greaves
>Assignee: Kurt Greaves
>Priority: Major
>  Labels: dtest
>
> We currently hardcode a requirement that you need at least 27gb of memory to 
> run the resource intensive tests. This is rather annoying as there isn't 
> really a strict hardware requirement and tests can run on smaller machines in 
> a lot of cases (especially if you mess around with HEAP). 
> We've already got the command line argument 
> {{--force-resource-intensive-tests}}, we don't need additional restrictions 
> in place to stop people who shouldn't be running the tests from running them.
> We also don't have a way to run _only_ the resource-intensive dtests or 
> _only_ the upgrade tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14443) Improvements for running dtests

2018-05-23 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487988#comment-16487988
 ] 

Jordan West edited comment on CASSANDRA-14443 at 5/23/18 8:37 PM:
--

[~KurtG] these changes look like a good improvement, especially not running the 
regular test suite twice. A few comments below:
 * How about exiting instead of logging a warning when insufficient resources 
are available for running resource intensive tests (conftest.py#L460), unless 
{{--force-resource-intensive-tests}} is set? Behaviour would be unchanged 
otherwise and users who want to force run only the resource intensive tests on 
insufficient hardware can use {{--force-resource-intensive-tests 
--resource-intensive-tests-only}}.
 * run_dtests.py: some things I noticed that may be worth cleaning up while 
we’re here
 1. original_raw_cmd_args is only used in one place now. Consider removing it. 
 2. comment on line 120 is stale


was (Author: jrwest):
[~KurtG] these changes look like a good improvement, especially not running the 
regular test suite twice. A few comments below:
 * How about exiting instead of logging a warning when insufficient resources 
are available for running resource intensive tests (conftest.py#L460), unless 
{{--force-resource-intensive-tests-}} is set? Behaviour would be unchanged 
otherwise and users who want to force run only the resource intensive tests on 
insufficient hardware can use {{-force-resource-intensive-tests 
--resource-intensive-tests-only}}.
 * run_dtests.py: some things I noticed that may be worth cleaning up while 
we’re here
 1. original_raw_cmd_args is only used in one place now. Consider removing it. 
 2. comment on line 120 is stale

> Improvements for running dtests
> ---
>
> Key: CASSANDRA-14443
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14443
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Kurt Greaves
>Assignee: Kurt Greaves
>Priority: Major
>  Labels: dtest
>
> We currently hardcode a requirement that you need at least 27gb of memory to 
> run the resource intensive tests. This is rather annoying as there isn't 
> really a strict hardware requirement and tests can run on smaller machines in 
> a lot of cases (especially if you mess around with HEAP). 
> We've already got the command line argument 
> {{--force-resource-intensive-tests}}, we don't need additional restrictions 
> in place to stop people who shouldn't be running the tests from running them.
> We also don't have a way to run _only_ the resource-intensive dtests or 
> _only_ the upgrade tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14443) Improvements for running dtests

2018-05-23 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487988#comment-16487988
 ] 

Jordan West edited comment on CASSANDRA-14443 at 5/23/18 8:36 PM:
--

[~KurtG] these changes look like a good improvement, especially not running the 
regular test suite twice. A few comments below:
 * How about exiting instead of logging a warning when insufficient resources 
are available for running resource intensive tests (conftest.py#L460), unless 
{{--force-resource-intensive-tests-}} is set? Behaviour would be unchanged 
otherwise and users who want to force run only the resource intensive tests on 
insufficient hardware can use {{-force-resource-intensive-tests 
--resource-intensive-tests-only}}.
 * run_dtests.py: some things I noticed that may be worth cleaning up while 
we’re here
 1. original_raw_cmd_args is only used in one place now. Consider removing it. 
 2. comment on line 120 is stale


was (Author: jrwest):
[~KurtG] these changes look like a good improvement, especially not running the 
test regular suite twice. A few comments below:
 * How about exiting instead of logging a warning when insufficient resources 
are available for running resource intensive tests (conftest.py#L460), unless 
{{--force-resource-intensive-tests}} is set? Behaviour would be unchanged 
otherwise and users who want to force run only the resource intensive tests on 
insufficient hardware can use {{--force-resource-intensive-tests 
--resource-intensive-tests-only}}.
 * run_dtests.py: some things I noticed that may be worth cleaning up while 
we’re here
 1. original_raw_cmd_args is only used in one place now. Consider removing it. 
 2. comment on line 120 is stale

> Improvements for running dtests
> ---
>
> Key: CASSANDRA-14443
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14443
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Kurt Greaves
>Assignee: Kurt Greaves
>Priority: Major
>  Labels: dtest
>
> We currently hardcode a requirement that you need at least 27gb of memory to 
> run the resource intensive tests. This is rather annoying as there isn't 
> really a strict hardware requirement and tests can run on smaller machines in 
> a lot of cases (especially if you mess around with HEAP). 
> We've already got the command line argument 
> {{--force-resource-intensive-tests}}, we don't need additional restrictions 
> in place to stop people who shouldn't be running the tests from running them.
> We also don't have a way to run _only_ the resource-intensive dtests or 
> _only_ the upgrade tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14443) Improvements for running dtests

2018-05-23 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487988#comment-16487988
 ] 

Jordan West commented on CASSANDRA-14443:
-

[~KurtG] these changes look like a good improvement, especially not running the 
test regular suite twice. A few comments below:
 * How about exiting instead of logging a warning when insufficient resources 
are available for running resource intensive tests (conftest.py#L460), unless 
{{--force-resource-intensive-tests}} is set? Behaviour would be unchanged 
otherwise and users who want to force run only the resource intensive tests on 
insufficient hardware can use {{--force-resource-intensive-tests 
--resource-intensive-tests-only}}.
 * run_dtests.py: some things I noticed that may be worth cleaning up while 
we’re here
 1. original_raw_cmd_args is only used in one place now. Consider removing it. 
 2. comment on line 120 is stale

> Improvements for running dtests
> ---
>
> Key: CASSANDRA-14443
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14443
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Kurt Greaves
>Assignee: Kurt Greaves
>Priority: Major
>  Labels: dtest
>
> We currently hardcode a requirement that you need at least 27gb of memory to 
> run the resource intensive tests. This is rather annoying as there isn't 
> really a strict hardware requirement and tests can run on smaller machines in 
> a lot of cases (especially if you mess around with HEAP). 
> We've already got the command line argument 
> {{--force-resource-intensive-tests}}, we don't need additional restrictions 
> in place to stop people who shouldn't be running the tests from running them.
> We also don't have a way to run _only_ the resource-intensive dtests or 
> _only_ the upgrade tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14449) support cluster backends other than ccm when running dtests

2018-05-18 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14449:

Reviewer: Ariel Weisberg

> support cluster backends other than ccm when running dtests
> ---
>
> Key: CASSANDRA-14449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14449
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
> Attachments: 14449.patch
>
>
> While ccm is a great orchestration tool to run Cassandra clusters locally, it 
> may be desirable to run dtests against clusters running remotely, which may 
> be orchestrated by some tool other than ccm. 
> Dtest is heavily tied to CCM, but with a few minor changes it's possible to 
> support plugging in other backends that maintain a similar (duck-typed) 
> interface. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14417) nodetool import cleanup/fixes

2018-05-15 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476600#comment-16476600
 ] 

Jordan West edited comment on CASSANDRA-14417 at 5/16/18 3:59 AM:
--

* The output when {{extended=false}}, {{checkOwnsTokens=true}} is still debug 
(because the message was removed and the exception is logged as debug). I see 
it changed when extended=true. Verifier#L217.
 * While cleaning things up, should 
{{ColumnFamilyStore#findBestDiskAndInvalidateCaches}} be refactored to use 
{{KeyIterator}} as well? (a rough sketch follows below)
 * EDIT: Also, just noticed the failing dtest. It seems to be in an area 
related-ish to these changes, but I am not familiar enough with it yet to know 
if it's related or just a flaky test. (EDIT: it failed here on a branch without 
this change https://circleci.com/gh/jrwest/cassandra/86#tests/containers/66)
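
A rough sketch of the suggested {{KeyIterator}} refactor (the descriptor and 
metadata arguments are placeholders; exact types vary by branch):

{code}
// Sketch only: KeyIterator walks partition keys via the index file,
// avoiding materializing whole partitions just to inspect their keys.
try (KeyIterator keys = new KeyIterator(descriptor, metadata))
{
    while (keys.hasNext())
    {
        DecoratedKey key = keys.next();
        // per-key work, e.g. choosing the best disk or invalidating caches
    }
}
{code}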


was (Author: jrwest):
* The output when {{extended=false}}, {{checkOwnsTokens=true}} is still debug 
(because the message was removed and the exception is logged as debug). I see 
it changed when extended=true. Verifier#L217.
 * While cleaning things up, should 
{{ColumnFamilyStore#findBestDiskAndInvalidateCaches}} be refactored to use 
{{KeyIterator}} as well?
 * EDIT: Also, just noticed the failing dtest. Seems to be in an area 
related-ish to these changes but I am not familiar enough with it yet to know 
if its related or just a flaky test. 

> nodetool import cleanup/fixes
> -
>
> Key: CASSANDRA-14417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14417
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> * We shouldn't expose importNewSSTables in both StorageServiceMBean and 
> CFSMbean
> * Allow a quicker token check without doing an extended verify
> * Introduce an ImportOptions class to avoid passing in 100 booleans in 
> importNewSSTables



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14417) nodetool import cleanup/fixes

2018-05-15 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476600#comment-16476600
 ] 

Jordan West commented on CASSANDRA-14417:
-

* The output when {{extended=false}}, {{checkOwnsTokens=true}} is still debug 
(because the message was removed and the exception is logged as debug). I see 
it changed when extended=true. Verifier#L217.
 * While cleaning things up, should 
{{ColumnFamilyStore#findBestDiskAndInvalidateCaches}} be refactored to use 
{{KeyIterator}} as well?

> nodetool import cleanup/fixes
> -
>
> Key: CASSANDRA-14417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14417
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> * We shouldn't expose importNewSSTables in both StorageServiceMBean and 
> CFSMbean
> * Allow a quicker token check without doing an extended verify
> * Introduce an ImportOptions class to avoid passing in 100 booleans in 
> importNewSSTables



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14417) nodetool import cleanup/fixes

2018-05-15 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476600#comment-16476600
 ] 

Jordan West edited comment on CASSANDRA-14417 at 5/15/18 11:19 PM:
---

* The output when {{extended=false}}, {{checkOwnsTokens=true}} is still debug 
(because the message was removed and the exception is logged as debug). I see 
it changed when extended=true. Verifier#L217.
 * While cleaning things up, should 
{{ColumnFamilyStore#findBestDiskAndInvalidateCaches}} be refactored to use 
{{KeyIterator}} as well?
 * EDIT: Also, just noticed the failing dtest. Seems to be in an area 
related-ish to these changes but I am not familiar enough with it yet to know 
if its related or just a flaky test. 


was (Author: jrwest):
* The output when {{extended=false}}, {{checkOwnsTokens=true}} is still debug 
(because the message was removed and the exception is logged as debug). I see 
it changed when extended=true. Verifier#L217.
 * While cleaning things up, should 
{{ColumnFamilyStore#findBestDiskAndInvalidateCaches}} be refactored to use 
{{KeyIterator}} as well?

> nodetool import cleanup/fixes
> -
>
> Key: CASSANDRA-14417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14417
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> * We shouldn't expose importNewSSTables in both StorageServiceMBean and 
> CFSMbean
> * Allow a quicker token check without doing an extended verify
> * Introduce an ImportOptions class to avoid passing in 100 booleans in 
> importNewSSTables



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14499) node-level disk quota

2018-06-11 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508955#comment-16508955
 ] 

Jordan West commented on CASSANDRA-14499:
-

Nothing would disallow truncate, although if one node is at quota, is dropping 
all data what is desired? In some use-cases, perhaps. Since deletes temporarily 
inflate storage use, I don't think they should be allowed under a node-level 
quota (for a keyspace-level quota that would perhaps be different). The 
client also can't be expected to know exactly which keys live on the node(s) 
that are at quota, which makes remediation by delete less viable. The most 
likely remediations are adding more nodes or truncation; a correct 
implementation would prevent neither of these. 

I agree that this could/should live in the management process.

> node-level disk quota
> -
>
> Key: CASSANDRA-14499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
>
> Operators should be able to specify, via YAML, the amount of usable disk 
> space on a node as a percentage of the total available or as an absolute 
> value. If both are specified, the absolute value should take precedence. This 
> allows operators to reserve space available to the database for background 
> tasks -- primarily compaction. When a node reaches its quota, gossip should 
> be disabled to prevent it taking further writes (which would increase the 
> amount of data stored), being involved in reads (which are likely to be more 
> inconsistent over time), or participating in repair (which may increase the 
> amount of space used on the machine). The node re-enables gossip when the 
> amount of data it stores is below the quota.   
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which 
> reserves some amount of space on each drive that is not usable by the 
> database.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14499) node-level disk quota

2018-06-08 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506754#comment-16506754
 ] 

Jordan West commented on CASSANDRA-14499:
-

{quote}disabling gossip alone is insufficient, also need to disable native
{quote}
Agreed. I hadn't updated the description to reflect it, but what I am working on 
does this as well. 
{quote}still not sure I buy the argument that it’s wrong to serve reads in this 
case - it may be true that some table is getting out of sync, but that doesn’t 
mean every table is,
{quote}
I agree it depends on the workload for each specific dataset, but since we can't 
know which we have, we have to assume it could get really out of sync. 
{quote}and we already have a mechanism to deal with nodes that can serve reads 
but not writes (speculating on the read repair).
{quote}
Even if we speculate, we still attempt it. That work will always be for naught, 
and being at quota is likely a prolonged state (the ways out of it take a 
while).
{quote}If you don’t serve reads either, then any GC pause will be guaranteed to 
impact client request latency as we can’t speculate around it in the common 
rf=3 case.
{quote}
This is true. But that's almost the same as losing a node because its disk has 
been filled up completely. If we have one unhealthy node, we are another 
unhealthy node away from unavailability in the rf=3/quorum case. 

That said, I'll consider the reads more over the weekend. It's a valid concern. 

 

> node-level disk quota
> -
>
> Key: CASSANDRA-14499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
>
> Operators should be able to specify, via YAML, the amount of usable disk 
> space on a node as a percentage of the total available or as an absolute 
> value. If both are specified, the absolute value should take precedence. This 
> allows operators to reserve space available to the database for background 
> tasks -- primarily compaction. When a node reaches its quota, gossip should 
> be disabled to prevent it taking further writes (which would increase the 
> amount of data stored), being involved in reads (which are likely to be more 
> inconsistent over time), or participating in repair (which may increase the 
> amount of space used on the machine). The node re-enables gossip when the 
> amount of data it stores is below the quota.   
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which 
> reserves some amount of space on each drive that is not usable by the 
> database.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14499) node-level disk quota

2018-06-08 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506746#comment-16506746
 ] 

Jordan West commented on CASSANDRA-14499:
-

The other reason the OS level wouldn't work is that we are trying to track 
*live* data, and the OS can't distinguish live data from obsolete data.

Regarding taking reads, [~jasobrown], [~krummas], and I discussed this some 
offline. Since the node can only get more and more out of sync while not taking 
write traffic, and can't participate in (read) repair until the amount of 
storage used is below quota, we thought it better to disable both reads and 
writes. Less-blocking and speculative read repair makes us more available in 
this case (as it should).

Disabling gossip is a quick route to disabling reads/writes. Is it the best 
approach to doing so? I'm not 100% sure. My concern is how the operator gets 
back to a healthy state once a quota is reached on a node. They have a few 
options: migrate data to a bigger node, let compaction catch up and delete 
data, raise the quota so it's no longer exceeded, add node(s) to take storage 
responsibility away from the node, or forcefully delete data from the node. 
We need to ensure we don't prevent those operations from taking place. I've 
been discussing this with [~jasobrown] offline as well. 
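
For concreteness, a minimal sketch of the kind of periodic check being discussed 
(names and wiring are hypothetical, not the actual patch):

{code}
// Illustrative sketch only: a periodically scheduled node-level quota check.
import java.util.function.LongSupplier;
import org.apache.cassandra.service.StorageService;

public class DiskQuotaChecker implements Runnable
{
    private final long quotaBytes;        // resolved from YAML (absolute value wins)
    private final LongSupplier liveBytes; // live sstable bytes, not OS free space
    private boolean atQuota = false;

    public DiskQuotaChecker(long quotaBytes, LongSupplier liveBytes)
    {
        this.quotaBytes = quotaBytes;
        this.liveBytes = liveBytes;
    }

    @Override
    public synchronized void run()
    {
        long used = liveBytes.getAsLong();
        if (!atQuota && used >= quotaBytes)
        {
            atQuota = true; // mark unhealthy: stop serving reads and writes
            StorageService.instance.stopGossiping();
            StorageService.instance.stopNativeTransport();
        }
        else if (atQuota && used < quotaBytes)
        {
            atQuota = false; // healthy again, e.g. compaction reclaimed space
            StorageService.instance.startGossiping();
            StorageService.instance.startNativeTransport();
        }
    }
}
{code}

A real implementation would likely want hysteresis (separate high and low 
watermarks) so the node doesn't flap around the threshold, and it must leave 
the remediation paths above (compaction, truncation, adding nodes) operable 
while at quota.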

> node-level disk quota
> -
>
> Key: CASSANDRA-14499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
>
> Operators should be able to specify, via YAML, the amount of usable disk 
> space on a node as a percentage of the total available or as an absolute 
> value. If both are specified, the absolute value should take precedence. This 
> allows operators to reserve space available to the database for background 
> tasks -- primarily compaction. When a node reaches its quota, gossip should 
> be disabled to prevent it taking further writes (which would increase the 
> amount of data stored), being involved in reads (which are likely to be more 
> inconsistent over time), or participating in repair (which may increase the 
> amount of space used on the machine). The node re-enables gossip when the 
> amount of data it stores is below the quota.   
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which 
> reserves some amount of space on each drive that is not usable by the 
> database.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14499) node-level disk quota

2018-06-08 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506746#comment-16506746
 ] 

Jordan West edited comment on CASSANDRA-14499 at 6/9/18 12:36 AM:
--

The other reason the OS level wouldn't work is that we are trying to track 
*live* data, and the OS can't distinguish live data from obsolete data. EDIT: 
also to clarify, the goal here isn't to implement a perfect quota; there will 
be some room for error where the quota can be exceeded. The goal is to mark the 
node unhealthy when it reaches this level and to have enough headroom for 
compaction or other operations to get it back to a healthy state. 

Regarding taking reads, [~jasobrown], [~krummas], and I discussed this some 
offline. Since the node can only get more and more out of sync while not taking 
write traffic, and can't participate in (read) repair until the amount of 
storage used is below quota, we thought it better to disable both reads and 
writes. Less-blocking and speculative read repair makes us more available in 
this case (as it should).

Disabling gossip is a quick route to disabling reads/writes. Is it the best 
approach to doing so? I'm not 100% sure. My concern is how the operator gets 
back to a healthy state once a quota is reached on a node. They have a few 
options: migrate data to a bigger node, let compaction catch up and delete 
data, raise the quota so it's no longer exceeded, add node(s) to take storage 
responsibility away from the node, or forcefully delete data from the node. 
We need to ensure we don't prevent those operations from taking place. I've 
been discussing this with [~jasobrown] offline as well. 


was (Author: jrwest):
The other reason the OS level wouldn't work is we are trying to track *live* 
data, which the OS can't tell the difference between.

Regarding taking reads, [~jasobrown], [~krummas], and I discussed this some 
offline. Since the node can only get more and more out of sync while not taking 
write traffic and can't participate in (read) repair until the amount of 
storage used is below quota, we thought it better to disable both reads and 
writes. Less-blocking and speculative read repair makes us more available in 
this case (as it should).

Disabling gossip is a quick route to disabling reads/writes. Is it the best 
approach to doing so? I'm not 100%. My concern is for how the operator gets 
back to a healthy state once a quota is reached on a node. They have a few 
options: migrate data to a bigger node, compaction catches up and deletes data, 
quota is raised so its not met anymore, node(s) are added to take storage 
responsibility away from the node, or data is forcefully deleted from the node. 
We need to ensure we don't prevent those operations from taking place. I've 
been discussing this with [~jasobrown] offline as well. 

> node-level disk quota
> -
>
> Key: CASSANDRA-14499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
>
> Operators should be able to specify, via YAML, the amount of usable disk 
> space on a node as a percentage of the total available or as an absolute 
> value. If both are specified, the absolute value should take precedence. This 
> allows operators to reserve space available to the database for background 
> tasks -- primarily compaction. When a node reaches its quota, gossip should 
> be disabled to prevent it taking further writes (which would increase the 
> amount of data stored), being involved in reads (which are likely to be more 
> inconsistent over time), or participating in repair (which may increase the 
> amount of space used on the machine). The node re-enables gossip when the 
> amount of data it stores is below the quota.   
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which 
> reserves some amount of space on each drive that is not usable by the 
> database.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14499) node-level disk quota

2018-06-08 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506533#comment-16506533
 ] 

Jordan West commented on CASSANDRA-14499:
-

[~jeromatron] I understand those concerns. This would be opt-in for folks who 
want automatic action taken, and any such action should take care not to cause 
the node to flap, for example. One use case where we see this as valuable is 
QA/perf/test clusters that may not have a full monitoring setup but need to be 
protected from errant clients filling up disks to the point where worse things 
happen. The warning system can be accomplished today with monitoring and 
alerting on the same metrics.

> node-level disk quota
> -
>
> Key: CASSANDRA-14499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
>
> Operators should be able to specify, via YAML, the amount of usable disk 
> space on a node as a percentage of the total available or as an absolute 
> value. If both are specified, the absolute value should take precedence. This 
> allows operators to reserve space available to the database for background 
> tasks -- primarily compaction. When a node reaches its quota, gossip should 
> be disabled to prevent it taking further writes (which would increase the 
> amount of data stored), being involved in reads (which are likely to be more 
> inconsistent over time), or participating in repair (which may increase the 
> amount of space used on the machine). The node re-enables gossip when the 
> amount of data it stores is below the quota.   
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which 
> reserves some amount of space on each drive that is not usable by the 
> database.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14529) nodetool import row cache invalidation races with adding sstables to tracker

2018-06-18 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-14529:
---

 Summary: nodetool import row cache invalidation races with adding 
sstables to tracker
 Key: CASSANDRA-14529
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14529
 Project: Cassandra
  Issue Type: Bug
Reporter: Jordan West
Assignee: Jordan West


CASSANDRA-6719 introduced {{nodetool import}} with row cache invalidation, 
which [occurs before adding new sstables to the 
tracker|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SSTableImporter.java#L137-L178].
 Stale reads can result when a read is interleaved between a row's invalidation 
and the addition of the sstable containing it to the tracker.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14529) nodetool import row cache invalidation races with adding sstables to tracker

2018-06-18 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14529:

Status: Patch Available  (was: Open)

Made the cache invalidation run after the files are added to the tracker. This 
is similar to 
[streaming|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/streaming/CassandraStreamReceiver.java#L207-L210].
 There is still a race condition, but the worst case is only the invalidation of 
a cached copy of the newly added data. 

Branch: [https://github.com/jrwest/cassandra/commits/14529-trunk]
 Tests: [https://circleci.com/gh/jrwest/cassandra/tree/14529-trunk]
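
The resulting order of operations, roughly (illustrative only; assumes the 
partition-cache API that streaming uses):

{code}
// Illustrative ordering, not the exact patch: add first, invalidate second.
cfs.getTracker().addSSTables(newSSTables);   // make the new data readable first
for (DecoratedKey key : keysToInvalidate)
    cfs.invalidateCachedPartition(key);      // then drop any stale cached rows
{code}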

> nodetool import row cache invalidation races with adding sstables to tracker
> 
>
> Key: CASSANDRA-14529
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14529
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
>
> CASSANDRA-6719 introduced {{nodetool import}} with row cache invalidation, 
> which [occurs before adding new sstables to the 
> tracker|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SSTableImporter.java#L137-L178].
>  Stale reads can result when a read is interleaved between the invalidation 
> of a row and the addition of the sstable containing it to the tracker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14499) node-level disk quota

2018-06-11 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508484#comment-16508484
 ] 

Jordan West commented on CASSANDRA-14499:
-

Since the goal isn't to strictly enforce the quota (it's OK if it's violated, as 
long as action is taken once it's noticed), the code isn't invasive. It's a 
small amount of new code with the only change being to schedule the check on 
optional tasks. That being said, if the concern is complexity, one potential 
place for this (and I think it may be a better home regardless) is 
[CASSANDRA-14395|https://issues.apache.org/jira/browse/CASSANDRA-14395].

While this may seem like a small band-aid, and there are cases where multiple 
nodes can go down at once, it is meant exactly to give some headroom. That 
headroom makes it considerably easier to get the cluster back into a healthy 
state.




> node-level disk quota
> -
>
> Key: CASSANDRA-14499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
>
> Operators should be able to specify, via YAML, the amount of usable disk 
> space on a node as a percentage of the total available or as an absolute 
> value. If both are specified, the absolute value should take precedence. This 
> allows operators to reserve space available to the database for background 
> tasks -- primarily compaction. When a node reaches its quota, gossip should 
> be disabled to prevent it taking further writes (which would increase the 
> amount of data stored), being involved in reads (which are likely to be more 
> inconsistent over time), or participating in repair (which may increase the 
> amount of space used on the machine). The node re-enables gossip when the 
> amount of data it stores is below the quota.   
> The proposed option differs from {{min_free_space_per_drive_in_mb}}, which 
> reserves some amount of space on each drive that is not usable by the 
> database.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14207) Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper Reference Counting of SSTableIndex

2018-05-29 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14207:

   Attachment: 14207-example-test.patch
Reproduced In: 3.11.1, 3.11.0  (was: 3.11.0, 3.11.1)
   Status: Patch Available  (was: Open)

I've worked up a patch for this that applies to 3.11: 
[https://github.com/jrwest/cassandra/commits/14207-3.11]. The patch applies 
cleanly to trunk, last I tested. Ran tests on 
[3.11|https://circleci.com/gh/jrwest/cassandra/tree/14207-3%2E11] and on 
[trunk|https://circleci.com/gh/jrwest/cassandra/tree/14207-trunk]. Also 
attached is a test that I don't think is worth merging (it's too contrived) but 
that is illustrative of the scenario that causes a double release to occur.

[~ifesdjeen] would you be able to take a look? 
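
For readers less familiar with the code, the shape of the bug is roughly the following (an illustrative sketch, not the actual {{DataTracker}} source; the constructor arguments are approximated):
{code:java}
// Building the candidate View releases indexes attached to oldSSTables as a
// side effect. When the compareAndSet fails, those releases have already
// happened for a View that is thrown away -- and the retry releases them a
// second time, dropping the reference count below zero.
AtomicReference<View> view = new AtomicReference<>();

void update(Collection<SSTableReader> oldSSTables, Collection<SSTableIndex> newIndexes)
{
    View currentView, newView;
    do
    {
        currentView = view.get();
        // side effect inside: SSTableIndex#release on indexes for oldSSTables
        newView = new View(currentView, oldSSTables, newIndexes);
    }
    while (!view.compareAndSet(currentView, newView));
    // fix direction: defer the releases until after a successful swap
}
{code}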

> Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper 
> Reference Counting of SSTableIndex
> 
>
> Key: CASSANDRA-14207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14207
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
> Attachments: 14207-example-test.patch, 
> sasi-invalid-reference-count.rtf
>
>
> A race between e.g. Index Redistribution and Compaction can cause the compare 
> and swap of a new {{sasi.conf.View}} in {{sasi.conf.DataTracker#update}} to 
> fail, leading to recreation of the view and improper reference counting of an 
> {{SSTableIndex}}. This is because the side-effects (decrementing the 
> reference count via {{SSTableIndex#release}}) occur regardless of whether the 
> view is promoted to be the active view.  
> Code: 
> https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78
>  
> Attached logs and debug output show case where index redistribution and 
> compaction race. This case was generated using the test provided in 
> https://issues.apache.org/jira/browse/CASSANDRA-14055



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14207) Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper Reference Counting of SSTableIndex

2018-05-29 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14207:

Description: 
A race between e.g. index redistribution and compaction (or memtable flushes 
and compaction) can cause the compare and swap of a new {{sasi.conf.View}} in 
{{sasi.conf.DataTracker#update}} to fail, leading to recreation of the view and 
improper reference counting of an {{SSTableIndex}}. This is because the 
side-effects (decrementing the reference count via {{SSTableIndex#release}}) 
occur regardless of whether the view is promoted to be the active view.  

Code: 
[https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78]
 

Attached logs and debug output show case where index redistribution and 
compaction race. This case was generated using the test provided in 
https://issues.apache.org/jira/browse/CASSANDRA-14055

  was:
A race between e.g. Index Redistribution and Compaction can cause the compare 
and swap of a new {{sasi.conf.View}} in {{sasi.conf.DataTracker#update}} to 
fail, leading to recreation of the view and improper reference counting of an 
{{SSTableIndex}}. This is because the side-effects (decrementing the reference 
count via {{SStableIndex#release}}) occur regardless of if the view is promoted 
to be the active view.  

Code: 
https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78
 

Attached logs and debug output show case where index redistribution and 
compaction race. This case was generated using the test provided in 
https://issues.apache.org/jira/browse/CASSANDRA-14055


> Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper 
> Reference Counting of SSTableIndex
> 
>
> Key: CASSANDRA-14207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14207
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
> Attachments: 14207-example-test.patch, 
> sasi-invalid-reference-count.rtf
>
>
> A race between e.g. index redistribution and compaction (or memtable flushes 
> and compaction) can cause the compare and swap of a new {{sasi.conf.View}} in 
> {{sasi.conf.DataTracker#update}} to fail, leading to recreation of the view 
> and improper reference counting of an {{SSTableIndex}}. This is because the 
> side-effects (decrementing the reference count via {{SSTableIndex#release}}) 
> occur regardless of whether the view is promoted to be the active view.  
> Code: 
> [https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78]
>  
> Attached logs and debug output show case where index redistribution and 
> compaction race. This case was generated using the test provided in 
> https://issues.apache.org/jira/browse/CASSANDRA-14055



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14207) Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper Reference Counting of SSTableIndex

2018-05-29 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494717#comment-16494717
 ] 

Jordan West commented on CASSANDRA-14207:
-

[~jjirsa] generally I agree, but the changes move the code that has the 
potential to regress outside of what the test covers, and the test exercises 
one potential interleaving based on how another part of the code is currently 
written. In this specific case, I think it serves better as an illustration 
only.

> Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper 
> Reference Counting of SSTableIndex
> 
>
> Key: CASSANDRA-14207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14207
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Major
> Attachments: 14207-example-test.patch, 
> sasi-invalid-reference-count.rtf
>
>
> A race between e.g. index redistribution and compaction (or memtable flushes 
> and compaction) can cause the compare and swap of a new {{sasi.conf.View}} in 
> {{sasi.conf.DataTracker#update}} to fail, leading to recreation of the view 
> and improper reference counting of an {{SSTableIndex}}. This is because the 
> side-effects (decrementing the reference count via {{SSTableIndex#release}}) 
> occur regardless of whether the view is promoted to be the active view.  
> Code: 
> [https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78]
>  
> Attached logs and debug output show case where index redistribution and 
> compaction race. This case was generated using the test provided in 
> https://issues.apache.org/jira/browse/CASSANDRA-14055



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14479) Secondary Indexes Can "Leak" Records If Insert/Partition Delete Occur Between Flushes

2018-05-30 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-14479:
---

 Summary: Secondary Indexes Can "Leak" Records If Insert/Partition 
Delete Occur Between Flushes
 Key: CASSANDRA-14479
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14479
 Project: Cassandra
  Issue Type: Bug
  Components: Secondary Indexes
Reporter: Jordan West
 Attachments: 2i-leak-test.patch

When an insert of an indexed column is followed rapidly (within the same 
memtable) by a delete of an entire partition, the index table for the column 
will continue to store the record for the inserted value and no tombstone will 
ever be written. This occurs because the index isn't updated after the delete 
but before the flush. The value is lost after flush, so subsequent compactions 
can't issue a delete for the primary key in the index column. 

The attached test reproduces the described issue: the assertion that the index 
cfs is empty fails, and the subsequent assertion that there are no live 
sstables would fail as well. Inspecting the data on disk with sstabledump 
after running the test shows the value remaining.
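
The shape of the attached test, as a minimal CQLTester-style sketch (the attached 2i-leak-test.patch is the authoritative version; the helpers assume the usual CQLTester harness):
{code:java}
createTable("CREATE TABLE %s (pk int PRIMARY KEY, v int)");
createIndex("CREATE INDEX ON %s (v)");

execute("INSERT INTO %s (pk, v) VALUES (?, ?)", 1, 10); // indexed insert
execute("DELETE FROM %s WHERE pk = ?", 1);              // partition delete, same memtable

flush();

// The base table is empty, but the index table still holds the entry for
// v=10, and no tombstone for it will ever be written.
{code}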

Originally reported on the mailing list by Roman Bielik:

Create table with LeveledCompactionStrategy;
'tombstone_compaction_interval': 60; gc_grace_seconds=60
There are two indexed columns for comparison: column1, column2
Insert keys \{1..x} with random values in column1 & column2
Delete \{key:column2}     (but not column1)
Delete \{key}
Repeat n-times from the inserts
Wait 1 minute
nodetool flush
nodetool compact (sometimes compact  
nodetool cfstats

What I observe is that the data table is empty, the column2 index table is
also empty, and the column1 index table has non-zero (leaked) "space used" and
"estimated rows".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16

2018-06-01 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497633#comment-16497633
 ] 

Jordan West commented on CASSANDRA-14468:
-

I can take a look next week.

> "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
> -
>
> Key: CASSANDRA-14468
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14468
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Wade Simmons
>Priority: Major
>
> I am attempting to upgrade from Cassandra 2.2.10 to 3.0.16. I am getting this 
> error:
> {code}
> org.apache.cassandra.exceptions.ConfigurationException: Unable to parse 
> targets for index idx_foo ("666f6f")
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.parseTarget(CassandraIndex.java:800)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata(CassandraIndex.java:747)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:645)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:251) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) 
> [apache-cassandra-3.0.16.jar:3.0.16]
> {code}
> It looks like this might be related to CASSANDRA-14104 that was just added to 
> 3.0.16 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14442) Let nodetool import take a list of directories

2018-05-31 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497552#comment-16497552
 ] 

Jordan West commented on CASSANDRA-14442:
-

[~krummas] LGTM overall. One thing I wanted to check: is it ok that callers of 
importNewSSTables can now run concurrently with callers of 
CFS#runWithCompactionsDisabled (callers like truncate and clearUnsafe)? 

Some minor things:
* Remove the whitespace only change in Tracker
* Rename first argument of CFS#importNewSSTables to srcPaths
* Consider moving {{SSTableImporter#moveAndOpenSSTable}} to be a static method 
on SSTable, maybe {{renameAndOpen}} (it may be useful for future uses/tests and 
isn’t specific to {{SSTableImporter}})
* Thanks for adding the new dtests. Should they be marked as since 4.0?

> Let nodetool import take a list of directories
> --
>
> Key: CASSANDRA-14442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14442
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> It should be possible to load sstables from several input directories when 
> running nodetool import. Directories that failed to import should be output.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14451) Infinity ms Commit Log Sync

2018-06-04 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14451:

Reviewer: Jordan West

> Infinity ms Commit Log Sync
> ---
>
> Key: CASSANDRA-14451
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14451
> Project: Cassandra
>  Issue Type: Bug
> Environment: 3.11.2 - 2 DC
>Reporter: Harry Hough
>Assignee: Jason Brown
>Priority: Minor
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> It's emitting commit log sync warnings where there were apparently zero 
> syncs, and therefore reports "Infinityms" as the average duration
> {code:java}
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:11:14,294 
> NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with 
> average duration of Infinityms, 1 have exceeded the configured commit 
> interval by an average of 74.40ms 
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:16:57,844 
> NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with 
> average duration of Infinityms, 1 have exceeded the configured commit 
> interval by an average of 198.69ms 
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:24:46,325 
> NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with 
> average duration of Infinityms, 1 have exceeded the configured commit 
> interval by an average of 264.11ms 
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:29:46,393 
> NoSpamLogger.java:94 - Out of 32 commit log syncs over the past 268.84s with, 
> average duration of 17.56ms, 1 have exceeded the configured commit interval 
> by an average of 173.66ms{code}
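
The "Infinityms" average is the result of computing an average over zero syncs. A minimal sketch of the guard such a fix needs, assuming illustrative field names (not the actual AbstractCommitLogService code):
{code:java}
// Only report an average when at least one sync actually completed;
// otherwise the division produces the NaN/Infinity seen in the warnings.
if (syncCount > 0 && lagCount > 0)
{
    double averageSyncMillis = (double) totalSyncDurationMillis / syncCount;
    NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 5, TimeUnit.MINUTES,
                     "Out of {} commit log syncs over the past {}s with average duration of {}ms, "
                     + "{} have exceeded the configured commit interval by an average of {}ms",
                     syncCount, elapsedSeconds, averageSyncMillis, lagCount, averageLagMillis);
}
{code}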



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14451) Infinity ms Commit Log Sync

2018-06-04 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501213#comment-16501213
 ] 

Jordan West commented on CASSANDRA-14451:
-

[~jasobrown] change LGTM. A few questions and minor comments:
 * Are the ArchiveCommitLog dtest failures expected on the 3.0 branch? 
 * The “sleep any time we have left” comment would be more appropriate above 
the assignment of {{wakeUpAt}}. 
 * Mark {{maybeLogFlushLag}} and {{getTotalSyncDuration}} as 
{{@VisibleForTesting}}
 * Just wanted to check that the change in behavior of updating 
{{totalSyncDuration}} is intentional. It makes sense to me that we only 
increment it if a sync actually occurs but that wasn’t the case before. 
 * Is there a reason you opted for the “excessTimeToFlush” approach in 3.0 
but the “maxFlushTimestamp” approach on 3.11 and trunk? The only difference I 
see is the unit of time. 

> Infinity ms Commit Log Sync
> ---
>
> Key: CASSANDRA-14451
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14451
> Project: Cassandra
>  Issue Type: Bug
> Environment: 3.11.2 - 2 DC
>Reporter: Harry Hough
>Assignee: Jason Brown
>Priority: Minor
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> It's emitting commit log sync warnings where there were apparently zero 
> syncs, and therefore reports "Infinityms" as the average duration
> {code:java}
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:11:14,294 
> NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with 
> average duration of Infinityms, 1 have exceeded the configured commit 
> interval by an average of 74.40ms 
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:16:57,844 
> NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with 
> average duration of Infinityms, 1 have exceeded the configured commit 
> interval by an average of 198.69ms 
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:24:46,325 
> NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with 
> average duration of Infinityms, 1 have exceeded the configured commit 
> interval by an average of 264.11ms 
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:29:46,393 
> NoSpamLogger.java:94 - Out of 32 commit log syncs over the past 268.84s with, 
> average duration of 17.56ms, 1 have exceeded the configured commit interval 
> by an average of 173.66ms{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14451) Infinity ms Commit Log Sync

2018-06-05 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501923#comment-16501923
 ] 

Jordan West commented on CASSANDRA-14451:
-

{quote} I left it where it was previously located, but can move it to the more 
logical spot.
{quote}
I don't find it very useful where it is now. Would vote to move it or remove it 
(the code is pretty clear).
{quote}I wanted to keep the logic as close to the original as possible, since 
3.0 is far along in it's age. I suppose it doesn't matter that much, though, 
and can change if you think it's worthwhile. wdyt?
{quote}
From the review perspective it was just a second implementation to check for 
correctness, and it seems like either implementation could be used. I would 
vote for them to be the same, but it's fine as is if you prefer.

 

Otherwise, +1

> Infinity ms Commit Log Sync
> ---
>
> Key: CASSANDRA-14451
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14451
> Project: Cassandra
>  Issue Type: Bug
> Environment: 3.11.2 - 2 DC
>Reporter: Harry Hough
>Assignee: Jason Brown
>Priority: Minor
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> It's emitting commit log sync warnings where there were apparently zero 
> syncs, and therefore reports "Infinityms" as the average duration
> {code:java}
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:11:14,294 
> NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with 
> average duration of Infinityms, 1 have exceeded the configured commit 
> interval by an average of 74.40ms 
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:16:57,844 
> NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with 
> average duration of Infinityms, 1 have exceeded the configured commit 
> interval by an average of 198.69ms 
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:24:46,325 
> NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with 
> average duration of Infinityms, 1 have exceeded the configured commit 
> interval by an average of 264.11ms 
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:29:46,393 
> NoSpamLogger.java:94 - Out of 32 commit log syncs over the past 268.84s with, 
> average duration of 17.56ms, 1 have exceeded the configured commit interval 
> by an average of 173.66ms{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14499) node-level disk quota

2018-06-05 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-14499:
---

 Summary: node-level disk quota
 Key: CASSANDRA-14499
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14499
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jordan West
Assignee: Jordan West


Operators should be able to specify, via YAML, the amount of usable disk space 
on a node as a percentage of the total available or as an absolute value. If 
both are specified, the absolute value should take precedence. This allows 
operators to reserve space available to the database for background tasks -- 
primarily compaction. When a node reaches its quota, gossip should be disabled 
to prevent it taking further writes (which would increase the amount of data 
stored), being involved in reads (which are likely to be more inconsistent over 
time), or participating in repair (which may increase the amount of space used 
on the machine). The node re-enables gossip when the amount of data it stores 
is below the quota.   

The proposed option differs from {{min_free_space_per_drive_in_mb}}, which 
reserves some amount of space on each drive that is not usable by the database. 
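
Purely as a sketch of the proposal, option names ({{disk_quota_in_mb}}, {{disk_quota_percentage}}) and helpers below are hypothetical; nothing here exists in the codebase yet:
{code:java}
// Hypothetical periodic check, scheduled on the optional tasks executor.
void checkDiskQuota()
{
    long quotaBytes = conf.disk_quota_in_mb > 0
                    ? conf.disk_quota_in_mb * 1024L * 1024L  // absolute value takes precedence
                    : (long) (totalDiskBytes() * (conf.disk_quota_percentage / 100.0));

    long usedBytes = bytesUsedByDataDirectories();

    if (usedBytes >= quotaBytes && StorageService.instance.isGossipRunning())
        StorageService.instance.stopGossiping();   // stop taking writes/reads/repair
    else if (usedBytes < quotaBytes && !StorageService.instance.isGossipRunning())
        StorageService.instance.startGossiping();  // re-enable once back under quota
}
{code}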
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14442) Let nodetool import take a list of directories

2018-06-01 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498265#comment-16498265
 ] 

Jordan West commented on CASSANDRA-14442:
-

LGTM. I'm +1 as is, but one minor suggestion if you feel like including it:

The live SSTable check could be replaced by the following. It's a little more 
succinct and less work (since we do the "contains" check in the iteration 
instead of afterwards):
{code:java}
boolean isLive = cfs.getLiveSSTables().stream()
                    .filter(r -> r.descriptor.equals(newDescriptor) || r.descriptor.equals(oldDescriptor))
                    .findAny()
                    .isPresent();
if (isLive)
{
    String message = String.format("Can't move and open a file that is already in use in the table %s -> %s",
                                   oldDescriptor, newDescriptor);
    logger.error(message);
    throw new RuntimeException(message);
}
{code}
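
For comparison, {{anyMatch}} would express the same filter/findAny/isPresent chain even more directly (same semantics, take it or leave it):
{code:java}
boolean isLive = cfs.getLiveSSTables().stream()
                    .anyMatch(r -> r.descriptor.equals(newDescriptor)
                                || r.descriptor.equals(oldDescriptor));
{code}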
 

> Let nodetool import take a list of directories
> --
>
> Key: CASSANDRA-14442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14442
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> It should be possible to load sstables from several input directories when 
> running nodetool import. Directories that failed to import should be output.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14443) Improvements for running dtests

2018-07-02 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530727#comment-16530727
 ] 

Jordan West edited comment on CASSANDRA-14443 at 7/3/18 2:30 AM:
-

Thanks for the updates [~KurtG].

Some comments from a second pass:
 * conftest.py L434 should be {{if not}} instead of {{if sufficient_…}}, 
otherwise the logging occurs in the wrong case
 * Looking at the docs you added, conftest.py L479 should be {{if not upgrade 
and not upgrade_only}}, similar to L445 for the resource-intensive flags
 * Is it intentional to use {{print}} instead of the loggers in conftest.py? 
conftest.py L492 may also be verbose. Similarly, run_dtests.py was changed to 
use loggers instead of {{print}} on L73.

{quote}TBH I think that the whole resource check can go, and this kind of 
information is more suitable for the documentation. We don't put resource 
limits on Cassandra and I don't think we should do it for the dtests.
{quote}
 

Logging is useful because it reduces the amount of time needed to find the 
issue and fix it. Failing fast, with an option to override, speeds that up 
even more, but I'm not strongly for it if you prefer to leave it as is. 


was (Author: jrwest):
Thanks for the updates [~KurtG].

S}ome comments from a second pass:
 * conftest.pyL434 should be {{if not}} instead of {{if sufficient_…}}, 
otherwise the logging occurs in the wrong case
 * Looking at the docs you added, conftest.pyL479 be {{if not upgrade and not 
upgrade_only}}, similar to L445 for resource intensive flags
 * Is it intentional to use {{print}} instead of the loggers in conftest.py? 
conftest.pyL492 may also be verbose. Similarly, run_dtests.py was changed to 
use loggers instead of {{print}} on L73.

{quote}TBH I think that the whole resource check can go, and this kind of 
information is more suitable for the documentation. We don't put resource 
limits on Cassandra and I don't think we should do it for the dtests.
{quote}
 

Logging is useful because it reduces the amount of time to find the issue and 
fix it. Failing fast, with an option to override, speeds that up more but I'm 
not strongly for it, if you prefer to leave it as is. 

> Improvements for running dtests
> ---
>
> Key: CASSANDRA-14443
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14443
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Kurt Greaves
>Assignee: Kurt Greaves
>Priority: Major
>  Labels: dtest
>
> We currently hardcode a requirement that you need at least 27GB of memory to 
> run the resource-intensive tests. This is rather annoying as there isn't 
> really a strict hardware requirement and tests can run on smaller machines in 
> a lot of cases (especially if you mess around with HEAP). 
> We've already got the command line argument 
> {{--force-resource-intensive-tests}}, we don't need additional restrictions 
> in place to stop people who shouldn't be running the tests from running them.
> We also don't have a way to run _only_ the resource-intensive dtests or 
> _only_ the upgrade tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14443) Improvements for running dtests

2018-07-02 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530727#comment-16530727
 ] 

Jordan West commented on CASSANDRA-14443:
-

Thanks for the updates [~KurtG].

Some comments from a second pass:
 * conftest.py L434 should be {{if not}} instead of {{if sufficient_…}}, 
otherwise the logging occurs in the wrong case
 * Looking at the docs you added, conftest.py L479 should be {{if not upgrade 
and not upgrade_only}}, similar to L445 for the resource-intensive flags
 * Is it intentional to use {{print}} instead of the loggers in conftest.py? 
conftest.py L492 may also be verbose. Similarly, run_dtests.py was changed to 
use loggers instead of {{print}} on L73.

{quote}TBH I think that the whole resource check can go, and this kind of 
information is more suitable for the documentation. We don't put resource 
limits on Cassandra and I don't think we should do it for the dtests.
{quote}
 

Logging is useful because it reduces the amount of time needed to find the 
issue and fix it. Failing fast, with an option to override, speeds that up 
even more, but I'm not strongly for it if you prefer to leave it as is. 

> Improvements for running dtests
> ---
>
> Key: CASSANDRA-14443
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14443
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Kurt Greaves
>Assignee: Kurt Greaves
>Priority: Major
>  Labels: dtest
>
> We currently hardcode a requirement that you need at least 27GB of memory to 
> run the resource-intensive tests. This is rather annoying as there isn't 
> really a strict hardware requirement and tests can run on smaller machines in 
> a lot of cases (especially if you mess around with HEAP). 
> We've already got the command line argument 
> {{--force-resource-intensive-tests}}, we don't need additional restrictions 
> in place to stop people who shouldn't be running the tests from running them.
> We also don't have a way to run _only_ the resource-intensive dtests or 
> _only_ the upgrade tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-01 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14055:

Reviewer: Jordan West  (was: Alex Petrov)

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.2
>
> Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch
>
>
> During index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> But, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function : 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-01-31 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347781#comment-16347781
 ] 

Jordan West commented on CASSANDRA-14055:
-

Hi [~lboutros],

One of the original authors of SASI here. I've been taking a look at this issue 
and your patch. Using the provided test against the {{cassandra-3.11}} branch 
(fc3357a00e2b6e56d399f07c5b81a82780c1e143), I see three different failure cases 
– two related directly to this issue and one tangentially related. More details 
on those below. With respect to this issue in particular, the three scenarios 
cause the test to fail because {{IndexSummaryManager}} ends up creating a new 
{{View}} where {{oldSSTables}} and {{newIndexes}} have overlapping values. This 
occurs because the {{IndexSummaryManager}} may "update" (re-open) an 
{{SSTableReader}} for an index already in the view. I believe this is unique to 
{{IndexSummaryManager}} and I am able to make your tests pass* without your 
patch by ensuring that there is no overlap between {{oldSStables}} and 
{{newIndexes}} (favoring {{newIndexes}}). Your patch looks to do this as well, 
though the approach is a bit different.

One thing I am curious about in your patch is the {{keepFile}} changes to 
{{SSTableIndex#release}}. Generally, this concerns me because it seems to be 
working around improper reference counting rather than correcting the reference 
counting itself. Also, while using the provided test, I am unable to hit a case 
where the condition {{obsolete.get() || sstableRef.globalCount() == 0}} is 
true. I see the file missing in the {{View}} but not on disk itself. Could you 
elaborate a bit more on the need for this change and your use of the 
{{keepFile}} flag?

The three failure scenarios I see using the provided test are:
h5. 8 keys returned - sequential case

In this scenario, at the time when the query that fails runs, the {{View}} is 
missing the most recently flushed sstable. As mentioned previously, this is 
because the intersection of {{oldSSTables}} and {{newIndexes}} is non-empty. 
This can be fixed* by ensuring nothing in {{newIndexes}} is in {{oldSSTables}}. 
I call this the sequential case because the compaction that occurs during the 
test completes before the index summary redistribution begins to create a new 
{{View}}. This is also addressed by your patch.
h5. 8 keys returned - race case

This scenario is similar to the previous one but has the additional issue of 
triggering improper {{SSTableIndex}} reference counting. From the perspective 
of the provided test, the failure scenario is the same and the fix* is as well. 
The issue occurs because of a race between compaction and index 
redistribution's creation of new {{View}} instances. This causes redistribution 
to create two {{View}} instances, the first of which is thrown away due to a 
failed compare and swap. The problem is the side-effects (calling 
{{SSTableIndex#release}}) have occurred already inside the creation of the 
garbage {{View}}, causing the reference count for the index to drop below 0. I 
see this issue as a separate one from this ticket and will file a separate 
JIRA. It is not fixed by the previously mentioned change and while I haven't 
checked in detail, I don't think the provided patch addresses this either.
h5. 0 keys returned

This scenario is similar to the first but there are three threads involved in 
the race: the compaction, the flushing of the last memtable, and the index 
redistribution. In this case, the end result is an empty {{View}}, which leads 
to no keys being returned since the system thinks there are no indexes to 
search. This is fixed* by what I mentioned previously and occurs because index 
redistribution re-opens both sstables in the original {{View}} instead of just 
one. It is also addressed by your patch. 

 

I am curious if you see any other failure scenarios besides these three and, in 
particular, if you can elaborate on and provide examples of the issues you see 
regarding the files being missing on disk and the need for the {{keepFile}} 
change.
 
\* While this fix makes the provided test pass, I am still verifying it's 
correct from the reference-counting perspective.
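
To make the "no overlap, favoring {{newIndexes}}" fix concrete, a hedged sketch of building the merged view (not the committed patch; the names only approximate the SASI {{View}} code):
{code:java}
// Start from the re-opened/new indexes so they always win...
Map<Descriptor, SSTableIndex> merged = new HashMap<>();
for (SSTableIndex index : newIndexes)
    merged.put(index.getSSTable().descriptor, index);

// ...then carry over current indexes that were neither re-opened (already in
// 'merged') nor genuinely removed (listed in oldSSTables, e.g. compacted
// away). This keeps oldSSTables and newIndexes from overlapping in the view.
for (SSTableIndex index : currentView)
{
    Descriptor d = index.getSSTable().descriptor;
    if (!merged.containsKey(d) && !oldSSTables.contains(index.getSSTable()))
        merged.put(d, index);
}
{code}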

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.2
>
> Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch
>
>
> During index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> But, new indexes are "attached" to the 

[jira] [Comment Edited] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-01-31 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347781#comment-16347781
 ] 

Jordan West edited comment on CASSANDRA-14055 at 1/31/18 11:39 PM:
---

Hi [~lboutros],

One of the original authors of SASI here. I've been taking a look at this issue 
and your patch. Using the provided test against the {{cassandra-3.11}} branch 
(fc3357a00e2b6e56d399f07c5b81a82780c1e143), I see three different failure cases 
– two related directly to this issue and one tangentially related. More details 
on those below. With respect to this issue in particular, the three scenarios 
cause the test to fail because {{IndexSummaryManager}} ends up creating a new 
{{View}} where {{oldSSTables}} and {{newIndexes}} have overlapping values. This 
occurs because the {{IndexSummaryManager}} may "update" (re-open) an 
{{SSTableReader}} for an index already in the view. I believe this is unique to 
{{IndexSummaryManager}} and I am able to make your tests pass* without your 
patch by ensuring that there is no overlap between {{oldSSTables}} and 
{{newIndexes}} (favoring {{newIndexes}}). Your patch looks to do this as well, 
though the approach is a bit different.

One thing I am curious about in your patch is the {{keepFile}} changes to 
{{SSTableIndex#release}}. Generally, this concerns me because it seems to be 
working around improper reference counting rather than correcting the reference 
counting itself. Also, while using the provided test, I am unable to hit a case 
where the condition {{obsolete.get() || sstableRef.globalCount() == 0}} is 
true. I see the file missing in the {{View}} but not on disk itself. Could you 
elaborate a bit more on the need for this change and your use of the 
{{keepFile}} flag?

The three failure scenarios I see using the provided test are:
h5. 8 keys returned - sequential case

In this scenario, at the time when the query that fails runs, the {{View}} is 
missing the most recently flushed sstable. As mentioned previously, this is 
because the intersection of {{oldSSTables}} and {{newIndexes}} is non-empty. 
This can be fixed* by ensuring nothing in {{newIndexes}} is in {{oldSSTables}}. 
I call this the sequential case because the compaction that occurs during the 
test completes before the index summary redistribution begins to create a new 
{{View}}. This is also addressed by your patch.
h5. 8 keys returned - race case

This scenario is similar to the previous one but has the additional issue of 
triggering improper {{SSTableIndex}} reference counting. From the perspective 
of the provided test, the failure scenario is the same and the fix* is as well. 
The issue occurs because of a race between compaction and index 
redistribution's creation of new {{View}} instances. This causes redistribution 
to create two {{View}} instances, the first of which is thrown away due to a 
failed compare and swap. The problem is the side-effects (calling 
{{SSTableIndex#release}}) have occurred already inside the creation of the 
garbage {{View}}, causing the reference count for the index to drop below 0. I 
see this issue as a separate one from this ticket and have filed 
[CASSANDRA-14207|https://issues.apache.org/jira/browse/CASSANDRA-14207]. It is 
not fixed by the previously mentioned change and while I haven't checked in 
detail, I don't think the provided patch addresses this either.
h5. 0 keys returned

This scenario is similar to the first but there are three threads involved in 
the race: the compaction, the flushing of the last memtable, and the index 
redistribution. In this case, the end result is an empty {{View}}, which leads 
to no keys being returned since the system thinks there are no indexes to 
search. This is fixed* by what I mentioned previously and occurs because index 
redistribution re-opens both sstables in the original {{View}} instead of just 
one. It is also addressed by your patch. 

 

I am curious if you see any other failure scenarios besides these three and, in 
particular, if you can elaborate on and provide examples of the issues you see 
regarding the files being missing on disk and the need for the {{keepFile}} 
change.

\* While this fix makes the provided test pass, I am still verifying it's 
correct from the reference-counting perspective.


was (Author: jrwest):
Hi [~lboutros],

One of the original authors of SASI here. I've been taking a look at this issue 
and your patch. Using the provided test against the {{cassandra-3.11}} branch 
(fc3357a00e2b6e56d399f07c5b81a82780c1e143), I see three different failure cases 
– two related directly to this issue and one tangentially related. More details 
on those below. With respect to this issue in particular, the three scenarios 
cause the test to fail because {{IndexSummaryManager}} ends up creating a new 
{{View}} where {{oldSSTables}} and {{newIndexes}} have overlapping values. This 

[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-01-31 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14055:

Attachment: CASSANDRA-14055-jrwest.patch

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.2
>
> Attachments: CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch, CASSANDRA-14055.patch
>
>
> During index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> But, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function : 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14207) Failed Compare and Swap in SASI's DataTracker#update Can Lead to Improper Reference Counting of SSTableIndex

2018-01-31 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-14207:
---

 Summary: Failed Compare and Swap in SASI's DataTracker#update Can 
Lead to Improper Reference Counting of SSTableIndex
 Key: CASSANDRA-14207
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14207
 Project: Cassandra
  Issue Type: Bug
  Components: sasi
Reporter: Jordan West
Assignee: Jordan West
 Attachments: sasi-invalid-reference-count.rtf

A race between e.g. Index Redistribution and Compaction can cause the compare 
and swap of a new {{sasi.conf.View}} in {{sasi.conf.DataTracker#update}} to 
fail, leading to recreation of the view and improper reference counting of an 
{{SSTableIndex}}. This is because the side-effects (decrementing the reference 
count via {{SSTableIndex#release}}) occur regardless of whether the view is 
promoted to be the active view.  

Code: 
https://github.com/apache/cassandra/blob/cassandra-3.11.1/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java#L72-L78
 

Attached logs and debug output show case where index redistribution and 
compaction race. This case was generated using the test provided in 
https://issues.apache.org/jira/browse/CASSANDRA-14055



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-01-31 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14055:

Attachment: (was: CASSANDRA-14055-jrwest.patch)

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.2
>
> Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch
>
>
> During index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> But, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function : 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-09 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358877#comment-16358877
 ] 

Jordan West commented on CASSANDRA-14055:
-

Hi [~lboutros],

My apologies for the delay. I am waiting on internal review of my version of 
the patch so you can take a look. I believe this patch accomplishes the same 
thing with fewer changes and doesn't affect the reference counting in 
SSTableIndex. I hope to have it posted for your review and testing early next 
week.

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.2
>
> Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch
>
>
> During index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> But, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function : 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-12 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14055:

Attachment: CASSANDRA-14055-jrwest.patch

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.x
>
> Attachments: CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch, CASSANDRA-14055.patch
>
>
> During index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> But, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function : 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-12 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361610#comment-16361610
 ] 

Jordan West commented on CASSANDRA-14055:
-

Hi [~lboutros],

Attached is my patch for your review and testing. If you could verify it does 
the right thing in your environments, that would be especially helpful, since 
I have been unable to replicate the deleted-file issue – I only see the 
sstables removed from the SASI {{View}}.

The gist of the patch is to ensure the intersection of {{oldSSTables}} and 
{{newIndexes}} is always empty. Your patch was doing the same by not checking 
{{oldSSTables}} in the second for-loop, but this approach doesn't require the 
changes to {{SSTableIndex#release}}.

I re-used your test but removed the version that runs with the data entirely 
in-memory since that won't be affected by index redistribution.

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.x
>
> Attachments: CASSANDRA-14055.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch
>
>
> During index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> But, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function : 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-14 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364440#comment-16364440
 ] 

Jordan West edited comment on CASSANDRA-14055 at 2/14/18 5:22 PM:
--

[~jasobrown], sorry for the trunk issues. The way {{TableMetadata}} is 
accessed/stored was changed and the test will need to be modified as a result. 
Will post a separate patch for trunk. 

[~lboutros], In my testing, the primary issue I saw was that files were removed 
from the SASI {{View}} that shouldn't be. The test writes 5 sstables (with 
sequence numbers 1-4 & 6) and during the test a compaction typically happens 
(that generates a sstable with generation 5 from sstables 1-4). The final SASI 
{{View}} when the queries are performed should contain either (1-4, 6) or (5, 
6)*. The test fails by returning 8 keys instead of 10 when the SASI {{View}} 
ends up containing only sstable 5 or by returning 0 keys instead of 10 when the 
SASI {{View}} ends up empty. 

The issue occurs when index redistribution completes. Depending on the 
interleaving* of events (the memtable flush, compaction, and redistribution), 
redistribution re-opens sstable 6, and sometimes re-opens sstable 5. This 
results in an {{SSTableListChangedNotification}}, which in turn results in the 
creation of a new {{View}},  where {{added=[6]}} (or {{added=[5,6]}}) and 
{{removed=[6]}} (or {{removed=[5,6]}}). The SASI {{View}} was written assuming 
these two sets were disjoint, which is why any reader in {{oldSSTables}} caused 
the index to be closed. This is incorrect in both cases because sstables 5 and 
6 are indeed the active data files (5 contains keys 0-8, and 6 contains keys 9 
& 10). 

Regarding the ref counting, we want to maintain one reference to sstables 5 & 6 
via their SSTableIndex instance but we’ve created a second reference and one 
needs to be closed. This is ensured by the 
{{newView.containsKey(sstable.descriptor)}} part of the conditional (so we are 
still indeed calling {{#release()}} on one instance). As I am writing this, 
however, I am realizing we want to keep a reference to the newer index, which 
references the newer SSTable instance and my patch does the opposite — keeping 
the old instance. I will post an updated patch along with my trunk patch for 
internal review, but the gist is to change the order we iterate over the old 
view and new indexes to favor new index instances.

NOTE: I've ignored https://issues.apache.org/jira/browse/CASSANDRA-14207 above

*I've found a few other interleavings by using another machine, but the general 
issue is the same.


was (Author: jrwest):
[~jasobrown], sorry for the trunk issues. The way {{TableMetadata}} is 
accessed/stored was changed and the test will need to be modified as a result. 
Will post a separate patch for trunk. 

[~lboutros], In my testing, the primary issue I saw was that files were removed 
from the SASI {{View}} that shouldn't be. The test writes 5 sstables (with 
sequence numbers 1-4 & 6) and during the test a compaction typically happens 
(that generates a sstable with generation 5 from sstables 1-4). The final SASI 
{{View}} when the queries are performed should contain either (1-4, 6) or (5, 
6)*. The test fails by returning 8 keys instead of 10 when the SASI {{View}} 
ends up containing only sstable 5 or by returning 0 keys instead of 10 when the 
SASI {{View}} ends up empty. 

The issue occurs when index redistribution completes. Depending on the 
interleaving* of events (the memtable flush, compaction, and redistribution), 
redistribution re-opens sstable 6, and sometimes also re-opens sstable 5. This 
results in an {{SSTableListChangedNotification}}, which in turn triggers the 
creation of a new {{View}} where {{added=6}} (or {{added=[5,6]}}) and 
{{removed=6}} (or {{removed=[5,6]}}). The SASI {{View}} was written assuming 
these two sets were disjoint, which is why any reader in {{oldSSTables}} caused 
the index to be closed. This is incorrect in both cases because sstables 5 and 
6 are indeed the active data files (5 contains keys 0-8, and 6 contains keys 9 
& 10). 

Regarding the ref counting, we want to maintain one reference to sstables 5 & 6 
via their {{SSTableIndex}} instances, but we've created a second reference, and 
one of them needs to be closed. This is ensured by the 
{{newView.containsKey(sstable.descriptor)}} part of the conditional (so we are 
still indeed calling {{#release()}} on one instance). As I am writing this, 
however, I am realizing we want to keep a reference to the newer index, which 
references the newer SSTable instance, and my patch does the opposite: it keeps 
the old instance. I will post an updated patch along with my trunk patch for 
internal review, but the gist is to change the order we iterate over the old 
view and new indexes to favor new index instances.

NOTE: I've ignored https://issues.apache.org/jira/browse/CASSANDRA-14207 

[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-14 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364440#comment-16364440
 ] 

Jordan West commented on CASSANDRA-14055:
-

[~jasobrown], sorry for the trunk issues. The way {{TableMetadata}} is 
accessed/stored was changed and the test will need to be modified as a result. 
Will post a separate patch for trunk. 

[~lboutros], in my testing, the primary issue I saw was that files were removed 
from the SASI {{View}} that shouldn't have been. The test writes 5 sstables (with 
sequence numbers 1-4 & 6), and during the test a compaction typically happens 
(generating an sstable with generation 5 from sstables 1-4). The final SASI 
{{View}}, when the queries are performed, should contain either (1-4, 6) or (5, 
6)*. The test fails by returning 8 keys instead of 10 when the SASI {{View}} 
ends up containing only sstable 5, or by returning 0 keys instead of 10 when the 
SASI {{View}} ends up empty. 

The issue occurs when index redistribution completes. Depending on the 
interleaving* of events (the memtable flush, compaction, and redistribution), 
redistribution re-opens sstable 6, and sometimes also re-opens sstable 5. This 
results in an {{SSTableListChangedNotification}}, which in turn triggers the 
creation of a new {{View}} where {{added=6}} (or {{added=[5,6]}}) and 
{{removed=6}} (or {{removed=[5,6]}}). The SASI {{View}} was written assuming 
these two sets were disjoint, which is why any reader in {{oldSSTables}} caused 
the index to be closed. This is incorrect in both cases because sstables 5 and 
6 are indeed the active data files (5 contains keys 0-8, and 6 contains keys 9 
& 10). 

Regarding the ref counting, we want to maintain one reference to sstables 5 & 6 
via their {{SSTableIndex}} instances, but we've created a second reference, and 
one of them needs to be closed. This is ensured by the 
{{newView.containsKey(sstable.descriptor)}} part of the conditional (so we are 
still indeed calling {{#release()}} on one instance). As I am writing this, 
however, I am realizing we want to keep a reference to the newer index, which 
references the newer SSTable instance, and my patch does the opposite: it keeps 
the old instance. I will post an updated patch along with my trunk patch for 
internal review, but the gist is to change the order we iterate over the old 
view and new indexes to favor new index instances.

NOTE: I've ignored https://issues.apache.org/jira/browse/CASSANDRA-14207 above

*I've found a few other interleavings by using another machine, but the general 
issue is the same.

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.x, 4.x
>
> Attachments: CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch, CASSANDRA-14055.patch
>
>
> During the index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> However, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function: 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-14 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364690#comment-16364690
 ] 

Jordan West commented on CASSANDRA-14055:
-

[~lboutros], the order we release them in shouldn't matter for file deletion: 
as long as one index is open, one sstable is open, and therefore the global 
reference count for the table is > 0. But if we keep the older reference, we 
leak it (and keep using the old metadata) until the {{SSTableIndex}} is 
released, which is wrong.
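
To illustrate the ordering argument with a toy example ({{viewBefore}} and 
{{viewAfter}} are hypothetical helpers, not real methods; each index holds one 
reference to its own reader instance for the same underlying sstable):

{code}
// hypothetical illustration, not code from the patch
SSTableIndex oldIndex = viewBefore(descriptor);  // pre-redistribution reader
SSTableIndex newIndex = viewAfter(descriptor);   // re-opened reader, same sstable

oldIndex.release();  // file survives: newIndex still pins the sstable
// releasing newIndex first would equally keep the file alive, but retaining
// oldIndex would pin the stale SSTableReader (and its old metadata) until
// the index is finally dropped, i.e. the leak described above
{code}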

 

Either way, sounds like we agree on the fix :). I will post a new patch once 
approved internally. 

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.x, 4.x
>
> Attachments: CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch, CASSANDRA-14055.patch
>
>
> During the index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> However, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function: 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Issue Comment Deleted] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-15 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14055:

Comment: was deleted

(was: [~lboutros], the order we release them in shouldn't matter for file 
deletion: as long as one index is open, one sstable is open, and therefore the 
global reference count for the table is > 0. But if we keep the older 
reference, we leak it (and keep using the old metadata) until the 
{{SSTableIndex}} is released, which is wrong.

 

Either way, sounds like we agree on the fix :). I will post a new patch once 
approved internally. )

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.x, 4.x
>
> Attachments: CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch, CASSANDRA-14055.patch
>
>
> During the index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> However, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function: 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-19 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369601#comment-16369601
 ] 

Jordan West commented on CASSANDRA-14055:
-

[~lboutros]/[~jasobrown], some updates:

 
I have attached two new patches: one for trunk and one for 3.11. Unfortunately, 
the test changes in trunk don't work well on 3.11, so we can't have a single 
patch. The primary change in these patches is to the order we iterate over the 
indexes, ensuring we retain the newer instance of {{SSTableIndex}} and thus the 
newer {{SSTableReader}}. I also changed the code to clone the {{oldSSTables}} 
collection, since it's visible outside the {{View}} constructor. 
||3.11||Trunk||
|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-3.11]|[branch|https://github.com/jrwest/cassandra/tree/14055-jrwest-trunk]|
|[utests|https://circleci.com/gh/jrwest/cassandra/24]|[utests|https://circleci.com/gh/jrwest/cassandra/26]|

NOTE: the same utests are failing on 
[trunk|https://circleci.com/gh/jrwest/cassandra/25], and I'm still working on 
getting dtests running with my CircleCI setup. 

 

Also, I spoke with some colleagues, including [~beobal] and [~krummas], about the 
use of {{sstableRef.globalCount()}} to determine when to delete the SASI index 
file. I've come to the conclusion that using it at all is wrong, because it 
counts references to the specific {{SSTableReader}} instance, not to the sstable 
globally; given index summary redistribution, the two are not the same. Looking 
back at the original SASI patches, I am not sure why it got merged this way. The 
[patches|https://github.com/xedin/sasi/blob/master/src/java/org/apache/cassandra/db/index/sasi/SSTableIndex.java#L120]
 used {{sstable.isMarkedCompacted()}} but the [merged 
code|https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b#diff-4873bb6fcef158ff18d221571ef2ec7cR124]
 used {{sstableRef.globalCount()}} (see the sketch below). Fixing this is a 
larger undertaking, so I propose we split that work into a separate ticket and 
focus this one on SASI's failure to account for index redistribution in the 
{{View}}. The work covered by the other ticket would entail either a) deleting 
the SASI index files as part of {{SSTableTidier}}, or b) moving {{SSTableIndex}} 
to use {{Ref}} and implementing a tidier specific to it.
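
For reference, a condensed sketch of the release path in question, based on the 
merged code linked above (simplified, not verbatim; the {{references}} and 
{{obsolete}} fields are from that file):

{code}
// simplified from SSTableIndex#release(); not the verbatim merged code
public void release()
{
    if (references.decrementAndGet() == 0)
    {
        FileUtils.closeQuietly(index);
        sstableRef.release();
        // globalCount() counts references to *this* SSTableReader instance.
        // After an index summary redistribution another instance of the same
        // sstable can still be live, so 0 here does not mean the data is unused.
        if (obsolete.get() || sstableRef.globalCount() == 0)
            FileUtils.delete(index.getIndexPath());
    }
}
{code}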

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.x, 4.x
>
> Attachments: CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch, CASSANDRA-14055.patch
>
>
> During the index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> However, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function: 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-19 Thread Jordan West (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14055:

Attachment: 14055-jrwest-trunk.patch
14055-jrwest-3.11.patch

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.x, 4.x
>
> Attachments: 14055-jrwest-3.11.patch, 14055-jrwest-trunk.patch, 
> CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch
>
>
> During the index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> However, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function: 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-22 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373192#comment-16373192
 ] 

Jordan West commented on CASSANDRA-14055:
-

[~lboutros], it's in [~jasobrown]'s queue for one more review; I'm hoping for 
next week.

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.x, 4.x
>
> Attachments: 14055-jrwest-3.11.patch, 14055-jrwest-trunk.patch, 
> CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch
>
>
> During the index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> However, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function: 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14248) SSTableIndex should not use Ref#globalCount() to determine when to delete index file

2018-02-21 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-14248:
---

 Summary: SSTableIndex should not use Ref#globalCount() to 
determine when to delete index file
 Key: CASSANDRA-14248
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14248
 Project: Cassandra
  Issue Type: Bug
  Components: sasi
Reporter: Jordan West
Assignee: Jordan West
 Fix For: 3.11.x


{{SSTableIndex}} instances maintain a {{Ref}} to the underlying 
{{SSTableReader}} instance. When determining whether or not to delete the file 
after the last {{SSTableIndex}} reference is released, the implementation uses 
{{sstableRef.globalCount()}}: 
[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/SSTableIndex.java#L135.]
 This is incorrect because {{sstableRef.globalCount()}} returns the number of 
references to the specific instance of {{SSTableReader}}. However, in cases 
like index summary redistribution, there can be more than one instance of 
{{SSTableReader}}. Further, since the reader is shared across multiple indexes, 
not all indexes see the count go to 0. This can lead to cases where the 
{{SSTableIndex}} file is incorrectly deleted or not deleted when it should be.

 

A more correct implementation would either:
 * Tie into the existing {{SSTableTidier}}. SASI indexes are already SSTable 
components but are not cleaned up by the {{SSTableTidier}}, because they are not 
found by the current cleanup implementation.
 * Revamp {{SSTableIndex}} reference counting to use {{Ref}} and implement a 
new tidier (a sketch of this option follows below). 
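
A minimal sketch of the second option, assuming the {{Ref}}/{{RefCounted.Tidy}} 
machinery in {{org.apache.cassandra.utils.concurrent}}; {{IndexTidy}} and its 
wiring are hypothetical, not a proposed patch:

{code}
import java.io.File;

import org.apache.cassandra.io.util.FileUtils;
import org.apache.cassandra.utils.concurrent.Ref;
import org.apache.cassandra.utils.concurrent.RefCounted;

// hypothetical tidier: runs exactly once, after the last Ref is released,
// so the index file is deleted without consulting any global count
class IndexTidy implements RefCounted.Tidy
{
    private final File indexFile;

    IndexTidy(File indexFile)
    {
        this.indexFile = indexFile;
    }

    public void tidy()
    {
        FileUtils.deleteWithConfirm(indexFile);
    }

    public String name()
    {
        return indexFile.getAbsolutePath();
    }
}

// inside SSTableIndex (sketch): selfRef = new Ref<>(this, new IndexTidy(indexFile));
{code}

With this shape, deletion keys off reference lifetime rather than a count that 
redistribution can invalidate.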



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14055) Index redistribution breaks SASI index

2018-02-21 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371687#comment-16371687
 ] 

Jordan West commented on CASSANDRA-14055:
-

[~lboutros] Great! Thanks for taking a look. I've created 
https://issues.apache.org/jira/browse/CASSANDRA-14248. 

> Index redistribution breaks SASI index
> --
>
> Key: CASSANDRA-14055
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14055
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Ludovic Boutros
>Assignee: Ludovic Boutros
>Priority: Major
>  Labels: patch
> Fix For: 3.11.x, 4.x
>
> Attachments: 14055-jrwest-3.11.patch, 14055-jrwest-trunk.patch, 
> CASSANDRA-14055-jrwest.patch, CASSANDRA-14055.patch, CASSANDRA-14055.patch, 
> CASSANDRA-14055.patch
>
>
> During the index redistribution process, a new view is created.
> During this creation, old indexes should be released.
> However, new indexes are "attached" to the same SSTable as the old indexes.
> This leads to the deletion of the last SASI index file and breaks the index.
> The issue is in this function: 
> [https://github.com/apache/cassandra/blob/9ee44db49b13d4b4c91c9d6332ce06a6e2abf944/src/java/org/apache/cassandra/index/sasi/conf/view/View.java#L62]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-11990) Address rows rather than partitions in SASI

2018-08-02 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West reassigned CASSANDRA-11990:
---

Assignee: Jordan West  (was: Alex Petrov)

> Address rows rather than partitions in SASI
> ---
>
> Key: CASSANDRA-11990
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, sasi
>Reporter: Alex Petrov
>Assignee: Jordan West
>Priority: Major
> Fix For: 4.x
>
> Attachments: perf.pdf, size_comparison.png
>
>
> Currently, a lookup in the SASI index returns the key position of the 
> partition. After the partition lookup, the rows are iterated and the 
> operators are applied in order to filter out the ones that do not match.
> bq. TokenTree which accepts variable size keys (such would enable different 
> partitioners, collections support, primary key indexing etc.), 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14614) CircleCI config has dtests enabled but not the correct resources settings

2018-07-30 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562462#comment-16562462
 ] 

Jordan West commented on CASSANDRA-14614:
-

I'll have a patch up later this afternoon if no one else wants to take this. I 
just need to run to a few meetings first. 

> CircleCI config has dtests enabled but not the correct resources settings
> -
>
> Key: CASSANDRA-14614
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14614
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
>
> The commit for -CASSANDRA-9608- enabled the {{with_dtests_jobs}} 
> configuration in {{.circleci/config.yml}} but not the necessary env var 
> settings. We should revert this, unless we planned to start running dtests 
> with the correct resources on every master commit, in which case we should 
> fix the resources.
> (cc [~snazy] [~jasobrown])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14614) CircleCI config has dtests enabled but not the correct resources settings

2018-07-30 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-14614:
---

 Summary: CircleCI config has dtests enabled but not the correct 
resources settings
 Key: CASSANDRA-14614
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14614
 Project: Cassandra
  Issue Type: Bug
  Components: Testing
Reporter: Jordan West
Assignee: Jordan West


The commit for -CASSANDRA-9608- enabled the {{with_dtests_jobs}} configuration 
in {{.circleci/config.yml}} but not the necessary env var settings. We should 
revert this, unless we planned to start running dtests with the correct 
resources on every master commit, in which case we should fix the resources.

(cc [~snazy] [~jasobrown])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14614) CircleCI config has dtests enabled but not the correct resources settings

2018-07-30 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562673#comment-16562673
 ] 

Jordan West commented on CASSANDRA-14614:
-

[branch|https://github.com/jrwest/cassandra/tree/14614-trunk]  | 
[tests|https://circleci.com/gh/jrwest/cassandra/tree/14614-trunk]

> CircleCI config has dtests enabled but not the correct resources settings
> -
>
> Key: CASSANDRA-14614
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14614
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Trivial
>
> The commit for -CASSANDRA-9608- enabled the {{with_dtests_jobs}} 
> configuration in {{.circleci/config.yml}} but not the necessary env var 
> settings. We should revert this, unless we planned to start running dtests 
> with the correct resources on every master commit, in which case we should 
> fix the resources.
> (cc [~snazy] [~jasobrown])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16

2018-07-26 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558722#comment-16558722
 ] 

Jordan West commented on CASSANDRA-14468:
-

[~iamaleksey], reading the code again, I *think* it should be safe to drop as 
well, for the reasons you list. The {{ColumnIdentifier}} in the 
{{ColumnDefinition}}/{{ColumnMetadata}} will be different (by reference) from 
the ones returned by {{Literal#prepare}}, but since they are structurally equal, 
that should be OK. Otherwise, it's hard to work out its original intent, since 
it was committed as part of CASSANDRA-8099. 
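
As a toy illustration of the reference-vs-structural distinction (assumes 
{{ColumnIdentifier}}'s {{(String, boolean)}} constructor; not code from the 
patch):

{code}
// two identifiers built from the same text are distinct objects but compare
// equal, so equals/hashCode-based lookups are unaffected by dropping the
// reference-identity expectation
ColumnIdentifier a = new ColumnIdentifier("foo", true);
ColumnIdentifier b = new ColumnIdentifier("foo", true);

assert a != b;       // different references
assert a.equals(b);  // structurally equal (same underlying bytes)
{code}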

> "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
> -
>
> Key: CASSANDRA-14468
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14468
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Wade Simmons
>Priority: Major
> Attachments: data.tar.gz
>
>
> I am attempting to upgrade from Cassandra 2.2.10 to 3.0.16. I am getting this 
> error:
> {code}
> org.apache.cassandra.exceptions.ConfigurationException: Unable to parse 
> targets for index idx_foo ("666f6f")
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.parseTarget(CassandraIndex.java:800)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata(CassandraIndex.java:747)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:645)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:251) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) 
> [apache-cassandra-3.0.16.jar:3.0.16]
> {code}
> It looks like this might be related to CASSANDRA-14104 that was just added to 
> 3.0.16 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14542) Deselect no_offheap_memtables dtests

2018-07-26 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14542:

Reviewer: Jordan West

> Deselect no_offheap_memtables dtests
> 
>
> Key: CASSANDRA-14542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14542
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: dtest
>
> After the large rework of dtests in CASSANDRA-14134, one task left undone was 
> to enable running dtests with offheap memtables. That was resolved in 
> CASSANDRA-14056. However, there are a few tests explicitly marked as 
> "no_offheap_memtables", and we should respect that marking when running the 
> dtests with offheap memtables enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14197) SSTable upgrade should be automatic

2018-07-26 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14197:

Reviewers: Ariel Weisberg  (was: Ariel Weisberg, Jordan West)

> SSTable upgrade should be automatic
> ---
>
> Key: CASSANDRA-14197
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14197
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> Upgradesstables should run automatically on node upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14197) SSTable upgrade should be automatic

2018-07-26 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14197:

Reviewers: Ariel Weisberg, Jordan West
 Reviewer:   (was: Ariel Weisberg)

> SSTable upgrade should be automatic
> ---
>
> Key: CASSANDRA-14197
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14197
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> Upgradesstables should run automatically on node upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16

2018-07-26 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West reassigned CASSANDRA-14468:
---

Assignee: Jordan West

> "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
> -
>
> Key: CASSANDRA-14468
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14468
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Wade Simmons
>Assignee: Jordan West
>Priority: Major
> Attachments: data.tar.gz
>
>
> I am attempting to upgrade from Cassandra 2.2.10 to 3.0.16. I am getting this 
> error:
> {code}
> org.apache.cassandra.exceptions.ConfigurationException: Unable to parse 
> targets for index idx_foo ("666f6f")
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.parseTarget(CassandraIndex.java:800)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata(CassandraIndex.java:747)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:645)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:251) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) 
> [apache-cassandra-3.0.16.jar:3.0.16]
> {code}
> It looks like this might be related to CASSANDRA-14104 that was just added to 
> 3.0.16 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16

2018-07-26 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558848#comment-16558848
 ] 

Jordan West commented on CASSANDRA-14468:
-

Assigned to myself

> "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
> -
>
> Key: CASSANDRA-14468
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14468
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Wade Simmons
>Assignee: Jordan West
>Priority: Major
> Attachments: data.tar.gz
>
>
> I am attempting to upgrade from Cassandra 2.2.10 to 3.0.16. I am getting this 
> error:
> {code}
> org.apache.cassandra.exceptions.ConfigurationException: Unable to parse 
> targets for index idx_foo ("666f6f")
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.parseTarget(CassandraIndex.java:800)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata(CassandraIndex.java:747)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:645)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:251) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) 
> [apache-cassandra-3.0.16.jar:3.0.16]
> {code}
> It looks like this might be related to CASSANDRA-14104 that was just added to 
> 3.0.16 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14468) "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16

2018-07-31 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-14468:

Reproduced In: 3.0.16, 3.0.15, 3.0.12, 3.0.10  (was: 3.0.10, 3.0.12, 
3.0.15, 3.0.16)
   Status: Patch Available  (was: Open)

I would like to add a dtest for this but wanted to push up the patch to get 
review started.

||trunk||3.0||
|[branch|https://github.com/jrwest/cassandra/tree/14468-trunk]|[branch|https://github.com/jrwest/cassandra/tree/14468-3.0]|
|[tests|https://circleci.com/gh/jrwest/cassandra/tree/14468-trunk]|[tests|https://circleci.com/gh/jrwest/cassandra/tree/14468-3.0]|

> "Unable to parse targets for index" on upgrade to Cassandra 3.0.10-3.0.16
> -
>
> Key: CASSANDRA-14468
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14468
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Wade Simmons
>Assignee: Jordan West
>Priority: Major
> Attachments: data.tar.gz
>
>
> I am attempting to upgrade from Cassandra 2.2.10 to 3.0.16. I am getting this 
> error:
> {code}
> org.apache.cassandra.exceptions.ConfigurationException: Unable to parse 
> targets for index idx_foo ("666f6f")
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.parseTarget(CassandraIndex.java:800)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata(CassandraIndex.java:747)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:645)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:251) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) 
> [apache-cassandra-3.0.16.jar:3.0.16]
> {code}
> It looks like this might be related to CASSANDRA-14104 that was just added to 
> 3.0.16 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14542) Deselect no_offheap_memtables dtests

2018-07-26 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558478#comment-16558478
 ] 

Jordan West commented on CASSANDRA-14542:
-

+1

> Deselect no_offheap_memtables dtests
> 
>
> Key: CASSANDRA-14542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14542
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>  Labels: dtest
>
> After the large rework of dtests in CASSANDRA-14134, one task left undone was 
> to enable running dtests with offheap memtables. That was resolved in 
> CASSANDRA-14056. However, there are a few tests explicitly marked as 
> "no_offheap_memtables", and we should respect that marking when running the 
> dtests with offheap memtables enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14636) Revert 4.0 GC alg back to CMS

2018-08-10 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576589#comment-16576589
 ] 

Jordan West edited comment on CASSANDRA-14636 at 8/10/18 5:10 PM:
--

+1. 

My understanding is {{UseParNewGC}} is implicit in Java 10+. Otherwise it might 
be worth adding a note about that specific change. 


was (Author: jrwest):
+1. My understanding is {{UseParNewGC}} is implicit in Java 10+, otherwise it 
might be worth adding a note about that specific change. 

> Revert 4.0 GC alg back to CMS
> -
>
> Key: CASSANDRA-14636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14636
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Major
> Fix For: 4.x
>
>
> CASSANDRA-9608 accidentally swapped the default GC algorithm from CMS to G1. 
> Until further community consensus is achieved about swapping the default alg, 
> we should switch back to CMS.
> As reported by [~rustyrazorblade] on the [dev@ 
> ML|https://lists.apache.org/thread.html/0b30f9c84457033583e9a3e0828adc603e01f1ca03ce0816098883cc@%3Cdev.cassandra.apache.org%3E]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14636) Revert 4.0 GC alg back to CMS

2018-08-10 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576589#comment-16576589
 ] 

Jordan West commented on CASSANDRA-14636:
-

+1. My understanding is {{UseParNewGC}} is implicit in Java 10+, otherwise it 
might be worth adding a note about that specific change. 

> Revert 4.0 GC alg back to CMS
> -
>
> Key: CASSANDRA-14636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14636
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Major
> Fix For: 4.x
>
>
> CASSANDRA-9608 accidentally swapped the default GC algorithm from CMS to G1. 
> Until further community consensus is achieved about swapping the default alg, 
> we should switch back to CMS.
> As reported by [~rustyrazorblade] on the [dev@ 
> ML|https://lists.apache.org/thread.html/0b30f9c84457033583e9a3e0828adc603e01f1ca03ce0816098883cc@%3Cdev.cassandra.apache.org%3E]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14627) CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8

2018-08-08 Thread Jordan West (JIRA)
Jordan West created CASSANDRA-14627:
---

 Summary: CASSANDRA-9608 broke running Cassandra and tests in 
IntelliJ under Java 8
 Key: CASSANDRA-14627
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14627
 Project: Cassandra
  Issue Type: Bug
  Components: Testing
Reporter: Jordan West


CASSANDRA-9608 added a couple hard-coded options to workspace.xml that are not 
supported in Java 8: 
https://github.com/apache/cassandra/commit/6ba2fb9395226491872b41312d978a169f36fcdb#diff-59e65c5abf01f83a11989765ada76841.
 

{code}
Unrecognized option: --add-exports
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
{code}

To reproduce:
1. Update to the most recent trunk
2. rm -rf .idea && ant generate-idea-files
3. Re-open the project in IntelliJ (using Java 8) and run Cassandra or a test. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14627) CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8

2018-08-08 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573483#comment-16573483
 ] 

Jordan West commented on CASSANDRA-14627:
-

This is slightly different from CASSANDRA-14613, in that {{ide/workspace.xml}} 
is broken instead of {{ide/idea-iml-file.xml}}, but I'm happy to mark this as a 
duplicate of it. I do think a short-term fix is warranted: at a minimum, it 
should be Java 11 that breaks in the IDE rather than Java 8.

> CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8
> -
>
> Key: CASSANDRA-14627
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14627
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jordan West
>Priority: Critical
>
> CASSANDRA-9608 added a couple hard-coded options to workspace.xml that are 
> not supported in Java 8: 
> https://github.com/apache/cassandra/commit/6ba2fb9395226491872b41312d978a169f36fcdb#diff-59e65c5abf01f83a11989765ada76841.
>  
> {code}
> Unrecognized option: --add-exports
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {code}
> To reproduce:
> 1. Update to the most recent trunk
> 2. rm -rf .idea && ant generate-idea-files
> 3. Re-open the project in IntelliJ (using Java 8) and run Cassandra or a 
> test. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14627) CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8

2018-08-08 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573486#comment-16573486
 ] 

Jordan West commented on CASSANDRA-14627:
-

I should also note that, in the meantime, the local workaround is to manually 
delete the Java 11 arguments from {{ide/workspace.xml}} or from the specific 
IntelliJ run configurations being used. 

> CASSANDRA-9608 broke running Cassandra and tests in IntelliJ under Java 8
> -
>
> Key: CASSANDRA-14627
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14627
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jordan West
>Priority: Critical
>
> CASSANDRA-9608 added a couple hard-coded options to workspace.xml that are 
> not supported in Java 8: 
> https://github.com/apache/cassandra/commit/6ba2fb9395226491872b41312d978a169f36fcdb#diff-59e65c5abf01f83a11989765ada76841.
>  
> {code}
> Unrecognized option: --add-exports
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> {code}
> To reproduce:
> 1. Update to the most recent trunk
> 2. rm -rf .idea && ant generate-idea-files
> 3. Re-open the project in IntelliJ (using Java 8) and run Cassandra or a 
> test. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14644) CircleCI Builds should optionally run in-tree tests other than test/unit

2018-08-15 Thread Jordan West (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581636#comment-16581636
 ] 

Jordan West commented on CASSANDRA-14644:
-

[~vinaykumarcse] done. Thanks! I will be happy to review. 

> CircleCI Builds should optionally run in-tree tests other than test/unit
> --
>
> Key: CASSANDRA-14644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14644
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jordan West
>Assignee: Vinay Chella
>Priority: Critical
>
> Currently, CircleCI is hardcoded to search for tests in the test/unit 
> directory only: 
> https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L166. 
> This means tests under `test-compression` and `test-long` are not run. Like 
> dtests, there should be a simple way to modify the config to run these as 
> well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


