[jira] [Updated] (CASSANDRA-11532) CqlConfigHelper requires both truststore and keystore to work with SSL encryption

2016-04-07 Thread Jacek Lewandowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-11532:
--
Attachment: CASSANDRA_11532.patch

> CqlConfigHelper requires both truststore and keystore to work with SSL 
> encryption
> -
>
> Key: CASSANDRA-11532
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11532
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
> Attachments: CASSANDRA_11532.patch
>
>
> {{CqlConfigHelper}} configures SSL in the following way:
> {code:java}
> public static Optional getSSLOptions(Configuration conf)
> {
> Optional truststorePath = 
> getInputNativeSSLTruststorePath(conf);
> Optional keystorePath = getInputNativeSSLKeystorePath(conf);
> Optional truststorePassword = 
> getInputNativeSSLTruststorePassword(conf);
> Optional keystorePassword = 
> getInputNativeSSLKeystorePassword(conf);
> Optional cipherSuites = getInputNativeSSLCipherSuites(conf);
> 
> if (truststorePath.isPresent() && keystorePath.isPresent() && 
> truststorePassword.isPresent() && keystorePassword.isPresent())
> {
> SSLContext context;
> try
> {
> context = getSSLContext(truststorePath.get(), 
> truststorePassword.get(), keystorePath.get(), keystorePassword.get());
> }
> catch (UnrecoverableKeyException | KeyManagementException |
> NoSuchAlgorithmException | KeyStoreException | 
> CertificateException | IOException e)
> {
> throw new RuntimeException(e);
> }
> String[] css = null;
> if (cipherSuites.isPresent())
> css = cipherSuites.get().split(",");
> return Optional.of(JdkSSLOptions.builder()
> .withSSLContext(context)
> .withCipherSuites(css)
> .build());
> }
> return Optional.absent();
> }
> {code}
> which forces you to connect only to trusted nodes and to use client 
> authentication. This should be made more flexible so that at least client 
> authentication is optional. 
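
One way to picture the requested flexibility, as a minimal sketch using only the standard JSSE classes: each store is loaded only when its path is configured, and a null array is handed to SSLContext.init for the missing side so the JVM defaults apply. The class and method names (OptionalStoreSslSketch, loadStore, buildContext) are hypothetical, and this is not the content of the attached patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManager;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper, not the attached patch: each store is optional.
final class OptionalStoreSslSketch
{
    // Loads a store of the JVM's default type; a null password skips the integrity check.
    static KeyStore loadStore(String path, String password) throws GeneralSecurityException, IOException
    {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(Paths.get(path)))
        {
            store.load(in, password == null ? null : password.toCharArray());
        }
        return store;
    }

    // Pass null for a path to leave that side of the handshake to the JVM defaults.
    static SSLContext buildContext(String truststorePath, String truststorePassword,
                                   String keystorePath, String keystorePassword)
    {
        try
        {
            TrustManager[] trustManagers = null;
            if (truststorePath != null)
            {
                TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
                tmf.init(loadStore(truststorePath, truststorePassword));
                trustManagers = tmf.getTrustManagers();
            }

            KeyManager[] keyManagers = null;
            if (keystorePath != null)
            {
                char[] keyPassword = keystorePassword == null ? null : keystorePassword.toCharArray();
                KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
                kmf.init(loadStore(keystorePath, keystorePassword), keyPassword);
                keyManagers = kmf.getKeyManagers();
            }

            SSLContext context = SSLContext.getInstance("TLS");
            // Null arrays tell JSSE to fall back to its default key/trust managers.
            context.init(keyManagers, trustManagers, null);
            return context;
        }
        catch (GeneralSecurityException | IOException e)
        {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        // No stores configured at all: the context is still usable with JVM defaults.
        SSLContext context = buildContext(null, null, null, null);
        System.out.println(context.getProtocol());
    }
}
```

With only a truststore configured, the client would still verify the server but would not present its own certificate, which is the "client authentication is optional" case described above.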



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11532) CqlConfigHelper requires both truststore and keystore to work with SSL encryption

2016-04-07 Thread Jacek Lewandowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-11532:
--
Status: Patch Available  (was: In Progress)



[jira] [Updated] (CASSANDRA-11533) Support creating and returning UDTs from a UDF

2016-04-07 Thread Henry Manasseh (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Manasseh updated CASSANDRA-11533:
---
Description: 
UDFs support returning UDTs. The existing test cases validate this by 
returning the parameter UDT as the return UDT (which works because it returns 
an already existing UDTValue instance). But there is currently no way to create 
a new instance of a UDTValue and return it.

It seems that creating a new UDTValue instance to return from the UDF would 
require getting an instance of the KeyspaceMetadata, but that requires access 
to packages which are not whitelisted. 

A solution could be to add a method to the JavaUDF class to get the UserType 
(encapsulating use of the Schema class).


e.g. change to org.apache.cassandra.cql3.functions.JavaUDF:

protected UserType getUserType(String userTypeName) {

org.apache.cassandra.schema.KeyspaceMetadata ksm = 
org.apache.cassandra.config.Schema.instance.getKSMetaData("test_ks");

com.datastax.driver.core.UserType myUdt = 
ksm.types.get("my_other_udt").get();

  return myUdt;
}
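
The encapsulation idea can be pictured with a self-contained toy model. Everything in it is a stand-in chosen for illustration: TypeDef, UdfBase and TransformUdt are hypothetical and are not Cassandra or driver classes, and a plain map plays the role of the keyspace metadata. The point it tries to show is that the base class owns the schema lookup, so the UDF body only calls getUserType with a type name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Toy model only: none of these types are Cassandra or driver classes.
final class UdtLookupSketch
{
    // Stand-in for a user-defined type: a name plus its field names.
    static final class TypeDef
    {
        final String name;
        final List<String> fields;

        TypeDef(String name, String... fields)
        {
            this.name = name;
            this.fields = Arrays.asList(fields);
        }

        // Creates a new value with every declared field present and unset.
        Map<String, Object> newValue()
        {
            Map<String, Object> value = new LinkedHashMap<>();
            for (String field : fields)
                value.put(field, null);
            return value;
        }
    }

    // Stand-in for the UDF superclass: it owns the schema lookup.
    abstract static class UdfBase
    {
        private final Map<String, TypeDef> typesInKeyspace;

        UdfBase(Map<String, TypeDef> typesInKeyspace)
        {
            this.typesInKeyspace = typesInKeyspace;
        }

        // The lookup is driven by the userTypeName argument.
        protected TypeDef getUserType(String userTypeName)
        {
            TypeDef type = typesInKeyspace.get(userTypeName);
            if (type == null)
                throw new IllegalArgumentException("Unknown user type: " + userTypeName);
            return type;
        }

        abstract Map<String, Object> execute(Map<String, Object> val);
    }

    // Plays the role of a UDF body: builds a new value of another type.
    static final class TransformUdt extends UdfBase
    {
        TransformUdt(Map<String, TypeDef> types)
        {
            super(types);
        }

        @Override
        Map<String, Object> execute(Map<String, Object> val)
        {
            Map<String, Object> out = getUserType("my_other_udt").newValue();
            out.put("id", UUID.randomUUID());
            out.put("field_a", val.get("field_a"));
            out.put("field_b", "value b");
            return out;
        }
    }

    static Map<String, Object> demo()
    {
        Map<String, TypeDef> types = new HashMap<>();
        types.put("my_other_udt", new TypeDef("my_other_udt", "id", "field_a", "field_b"));
        Map<String, Object> input = Collections.<String, Object>singletonMap("field_a", "hello");
        return new TransformUdt(types).execute(input);
    }

    public static void main(String[] args)
    {
        System.out.println(demo().keySet());
    }
}
```

In this model the UDF body never references the schema holder directly, which is the property that would let such a helper coexist with a package whitelist.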

To illustrate, I wrote a simple UDF to transform one UDT into another UDT. This 
fails compilation because Schema is not whitelisted. 

CREATE OR REPLACE FUNCTION test_ks.transform_udt (val my_udt)
 RETURNS NULL ON NULL INPUT
 RETURNS my_other_udt
 LANGUAGE java
  AS '
String fieldA = val.getString("field_a");

org.apache.cassandra.schema.KeyspaceMetadata ksm = 
org.apache.cassandra.config.Schema.instance.getKSMetaData("test_ks");

com.datastax.driver.core.UserType myUdt = 
ksm.types.get("my_other_udt").get();

com.datastax.driver.core.UDTValue transformedValue = myUdt.newValue();

transformedValue.setUUID("id", java.util.UUID.randomUUID());
transformedValue.setString("field_a", fieldA);
transformedValue.setString("field_b", "value b");

return transformedValue;
  ';

This is the error:
:88:InvalidRequest: code=2200 [Invalid query] message="Could not compile 
function 'test_ks.transform_udt' from Java source: 
org.apache.cassandra.exceptions.InvalidRequestException: Java source 
compilation failed:
Line 4: org.apache.cassandra.schema.KeyspaceMetadata cannot be resolved to a 
type
Line 4: org.apache.cassandra.config.Schema.instance cannot be resolved to a type
"




  was:
UDFs support returning UDTs. I see the various test cases which validate by 
returning the parameter UDT as the return UDT (which works because it returns 
an already existing UDTValue instance). But there is currently no way to create 
a new instance of an UDTValue and return it.

It seems that to accomplish that it would require getting an instance of the 
KeyspaceMetadata but that requires use of packages which are not white listed. 

A solution could be to add a method to getUserType(String) which calls schema 
in the super class org.apache.cassandra.cql3.functions.JavaUDF.

e.g.
protected UserType getUserType(String userTypeName) {

org.apache.cassandra.schema.KeyspaceMetadata ksm = 
org.apache.cassandra.config.Schema.instance.getKSMetaData("test_ks");

com.datastax.driver.core.UserType myUdt = 
ksm.types.get("my_other_udt").get();

  return myUdt;
}

To illustrate, I wrote a simple UDF to transform one UDT into another UDT. This 
fails compilation because Schema is not whitelisted. 

CREATE OR REPLACE FUNCTION test_ks.transform_udt (val my_udt)
 RETURNS NULL ON NULL INPUT
 RETURNS my_other_udt
 LANGUAGE java
  AS '
String fieldA = val.getString("field_a");

org.apache.cassandra.schema.KeyspaceMetadata ksm = 
org.apache.cassandra.config.Schema.instance.getKSMetaData("test_ks");

com.datastax.driver.core.UserType myUdt = 
ksm.types.get("my_other_udt").get();

com.datastax.driver.core.UDTValue transformedValue = myUdt.newValue();

transformedValue.setUUID("id", java.util.UUID.randomUUID());
transformedValue.setString("field_a", fieldA);
transformedValue.setString("field_b", "value b");

return transformedValue;
  ';

This is the error:
:88:InvalidRequest: code=2200 [Invalid query] message="Could not compile 
function 'test_ks.transform_udt' from Java source: 
org.apache.cassandra.exceptions.InvalidRequestException: Java source 
compilation failed:
Line 4: org.apache.cassandra.schema.KeyspaceMetadata cannot be resolved to a 
type
Line 4: org.apache.cassandra.config.Schema.instance cannot be resolved to a type
"





> Support creating and returning UDTs from a UDF
> --
>
> Key: CASSANDRA-11533
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11533
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Henry Manasseh
>Priority: Minor
>
> UDFs support returning UDTs. I see the various test cases which validate by 
> returning the parameter UDT as the return UDT 

[jira] [Created] (CASSANDRA-11533) Support creating and returning UDTs from a UDF

2016-04-07 Thread Henry Manasseh (JIRA)
Henry Manasseh created CASSANDRA-11533:
--

 Summary: Support creating and returning UDTs from a UDF
 Key: CASSANDRA-11533
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11533
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Henry Manasseh
Priority: Minor


UDFs support returning UDTs. The existing test cases validate this by 
returning the parameter UDT as the return UDT (which works because it returns 
an already existing UDTValue instance). But there is currently no way to create 
a new instance of a UDTValue and return it.

It seems that accomplishing that would require getting an instance of the 
KeyspaceMetadata, but that requires use of packages which are not whitelisted. 

A solution could be to add a getUserType(String) method, which calls Schema, 
to the superclass org.apache.cassandra.cql3.functions.JavaUDF.

e.g.
protected UserType getUserType(String userTypeName) {

org.apache.cassandra.schema.KeyspaceMetadata ksm = 
org.apache.cassandra.config.Schema.instance.getKSMetaData("test_ks");

com.datastax.driver.core.UserType myUdt = 
ksm.types.get("my_other_udt").get();

  return myUdt;
}

To illustrate, I wrote a simple UDF to transform one UDT into another UDT. This 
fails compilation because Schema is not whitelisted. 

CREATE OR REPLACE FUNCTION test_ks.transform_udt (val my_udt)
 RETURNS NULL ON NULL INPUT
 RETURNS my_other_udt
 LANGUAGE java
  AS '
String fieldA = val.getString("field_a");

org.apache.cassandra.schema.KeyspaceMetadata ksm = 
org.apache.cassandra.config.Schema.instance.getKSMetaData("test_ks");

com.datastax.driver.core.UserType myUdt = 
ksm.types.get("my_other_udt").get();

com.datastax.driver.core.UDTValue transformedValue = myUdt.newValue();

transformedValue.setUUID("id", java.util.UUID.randomUUID());
transformedValue.setString("field_a", fieldA);
transformedValue.setString("field_b", "value b");

return transformedValue;
  ';

This is the error:
:88:InvalidRequest: code=2200 [Invalid query] message="Could not compile 
function 'test_ks.transform_udt' from Java source: 
org.apache.cassandra.exceptions.InvalidRequestException: Java source 
compilation failed:
Line 4: org.apache.cassandra.schema.KeyspaceMetadata cannot be resolved to a 
type
Line 4: org.apache.cassandra.config.Schema.instance cannot be resolved to a type
"








[jira] [Updated] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows

2016-04-07 Thread Mattias W (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mattias W updated CASSANDRA-11528:
--
Description: 
While implementing a dump procedure, which did "select * from" from one table 
at a row, I instantly killed the server. A simple 
{noformat}select count(*) from {noformat} 
also kills it. For a while, I thought the size of the blobs was the cause.

I also tried having only a unique id as the partition key, since I was afraid 
a single partition had grown too big, but that didn't change anything.

It happens every time, both from Java/Clojure and from DevCenter.

I looked at the logs in C:\Program Files\DataStax-DDC\logs, but the crash is so 
quick that nothing is recorded there.

There is a Java out-of-memory error in the logs, but it isn't from the time of 
the crash.

It only happens for one table. That table has only 15000 entries, but blobs 
and byte[] values are stored there, with sizes between 100 KB and 4 MB. Total 
size for that table is about 6.5 GB on disk.

As a workaround, I do many small selects instead, each fetching only 100 
rows.

Is there a setting I can set to make the system log more eagerly, in order to 
at least get a stack trace or similar that might help you?

It is the prun_srv that dies. Restarting the NT service makes Cassandra run 
again.

  was:
While implementing a dump procedure, which did "select * from" from one table 
at a row, I instantly kill the server. A simple "select count(*) from"  also 
kills it. For a while, I thought the size of blobs were the cause

I also try to only have a unique id as partition key, I was afraid a single 
partition got too big or so, but that didn't change anything

It happens every time, both from Java/Clojure and from DevCenter.

I looked at the logs at C:\Program Files\DataStax-DDC\logs, but the crash is so 
quick, so nothing is recorded there.

There is a Java-out-of-memory in the logs, but that isn't from the time of the 
crash.

It only happens for one table, it only has 15000 entries, but there are blobs 
and byte[] stored there, size between 100kb - 4Mb. Total size for that table is 
about 6.5 GB on disk.

I made a workaround by doing many small selects instead, each only fetching 100 
rows.

Is there a setting a can set to make the system log more eagerly, in order to 
at least get a stacktrace or similar, that might help you.

It is the prun_srv that dies. Restarting the NT service makes Cassandra run 
again


> Server Crash when select returns more than a few hundred rows
> -
>
> Key: CASSANDRA-11528
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11528
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: windows 7, 8 GB machine
>Reporter: Mattias W
> Fix For: 3.3
>
> Attachments: datastax_ddc_server-stdout.2016-04-07.log
>
>
> While implementing a dump procedure, which did "select * from" from one table 
> at a row, I instantly killed the server. A simple 
> {noformat}select count(*) from {noformat} 
> also kills it. For a while, I thought the size of the blobs was the cause.
> I also tried having only a unique id as the partition key, since I was afraid 
> a single partition had grown too big, but that didn't change anything.
> It happens every time, both from Java/Clojure and from DevCenter.
> I looked at the logs in C:\Program Files\DataStax-DDC\logs, but the crash is 
> so quick that nothing is recorded there.
> There is a Java out-of-memory error in the logs, but it isn't from the time 
> of the crash.
> It only happens for one table. That table has only 15000 entries, but blobs 
> and byte[] values are stored there, with sizes between 100 KB and 4 MB. Total 
> size for that table is about 6.5 GB on disk.
> As a workaround, I do many small selects instead, each fetching only 100 
> rows.
> Is there a setting I can set to make the system log more eagerly, in order to 
> at least get a stack trace or similar that might help you?
> It is the prun_srv that dies. Restarting the NT service makes Cassandra run 
> again.
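
The workaround of many small selects can be sketched as a generic paging loop. This is only an illustration under stated assumptions: PageSource is a hypothetical stand-in for "run one small SELECT", rows are reduced to integer ids, and the in-memory source exists only to exercise the loop. Against a real cluster the page boundary would more likely come from the partition key token or from the driver's own fetch-size setting than from an integer id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic sketch: PageSource is a hypothetical stand-in for one small SELECT.
final class PagedDumpSketch
{
    interface PageSource
    {
        // Returns up to limit ids greater than lastSeenId, in ascending order;
        // lastSeenId is null for the first page.
        List<Integer> fetchAfter(Integer lastSeenId, int limit);
    }

    // Pulls pages until a short page signals the end; returns the number of rows seen.
    static int dump(PageSource source, int pageSize, Consumer<Integer> sink)
    {
        if (pageSize <= 0)
            throw new IllegalArgumentException("pageSize must be positive");

        int total = 0;
        Integer lastSeen = null;
        while (true)
        {
            List<Integer> page = source.fetchAfter(lastSeen, pageSize);
            for (Integer id : page)
                sink.accept(id);
            total += page.size();
            if (page.size() < pageSize)
                return total;
            lastSeen = page.get(page.size() - 1);
        }
    }

    // In-memory source over ids 1..rowCount, used only to exercise the loop.
    static PageSource inMemory(int rowCount)
    {
        return (lastSeenId, limit) -> {
            List<Integer> page = new ArrayList<>();
            int start = lastSeenId == null ? 1 : lastSeenId + 1;
            for (int id = start; id <= rowCount && page.size() < limit; id++)
                page.add(id);
            return page;
        };
    }

    public static void main(String[] args)
    {
        List<Integer> seen = new ArrayList<>();
        int total = dump(inMemory(250), 100, seen::add);
        System.out.println(total + " rows fetched in pages of at most 100");
    }
}
```

Each request stays bounded by the page size, so no single response has to carry the whole table.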





[jira] [Comment Edited] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows

2016-04-07 Thread Mattias W (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231649#comment-15231649
 ] 

Mattias W edited comment on CASSANDRA-11528 at 4/8/16 5:17 AM:
---

I get strange behaviour also on smaller and much more normal tables. For example
{noformat}
SELECT COUNT(*) FROM usr WHERE disabled = true LIMIT 100 ALLOW FILTERING;
{noformat}
works fine from within devcenter

but the next one, which hits many more rows, temporarily makes the server 
unavailable, and reports "Unable to execute CQL script on 'connection1': 
Cassandra failure during read query at consistency ONE (1 responses were 
required but only 0 replica responded, 1 failed". This error message is the 
same as above, except that the server doesn't die.

{noformat}
SELECT COUNT(*) FROM usr WHERE disabled = null LIMIT 100 ALLOW FILTERING;
{noformat}

No new entries are made in the log file


was (Author: mattiasw2):
I get strange behaviour also on smaller and much more normal tables. For example
{noformat}
SELECT COUNT(*) FROM usr WHERE disabled = true LIMIT 100 ALLOW FILTERING;
{noformat}
works fine from within devcenter

but the next one, which hits many more rows, temporarily makes the server 
unavailable, and reports "Unable to execute CQL script on 'connection1': 
Cassandra failure during read query at consistency ONE (1 responses were 
required but only 0 replica responded, 1 failed". This error message is the 
same as above, except that the server doesn't die.

{noformat}
SELECT COUNT(*) FROM usr WHERE disabled = null LIMIT 100 ALLOW FILTERING;
{noformat}




[jira] [Comment Edited] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows

2016-04-07 Thread Mattias W (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231649#comment-15231649
 ] 

Mattias W edited comment on CASSANDRA-11528 at 4/8/16 5:15 AM:
---

I get strange behaviour also on smaller and much more normal tables. For example
{noformat}
SELECT COUNT(*) FROM usr WHERE disabled = true LIMIT 100 ALLOW FILTERING;
{noformat}
works fine from within devcenter

but the next one, which hits many more rows, temporarily makes the server 
unavailable, and reports "Unable to execute CQL script on 'connection1': 
Cassandra failure during read query at consistency ONE (1 responses were 
required but only 0 replica responded, 1 failed". This error message is the 
same as above, except that the server doesn't die.

{noformat}
SELECT COUNT(*) FROM usr WHERE disabled = null LIMIT 100 ALLOW FILTERING;
{noformat}



was (Author: mattiasw2):
I get strange behaviour also on smaller and much more normal tables. For example

{{SELECT COUNT(*) FROM usr WHERE disabled = true LIMIT 100 ALLOW FILTERING;}}

works fine from within devcenter

but the next one, which hits many more rows, temporarily makes the server 
unavailable, and reports "Unable to execute CQL script on 'connection1': 
Cassandra failure during read query at consistency ONE (1 responses were 
required but only 0 replica responded, 1 failed". This error message is the 
same as above, except that the server doesn't die.

{{SELECT COUNT(*) FROM usr WHERE disabled = null LIMIT 100 ALLOW FILTERING;}}




[jira] [Comment Edited] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows

2016-04-07 Thread Mattias W (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231649#comment-15231649
 ] 

Mattias W edited comment on CASSANDRA-11528 at 4/8/16 5:14 AM:
---

I get strange behaviour also on smaller and much more normal tables. For example

{{SELECT COUNT(*) FROM usr WHERE disabled = true LIMIT 100 ALLOW FILTERING;}}

works fine from within devcenter

but the next one, which hits many more rows, temporarily makes the server 
unavailable, and reports "Unable to execute CQL script on 'connection1': 
Cassandra failure during read query at consistency ONE (1 responses were 
required but only 0 replica responded, 1 failed". This error message is the 
same as above, except that the server doesn't die.

{{SELECT COUNT(*) FROM usr WHERE disabled = null LIMIT 100 ALLOW FILTERING;}}



was (Author: mattiasw2):
I get strange behaviour also on smaller and much more normal tables. For example

SELECT COUNT(*) FROM usr WHERE disabled = true LIMIT 100 ALLOW FILTERING;

works fine from within devcenter

but the next one, which hits many more rows, temporarily makes the server 
unavailable, and reports "Unable to execute CQL script on 'connection1': 
Cassandra failure during read query at consistency ONE (1 responses were 
required but only 0 replica responded, 1 failed". This error message is the 
same as above, except that the server doesn't die.

SELECT COUNT(*) FROM usr WHERE disabled = null LIMIT 100 ALLOW FILTERING;




[jira] [Commented] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows

2016-04-07 Thread Mattias W (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231649#comment-15231649
 ] 

Mattias W commented on CASSANDRA-11528:
---

I get strange behaviour also on smaller and much more normal tables. For example

SELECT COUNT(*) FROM usr WHERE disabled = true LIMIT 100 ALLOW FILTERING;

works fine from within devcenter

but the next one, which hits many more rows, temporarily makes the server 
unavailable, and reports "Unable to execute CQL script on 'connection1': 
Cassandra failure during read query at consistency ONE (1 responses were 
required but only 0 replica responded, 1 failed". This error message is the 
same as above, except that the server doesn't die.

SELECT COUNT(*) FROM usr WHERE disabled = null LIMIT 100 ALLOW FILTERING;




[jira] [Updated] (CASSANDRA-11529) Checking if an unlogged batch is local is inefficient

2016-04-07 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-11529:
-
Status: Patch Available  (was: In Progress)

> Checking if an unlogged batch is local is inefficient
> -
>
> Key: CASSANDRA-11529
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11529
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Paulo Motta
>Assignee: Stefania
>Priority: Critical
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> Based on the CASSANDRA-11363 report, I noticed that in CASSANDRA-9303 we 
> introduced the following check to avoid printing a {{WARN}} in case an 
> unlogged batch statement is local:
> {noformat}
>  for (IMutation im : mutations)
>  {
>  keySet.add(im.key());
>  for (ColumnFamily cf : im.getColumnFamilies())
>  ksCfPairs.add(String.format("%s.%s", 
> cf.metadata().ksName, cf.metadata().cfName));
> +
> +if (localMutationsOnly)
> +localMutationsOnly &= isMutationLocal(localTokensByKs, 
> im);
>  }
>  
> +// CASSANDRA-9303: If we only have local mutations we do not warn
> +if (localMutationsOnly)
> +return;
> +
>  NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, 
> TimeUnit.MINUTES, unloggedBatchWarning,
>   keySet.size(), keySet.size() == 1 ? "" : "s",
>   ksCfPairs.size() == 1 ? "" : "s", ksCfPairs);
> {noformat}
> The {{isMutationLocal}} check uses 
> {{StorageService.instance.getLocalRanges(mutation.getKeyspaceName())}}, which 
> underneath uses {{AbstractReplication.getAddressRanges}} to calculate local 
> ranges. 
> Recalculating this for every unlogged batch can be pretty inefficient, so we 
> should at the very least cache it and recalculate only when the ring changes.
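
The caching suggestion can be sketched in a few lines. This is a hypothetical, self-contained illustration and not Cassandra code: RingScopedCacheSketch, its loader function and the onRingChange hook are assumed names, and the string values stand in for whatever the local-range computation returns.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache, not Cassandra code: values are computed once per key
// and dropped whenever the caller reports a ring change.
final class RingScopedCacheSketch<K, V>
{
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    RingScopedCacheSketch(Function<K, V> loader)
    {
        this.loader = loader;
    }

    // Computes the value on first use and serves it from the map afterwards.
    V get(K key)
    {
        return cache.computeIfAbsent(key, loader);
    }

    // Call from whatever hook observes ring changes.
    void onRingChange()
    {
        cache.clear();
    }

    // Counts how often the expensive computation runs for a small scenario.
    static int demoLoadCount()
    {
        AtomicInteger loads = new AtomicInteger();
        RingScopedCacheSketch<String, String> ranges = new RingScopedCacheSketch<>(keyspace -> {
            loads.incrementAndGet();
            return "ranges-of-" + keyspace;
        });
        ranges.get("test_ks");
        ranges.get("test_ks");   // served from the cache
        ranges.onRingChange();
        ranges.get("test_ks");   // recomputed after the ring change
        return loads.get();
    }

    public static void main(String[] args)
    {
        System.out.println("loader invocations: " + demoLoadCount());
    }
}
```

The per-batch cost then becomes a map lookup, and the expensive computation runs once per keyspace between ring changes.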





[jira] [Commented] (CASSANDRA-11529) Checking if an unlogged batch is local is inefficient

2016-04-07 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231641#comment-15231641
 ] 

Stefania commented on CASSANDRA-11529:
--

I've removed the check for local mutations and added a threshold that can be 
configured via a new yaml parameter, 
{{unlogged_batch_across_partitions_warn_threshold}}.

Here are the patches and CI results:

||2.1||2.2||3.0||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/11529-2.1]|[patch|https://github.com/stef1927/cassandra/commits/11529-2.2]|[patch|https://github.com/stef1927/cassandra/commits/11529-3.0]|[patch|https://github.com/stef1927/cassandra/commits/11529]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11529-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11529-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11529-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11529-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11529-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11529-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11529-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11529-dtest/]|

No patch up-merges cleanly; there are conflicts on all branches.

We also have a dtest patch 
[here|https://github.com/stef1927/cassandra-dtest/commits/11529]. 
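
As a rough illustration of a threshold-based check (not the code in the linked patches), the sketch below assumes the warning fires only when a batch touches more distinct partition keys than the configured value. The class name, the method shouldWarn and the strictly-greater comparison are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;

// Hypothetical check, not the patch itself.
final class UnloggedBatchWarnSketch
{
    // Assumption: warn only when the number of distinct partition keys
    // in the batch is strictly greater than the threshold.
    static boolean shouldWarn(Collection<String> partitionKeys, int warnThreshold)
    {
        return new HashSet<>(partitionKeys).size() > warnThreshold;
    }

    public static void main(String[] args)
    {
        // Two distinct partitions with a threshold of 2: no warning.
        System.out.println(shouldWarn(Arrays.asList("a", "a", "b"), 2));
        // Three distinct partitions with a threshold of 2: warning.
        System.out.println(shouldWarn(Arrays.asList("a", "b", "c"), 2));
    }
}
```

A check of this shape needs no ring or token information, which is why it avoids the per-batch range calculation described in the issue.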


> Checking if an unlogged batch is local is inefficient
> -
>
> Key: CASSANDRA-11529
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11529
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Paulo Motta
>Assignee: Stefania
>Priority: Critical
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> Based on CASSANDRA-11363 report I noticed that on CASSANDRA-9303 we 
> introduced the following check to avoid printing a {{WARN}} in case an 
> unlogged batch statement is local:
> {noformat}
>      for (IMutation im : mutations)
>      {
>          keySet.add(im.key());
>          for (ColumnFamily cf : im.getColumnFamilies())
>              ksCfPairs.add(String.format("%s.%s", cf.metadata().ksName, cf.metadata().cfName));
> +
> +        if (localMutationsOnly)
> +            localMutationsOnly &= isMutationLocal(localTokensByKs, im);
>      }
>  
> +    // CASSANDRA-9303: If we only have local mutations we do not warn
> +    if (localMutationsOnly)
> +        return;
> +
>      NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, TimeUnit.MINUTES, unloggedBatchWarning,
>                       keySet.size(), keySet.size() == 1 ? "" : "s",
>                       ksCfPairs.size() == 1 ? "" : "s", ksCfPairs);
> {noformat}
> The {{isMutationLocal}} check uses 
> {{StorageService.instance.getLocalRanges(mutation.getKeyspaceName())}}, which 
> underneath uses {{AbstractReplication.getAddressRanges}} to calculate local 
> ranges. 
> Recalculating this at every unlogged batch can be pretty inefficient, so we 
> should at the very least cache it every time the ring changes.
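
The caching suggestion in the description above can be sketched as follows. This is only an illustration under stated assumptions: {{RingVersionedCache}} is a hypothetical class, and it assumes some monotonically changing "ring version" number is available to detect ring changes. It does not use any real Cassandra API.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache: recomputes a value only when the ring version changes.
final class RingVersionedCache<T>
{
    private final LongSupplier ringVersion; // assumed to change on every ring change
    private final Supplier<T> compute;      // the expensive calculation, e.g. local ranges
    private long cachedVersion;
    private T cachedValue;
    private boolean initialized;

    RingVersionedCache(LongSupplier ringVersion, Supplier<T> compute)
    {
        this.ringVersion = ringVersion;
        this.compute = compute;
    }

    synchronized T get()
    {
        long current = ringVersion.getAsLong();
        // Recompute on first use or whenever the observed ring version differs.
        if (!initialized || current != cachedVersion)
        {
            cachedValue = compute.get();
            cachedVersion = current;
            initialized = true;
        }
        return cachedValue;
    }
}
```

With such a cache, every unlogged batch pays only for a version comparison unless the ring has actually changed since the last lookup.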





[jira] [Created] (CASSANDRA-11532) CqlConfigHelper requires both truststore and keystore to work with SSL encryption

2016-04-07 Thread Jacek Lewandowski (JIRA)
Jacek Lewandowski created CASSANDRA-11532:
-

 Summary: CqlConfigHelper requires both truststore and keystore to 
work with SSL encryption
 Key: CASSANDRA-11532
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11532
 Project: Cassandra
  Issue Type: Bug
Reporter: Jacek Lewandowski
Assignee: Jacek Lewandowski


{{CqlConfigHelper}} configures SSL in the following way:

{code:java}
public static Optional<SSLOptions> getSSLOptions(Configuration conf)
{
    Optional<String> truststorePath = getInputNativeSSLTruststorePath(conf);
    Optional<String> keystorePath = getInputNativeSSLKeystorePath(conf);
    Optional<String> truststorePassword = getInputNativeSSLTruststorePassword(conf);
    Optional<String> keystorePassword = getInputNativeSSLKeystorePassword(conf);
    Optional<String> cipherSuites = getInputNativeSSLCipherSuites(conf);

    if (truststorePath.isPresent() && keystorePath.isPresent() &&
        truststorePassword.isPresent() && keystorePassword.isPresent())
    {
        SSLContext context;
        try
        {
            context = getSSLContext(truststorePath.get(), truststorePassword.get(),
                                    keystorePath.get(), keystorePassword.get());
        }
        catch (UnrecoverableKeyException | KeyManagementException |
               NoSuchAlgorithmException | KeyStoreException |
               CertificateException | IOException e)
        {
            throw new RuntimeException(e);
        }
        String[] css = null;
        if (cipherSuites.isPresent())
            css = cipherSuites.get().split(",");
        return Optional.of(JdkSSLOptions.builder()
                                        .withSSLContext(context)
                                        .withCipherSuites(css)
                                        .build());
    }
    return Optional.absent();
}
{code}

which forces you both to connect only to trusted nodes and to use client authentication. 
This should be made more flexible so that at least client authentication is 
optional. 
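
One possible shape for the more flexible behaviour, sketched with plain JDK classes only. The class and method names are hypothetical, and this is not the attached patch: the assumption is simply that a missing truststore or keystore path should fall back to the JDK provider defaults instead of disabling SSL altogether.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.SecureRandom;
import javax.net.ssl.KeyManager;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical sketch: either store may be absent (null path).
final class OptionalStoresSslSketch
{
    static SSLContext build(String truststorePath, String truststorePassword,
                            String keystorePath, String keystorePassword)
    {
        try
        {
            TrustManager[] trustManagers = null; // null: provider default trust managers
            if (truststorePath != null)
            {
                TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
                tmf.init(load(truststorePath, truststorePassword));
                trustManagers = tmf.getTrustManagers();
            }

            KeyManager[] keyManagers = null; // null: provider default, typically no client certificate
            if (keystorePath != null)
            {
                KeyManagerFactory kmf =
                    KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
                kmf.init(load(keystorePath, keystorePassword), toChars(keystorePassword));
                keyManagers = kmf.getKeyManagers();
            }

            SSLContext context = SSLContext.getInstance("TLS");
            context.init(keyManagers, trustManagers, new SecureRandom());
            return context;
        }
        catch (GeneralSecurityException | IOException e)
        {
            // Mirrors the original code, which wraps checked exceptions.
            throw new RuntimeException(e);
        }
    }

    private static KeyStore load(String path, String password)
        throws GeneralSecurityException, IOException
    {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = new FileInputStream(path))
        {
            store.load(in, toChars(password));
        }
        return store;
    }

    private static char[] toChars(String password)
    {
        return password == null ? null : password.toCharArray();
    }
}
```

Calling {{OptionalStoresSslSketch.build(truststorePath, truststorePassword, null, null)}} would then give server verification without client authentication.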






[jira] [Commented] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-07 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231597#comment-15231597
 ] 

Jordan West commented on CASSANDRA-11525:
-

||branch||testall||dtest||
|[CASSANDRA-11525|https://github.com/xedin/cassandra/tree/CASSANDRA-11525]|[testall|http://cassci.datastax.com/job/xedin-CASSANDRA-11525-testall/]|[dtest|http://cassci.datastax.com/job/xedin-CASSANDRA-11525-dtest/]|

[~doanduyhai] can you try this branch and see if it addresses the issue? Also, 
can you please upload all of the SSTable components (including SSTable index 
files) so we can test here as well?

The issue was caused by an invalid assumption when clustering columns are used: 
when stitching together multiple index parts it was possible that the same 
term, for the same token, was in multiple parts, resulting in the union iterator 
returning the incorrect count. The new approach counts the number of tokens 
while performing the first iteration. The complexity of the algorithm has not 
changed and it should be similar in performance. 
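
To illustrate why duplicates across parts break a precomputed count, here is a toy model. It is an assumption-laden sketch that uses sorted {{long[]}} arrays in place of SASI's real index parts and token trees; all names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model: each index part is a sorted array of tokens for one term.
final class TokenCountSketch
{
    // Naive count: adds up the per-part sizes, so a token present in
    // several parts is counted several times.
    static long sumOfPartCounts(List<long[]> parts)
    {
        long total = 0;
        for (long[] part : parts)
            total += part.length;
        return total;
    }

    // Counts tokens while walking the merged (union) stream once, so a
    // token repeated across parts is counted a single time.
    static long countWhileMerging(List<long[]> parts)
    {
        // Each queue entry is {partIndex, offset}, ordered by the token it points at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            Comparator.comparingLong((int[] c) -> parts.get(c[0])[c[1]]));
        for (int i = 0; i < parts.size(); i++)
            if (parts.get(i).length > 0)
                heads.add(new int[] { i, 0 });

        long count = 0;
        long previous = 0;
        boolean first = true;
        while (!heads.isEmpty())
        {
            int[] cursor = heads.poll();
            long token = parts.get(cursor[0])[cursor[1]];
            if (first || token != previous)
            {
                count++;
                previous = token;
                first = false;
            }
            if (cursor[1] + 1 < parts.get(cursor[0]).length)
                heads.add(new int[] { cursor[0], cursor[1] + 1 });
        }
        return count;
    }
}
```

For parts {{[1, 2, 3]}} and {{[2, 3, 4]}} the naive sum is 6 while the merged stream holds only 4 distinct tokens. Both methods make a single pass over the input, which is in line with the remark that the complexity has not changed.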

> StaticTokenTreeBuilder should respect posibility of duplicate tokens
> 
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>Assignee: Jordan West
> Fix For: 3.5
>
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> 

[jira] [Created] (CASSANDRA-11531) JMX endpoint to mark sstable obsolete

2016-04-07 Thread Jeff Jirsa (JIRA)
Jeff Jirsa created CASSANDRA-11531:
--

 Summary: JMX endpoint to mark sstable obsolete
 Key: CASSANDRA-11531
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11531
 Project: Cassandra
  Issue Type: Improvement
  Components: Compaction
Reporter: Jeff Jirsa
Priority: Trivial


Perhaps controversial, but there exist times when it'd be really convenient to 
be able to mark an sstable as obsolete instead of trying to figure out which 
order to do user-defined-compaction on the various sstables blocking 
{{getFullyExpiredSStables}} from dropping it completely (or waiting for 
blocking sstables to age and unblock). 

CASSANDRA-10496 may make this much less common, but in the meantime, a jmx 
{{unsafeMarkObsolete}} sure would be convenient. Dangerous, but convenient.






[jira] [Updated] (CASSANDRA-11529) Checking if an unlogged batch is local is inefficient

2016-04-07 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-11529:

Reviewer: Paulo Motta

> Checking if an unlogged batch is local is inefficient
> -
>
> Key: CASSANDRA-11529
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11529
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Paulo Motta
>Assignee: Stefania
>Priority: Critical
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> Based on CASSANDRA-11363 report I noticed that on CASSANDRA-9303 we 
> introduced the following check to avoid printing a {{WARN}} in case an 
> unlogged batch statement is local:
> {noformat}
>      for (IMutation im : mutations)
>      {
>          keySet.add(im.key());
>          for (ColumnFamily cf : im.getColumnFamilies())
>              ksCfPairs.add(String.format("%s.%s", cf.metadata().ksName, cf.metadata().cfName));
> +
> +        if (localMutationsOnly)
> +            localMutationsOnly &= isMutationLocal(localTokensByKs, im);
>      }
>  
> +    // CASSANDRA-9303: If we only have local mutations we do not warn
> +    if (localMutationsOnly)
> +        return;
> +
>      NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, TimeUnit.MINUTES, unloggedBatchWarning,
>                       keySet.size(), keySet.size() == 1 ? "" : "s",
>                       ksCfPairs.size() == 1 ? "" : "s", ksCfPairs);
> {noformat}
> The {{isMutationLocal}} check uses 
> {{StorageService.instance.getLocalRanges(mutation.getKeyspaceName())}}, which 
> underneath uses {{AbstractReplication.getAddressRanges}} to calculate local 
> ranges. 
> Recalculating this at every unlogged batch can be pretty inefficient, so we 
> should at the very least cache it every time the ring changes.





[jira] [Commented] (CASSANDRA-11051) Make LZ4 Compression Level Configurable

2016-04-07 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231516#comment-15231516
 ] 

Michael Kjellman commented on CASSANDRA-11051:
--

+1 Looks great! Thanks [~krummas] (sorry for the delay again... stupid spam 
filter keeps triggering on your Jira updates)

> Make LZ4 Compression Level Configurable 
> 
>
> Key: CASSANDRA-11051
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11051
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Michael Kjellman
>Assignee: Michael Kjellman
> Attachments: lz4_2.2.patch
>
>
> We'd like to make the LZ4 Compressor implementation configurable on a per 
> column family basis. Testing has shown a ~4% reduction in file size with the 
> higher compression LZ4 implementation vs the standard compressor we currently 
> use instantiated by the default constructor. The attached patch adds the 
> following optional parameters 'lz4_compressor_type' and 
> 'lz4_high_compressor_level' to the LZ4Compressor. If none of the new optional 
> parameters are specified, the Compressor will use the same defaults Cassandra 
> has always had for LZ4.
> New LZ4Compressor Optional Parameters:
>   * lz4_compressor_type can currently be either 'high' (uses LZ4HCCompressor) 
> or 'fast' (uses LZ4Compressor)
>   * lz4_high_compressor_level can be set between 1 and 17. Not specifying a 
> compressor level while specifying lz4_compressor_type as 'high' will use a 
> default level of 9 (as picked by the LZ4 library as the "default").
> Currently, we use the default LZ4 compressor constructor. This change would 
> just expose the level (and implementation to use) to the user via the schema. 
> There are many potential cases where users may find that the tradeoff in 
> additional CPU and memory usage is worth the on-disk space savings.
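
The option handling described above could be validated along these lines. This is a sketch, not the attached patch: the class name is hypothetical, it does not touch the LZ4 library, and it assumes that omitting {{lz4_compressor_type}} selects the existing fast compressor.

```java
import java.util.Map;

// Hypothetical parser for the two optional parameters described above.
final class Lz4OptionsSketch
{
    static final String TYPE_KEY = "lz4_compressor_type";
    static final String LEVEL_KEY = "lz4_high_compressor_level";
    static final int DEFAULT_HIGH_LEVEL = 9;

    final boolean high; // true for 'high', false for 'fast'
    final int level;    // only meaningful when high is true

    private Lz4OptionsSketch(boolean high, int level)
    {
        this.high = high;
        this.level = level;
    }

    static Lz4OptionsSketch parse(Map<String, String> options)
    {
        // Assumption: no type given means the existing default ('fast').
        String type = options.getOrDefault(TYPE_KEY, "fast");
        if (type.equals("fast"))
            return new Lz4OptionsSketch(false, DEFAULT_HIGH_LEVEL);
        if (!type.equals("high"))
            throw new IllegalArgumentException(TYPE_KEY + " must be 'high' or 'fast', got: " + type);

        String rawLevel = options.get(LEVEL_KEY);
        int level = rawLevel == null ? DEFAULT_HIGH_LEVEL : Integer.parseInt(rawLevel);
        if (level < 1 || level > 17)
            throw new IllegalArgumentException(LEVEL_KEY + " must be between 1 and 17, got: " + level);
        return new Lz4OptionsSketch(true, level);
    }
}
```

Rejecting out-of-range levels at parse time would surface a bad schema setting immediately rather than when the compressor is first used.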





[jira] [Commented] (CASSANDRA-11470) dtest failure in materialized_views_test.TestMaterializedViews.base_replica_repair_test

2016-04-07 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231492#comment-15231492
 ] 

Stefania commented on CASSANDRA-11470:
--

Thanks for setting this up so quickly. No failures, which is good I suppose. 
However, I really would like to understand if this is because of the patch or 
because of some other reason. Could we run it another 150 times perhaps, 
against [this patch|https://github.com/stef1927/cassandra/commits/11470-debug], 
which is just trunk + the log message? If we find a failure, this should tell 
us what the problem is and confirm that indeed the patch has fixed it. If it 
doesn't fail, then it may be that for some reason we are not able to reproduce 
the problem when running the test in isolation. What do you think?

> dtest failure in 
> materialized_views_test.TestMaterializedViews.base_replica_repair_test
> ---
>
> Key: CASSANDRA-11470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11470
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Stefania
>  Labels: dtest
> Fix For: 3.x
>
> Attachments: node1.log, node2.log, node2_debug.log, node3.log, 
> node3_debug.log
>
>
> base_replica_repair_test has failed on trunk with the following exception in 
> the log of node2:
> {code}
> ERROR [main] 2016-03-31 08:48:46,949 CassandraDaemon.java:708 - Exception 
> encountered during startup
> java.lang.RuntimeException: Failed to list files in 
> /mnt/tmp/dtest-du964e/test/node2/data0/system_schema/views-9786ac1cdd583201a7cdad556410c985
> at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:725)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:690) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:567)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:555)
>  ~[main/:na]
> at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:383) 
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:320) 
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:130) 
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:107) 
> ~[main/:na]
> at 
> org.apache.cassandra.cql3.restrictions.StatementRestrictions.<init>(StatementRestrictions.java:139)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepareRestrictions(SelectStatement.java:864)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:811)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:799)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.getStatement(QueryProcessor.java:505)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:242)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.prepareInternal(QueryProcessor.java:286)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:294)
>  ~[main/:na]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.query(SchemaKeyspace.java:1246) 
> ~[main/:na]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:875)
>  ~[main/:na]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:867)
>  ~[main/:na]
> at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:134) 
> ~[main/:na]
> at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:124) 
> ~[main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:229) 
> [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
>  [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691) 
> [main/:na]
> Caused by: java.lang.RuntimeException: Failed to list directory files in 
> /mnt/tmp/dtest-du964e/test/node2/data0/system_schema/views-9786ac1cdd583201a7cdad556410c985,
>  inconsistent disk state for transaction 
> [ma_txn_flush_58db56b0-f71d-11e5-bf68-03a01adb9f11.log in 
> 

[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)

2016-04-07 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231487#comment-15231487
 ] 

Jack Krupansky commented on CASSANDRA-8844:
---

Since this new feature has evolved significantly since the original 
description, is there a good summary available for the current form of the 
feature? Not like full doc or the internal implementation details, but a 
concise summary at the user level, like where the CDC data will be stored, its 
format, how to retrieve it, and potential performance impact, both in terms of 
amount of CPU time required and additional memory required if CDC is enabled. 
Thanks.

> Change Data Capture (CDC)
> -
>
> Key: CASSANDRA-8844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Coordination, Local Write-Read Paths
>Reporter: Tupshin Harper
>Assignee: Joshua McKenzie
>Priority: Critical
> Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns 
> used to determine (and track) the data that has changed so that action can be 
> taken using the changed data. Also, Change data capture (CDC) is an approach 
> to data integration that is based on the identification, capture and delivery 
> of the changes made to enterprise data sources."
> -Wikipedia
> As Cassandra is increasingly being used as the Source of Record (SoR) for 
> mission critical data in large enterprises, it is increasingly being called 
> upon to act as the central hub of traffic and data flow to other systems. In 
> order to try to address the general need, we (cc [~brianmhess]), propose 
> implementing a simple data logging mechanism to enable per-table CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its 
> Consistency Level semantics, and in order to treat it as the single 
> reliable/durable SoR for the data.
> # To provide a mechanism for implementing good and reliable 
> (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) 
> continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To eliminate the developmental and operational burden of users so that they 
> don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, 
> give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similar to a commitlog, 
> with the following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies 
> are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do 
> any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an 
> easy enhancement, but most likely use cases would prefer to only implement 
> CDC logging in one (or a subset) of the DCs that are being replicated to
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the 
> commitlog, failure to write to the CDC log should fail that node's write. If 
> that means the requested consistency level was not met, then clients *should* 
> experience UnavailableExceptions.
> - Be written in a Row-centric manner such that it is easy for consumers to 
> reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons 
> written in non JVM languages
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also 
> believe that they can be deferred for a subsequent release, and to gauge 
> actual interest.
> - Multiple logs per table. This would make it easy to have multiple 
> "subscribers" to a single table's changes. A workaround would be to create a 
> forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters, 
> would make Cassandra a much more versatile feeder into other systems, and 
> again, reduce complexity that would otherwise need to be built into the 
> daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it. 
> - Cleaning up consumed logfiles would be the client daemon's responsibility.
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it 
> trivial to process them in order.
> - Daemons should be able to checkpoint their work, and resume from where they 
> left off. This means they would have to leave some file artifact in the CDC 
> log's directory.
> - A sophisticated daemon should be able to be written that could 
> -- Catch up, in written-order, even when it is 

[jira] [Commented] (CASSANDRA-11520) Implement optimized local read path for CL.ONE

2016-04-07 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231454#comment-15231454
 ] 

Stefania commented on CASSANDRA-11520:
--

That's a very good point. We should apply this optimization only when repair 
decision and speculative retry are set to NONE. We basically apply this 
optimization only when there is a single target replica selected, and it 
happens to be the local host.
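
The condition described above can be written as a small predicate. This is a hedged sketch: the enum, the boolean flag for speculative retry, and the string host names are simplifications invented for illustration, not Cassandra's actual types.

```java
import java.util.List;

// Hypothetical eligibility check for the optimized local read path.
final class LocalReadFastPathSketch
{
    enum RepairDecision { NONE, GLOBAL, DC_LOCAL }

    static boolean eligible(RepairDecision repairDecision,
                            boolean speculativeRetryEnabled,
                            List<String> targetReplicas,
                            String localHost)
    {
        // Read repair or speculative retry could involve other replicas,
        // so the coordination layer is bypassed only when both are off
        // and the single selected replica is this node.
        return repairDecision == RepairDecision.NONE
            && !speculativeRetryEnabled
            && targetReplicas.size() == 1
            && targetReplicas.get(0).equals(localHost);
    }
}
```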

> Implement optimized local read path for CL.ONE
> --
>
> Key: CASSANDRA-11520
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11520
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL, Local Write-Read Paths
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 3.x
>
>
> -Add an option to the CQL SELECT statement to- Bypass the coordination layer 
> when reading locally at CL.ONE.





[jira] [Assigned] (CASSANDRA-11529) Checking if an unlogged batch is local is inefficient

2016-04-07 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania reassigned CASSANDRA-11529:


Assignee: Stefania

> Checking if an unlogged batch is local is inefficient
> -
>
> Key: CASSANDRA-11529
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11529
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Paulo Motta
>Assignee: Stefania
>Priority: Critical
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> Based on CASSANDRA-11363 report I noticed that on CASSANDRA-9303 we 
> introduced the following check to avoid printing a {{WARN}} in case an 
> unlogged batch statement is local:
> {noformat}
>      for (IMutation im : mutations)
>      {
>          keySet.add(im.key());
>          for (ColumnFamily cf : im.getColumnFamilies())
>              ksCfPairs.add(String.format("%s.%s", cf.metadata().ksName, cf.metadata().cfName));
> +
> +        if (localMutationsOnly)
> +            localMutationsOnly &= isMutationLocal(localTokensByKs, im);
>      }
>  
> +    // CASSANDRA-9303: If we only have local mutations we do not warn
> +    if (localMutationsOnly)
> +        return;
> +
>      NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, TimeUnit.MINUTES, unloggedBatchWarning,
>                       keySet.size(), keySet.size() == 1 ? "" : "s",
>                       ksCfPairs.size() == 1 ? "" : "s", ksCfPairs);
> {noformat}
> The {{isMutationLocal}} check uses 
> {{StorageService.instance.getLocalRanges(mutation.getKeyspaceName())}}, which 
> underneath uses {{AbstractReplication.getAddressRanges}} to calculate local 
> ranges. 
> Recalculating this at every unlogged batch can be pretty inefficient, so we 
> should at the very least cache it every time the ring changes.





[jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests

2016-04-07 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231414#comment-15231414
 ] 

Stefania commented on CASSANDRA-11521:
--

CL ONE only, at least in the initial implementation.

> Implement streaming for bulk read requests
> --
>
> Key: CASSANDRA-11521
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 3.x
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer 
> and eliminating the need to query individual pages one by one.





[jira] [Updated] (CASSANDRA-11525) StaticTokenTreeBuilder should respect posibility of duplicate tokens

2016-04-07 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11525:

 Reviewer: Pavel Yaskevich
Fix Version/s: 3.5
  Summary: StaticTokenTreeBuilder should respect posibility of 
duplicate tokens  (was: SASI index corruption)

> StaticTokenTreeBuilder should respect posibility of duplicate tokens
> 
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>Assignee: Jordan West
> Fix For: 3.5
>
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When doing the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 

[jira] [Updated] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11525:

Assignee: Jordan West

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>Assignee: Jordan West
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When running the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
>  

[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231400#comment-15231400
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

OK, as a quick update: I think we know what is going on here. [~jrwest] is 
working on the changes to TokenTree, and the problem is most definitely caused 
by the changes in CASSANDRA-11383. We are still going to use your files to 
validate the fix, and we will add additional tests to prevent this in the future.


[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231348#comment-15231348
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

Sounds good, [~doanduyhai]! Meanwhile we are trying to reproduce based on what 
we can figure out theoretically.


[jira] [Commented] (CASSANDRA-10119) CQLSSTableWriter does not add the Keyspace or Tablename as a part of the file name

2016-04-07 Thread Wei Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231343#comment-15231343
 ] 

Wei Deng commented on CASSANDRA-10119:
--

I stumbled upon this JIRA. Just for future reference, in case anyone is reading 
this JIRA: the behavior change happened in CASSANDRA-6962.

> CQLSSTableWriter does not add the Keyspace or Tablename as a part of the file 
> name
> --
>
> Key: CASSANDRA-10119
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10119
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Windows 7 64bit
>Reporter: Matthew Mulligan
>
> When using the CQLSSTableWriter to create sstable files from a CSV, the files 
> that it outputs do not have the keyspace or table name as a part of the 
> file name. When I tried to load the files using sstableloader I received the 
> following stack trace.
> {noformat}
> $ sstableloader -d localhost datawh/line_items
> Established connection to initial hosts
> Opening sstables and calculating sections to stream
> For input string: "TOC.txt"
> java.lang.NumberFormatException: For input string: "TOC.txt"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:492)
> at java.lang.Integer.parseInt(Integer.java:527)
> at 
> org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:276)
> at 
> org.apache.cassandra.io.sstable.Descriptor.fromFilename(Descriptor.java:235)
> at org.apache.cassandra.io.sstable.Component.fromFilename(Component.java:120)
> at 
> org.apache.cassandra.io.sstable.SSTable.tryComponentFromFilename(SSTable.java:160)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:84)
> at java.io.File.list(File.java:1155)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:78)
> at 
> org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:162)
> at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106)
> {noformat}
> The files looked like this: la-1-big-Data.db
> They should look like this: datawh-line_items-la-1-Data.db
> Once I changed the filenames the sstableloader worked as expected.
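
For reference, the rename described above can be sketched as follows. This is a minimal illustration based only on the two file names quoted in the description. The class `LegacySSTableName` and both of its methods are hypothetical and are not part of Cassandra. `generationOf` is an assumed simplification of a positional parser that expects `keyspace-table-version-generation-component`; it is not the actual `Descriptor.fromFilename` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LegacySSTableName {
    // Hypothetical helper: turns "la-1-big-Data.db" into
    // "<keyspace>-<table>-la-1-Data.db", mirroring the two names in the report.
    public static String toLegacyName(String keyspace, String table, String fileName) {
        List<String> parts = new ArrayList<>(Arrays.asList(fileName.split("-")));
        // Expected input layout: version, generation, format, component.
        if (parts.size() != 4)
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        parts.remove(2); // drop the format token ("big")
        return keyspace + "-" + table + "-" + String.join("-", parts);
    }

    // Assumed simplification of a positional parser that expects
    // keyspace-table-version-generation-component (not the real Descriptor code).
    public static int generationOf(String fileName) {
        String[] parts = fileName.split("-");
        return Integer.parseInt(parts[3]);
    }

    public static void main(String[] args) {
        String renamed = toLegacyName("datawh", "line_items", "la-1-big-Data.db");
        System.out.println(renamed);
        System.out.println(generationOf(renamed));
        try {
            // With the new-style name, the fourth token is the component name.
            generationOf("la-1-big-TOC.txt");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, a name such as `la-1-big-TOC.txt` puts the component where the generation number is expected, which would be consistent with the `NumberFormatException` in the trace.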



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231340#comment-15231340
 ] 

DOAN DuyHai commented on CASSANDRA-11525:
-

Ok I'm uploading to 
https://drive.google.com/folderview?id=0B6wR2aj4Cb6wYWdfcl9Pb05CYW8=sharing

There are 9 SSTables and 9 index files.

The SSTables add up to more than 20 GB.

The upload will take at least 2 hours to finish.

I will rebuild the index with 3.4 tomorrow and re-test.



[jira] [Comment Edited] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231220#comment-15231220
 ] 

Pavel Yaskevich edited comment on CASSANDRA-11525 at 4/7/16 10:20 PM:
--

[~doanduyhai] Alright, it's most likely related to how the index is stitched 
together again; we'll wait for you to upload the files.

Edit: Meanwhile it would be great if you could test it on 3.4 and see if that 
produces the same error too.


was (Author: xedin):
[~doanduyhai] Alright, it's most likely related to how the index is stitched 
together again; we'll wait for you to upload the files.


[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231220#comment-15231220
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

[~doanduyhai] Alright, it's most likely related to how the index is stitched 
together again; we'll wait for you to upload the files.


[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231212#comment-15231212
 ] 

DOAN DuyHai commented on CASSANDRA-11525:
-

OK, your intuition was correct, [~xedin]. I applied this code patch:

{noformat}
key = decorateKey(ByteBufferUtil.readWithShortLength(in));

// hint read path about key location if caching is enabled
// this saves index summary lookup and index file iteration which would be pretty costly
// especially in presence of promoted column indexes
try
{
    if (isKeyCacheSetup())
        cacheKey(key, rowIndexEntrySerializer.deserialize(in));
}
catch (IndexOutOfBoundsException ex)
{
    try
    {
        final String keyValue = keyValidator.getString(key.getKey().duplicate());
        logger.error(String.format(
            "Error when reading index entry for token '%s', partition key '%s' at indexPosition %s ",
            key.getToken().getTokenValue(), keyValue, indexPosition));
        throw ex;
    }
    catch (Exception ex2)
    {
        logger.error(String.format(
            "Error when reading index entry for token '%s' at indexPosition %s ",
            key.getToken().getTokenValue(), indexPosition));
        throw ex;
    }
}
{noformat}

It seems that we're falling into the second exception catch block, which hints 
that deserializing the partition key ByteBuffer may have thrown an exception as 
well. So yes, it is very probable that it's just a random set of 32k bytes.

I'm going to fetch the SSTables and index structures to upload them.

By the way, I tested with trunk (3.6-SNAPSHOT) and got the same error.
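
One detail may help when reading the log output of the patch above: because {{throw ex}} sits inside the inner {{try}}, the sibling {{catch (Exception ex2)}} also catches it, so the second message is logged in both cases. Seeing only the second message still suggests that decoding the key failed, while seeing both would mean the key decoded fine. The sketch below is a minimal stand-in that reproduces only the control flow; the class and method names are hypothetical, and it swallows the exception at the end so the logged messages can be returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the patched read path; it models control flow only.
final class RethrowDemo
{
    // Same shape as the patch: "throw ex" is inside the inner try block.
    static List<String> patchedShape(boolean keyDecodes)
    {
        List<String> log = new ArrayList<>();
        try
        {
            throw new IndexOutOfBoundsException("bad index entry");
        }
        catch (IndexOutOfBoundsException ex)
        {
            try
            {
                if (!keyDecodes)
                    throw new IllegalStateException("key cannot be decoded");
                log.add("first");
                throw ex; // caught by the catch below, since ex is an Exception
            }
            catch (Exception ex2)
            {
                log.add("second");
            }
        }
        return log;
    }

    // Alternative shape: the rethrow would happen after the inner try/catch,
    // so each message is logged in exactly one of the two cases.
    static List<String> separatedShape(boolean keyDecodes)
    {
        List<String> log = new ArrayList<>();
        try
        {
            throw new IndexOutOfBoundsException("bad index entry");
        }
        catch (IndexOutOfBoundsException ex)
        {
            try
            {
                if (!keyDecodes)
                    throw new IllegalStateException("key cannot be decoded");
                log.add("first");
            }
            catch (Exception ex2)
            {
                log.add("second");
            }
            // real code would rethrow ex here, outside the inner try
        }
        return log;
    }

    public static void main(String[] args)
    {
        System.out.println(patchedShape(true));    // [first, second]
        System.out.println(patchedShape(false));   // [second]
        System.out.println(separatedShape(true));  // [first]
        System.out.println(separatedShape(false)); // [second]
    }
}
```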



> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When running the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   

[jira] [Commented] (CASSANDRA-11430) Add legacy notifications backward-support on deprecated repair methods

2016-04-07 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231166#comment-15231166
 ] 

Yuki Morishita commented on CASSANDRA-11430:


Created CASSANDRA-11530 for removing the deprecated repair methods.

> Add legacy notifications backward-support on deprecated repair methods
> --
>
> Key: CASSANDRA-11430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
>Assignee: Paulo Motta
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available 
> for older clients though. Unfortunately it sometimes hangs when you call it. 
> It looks like it completes fine but the notification to the client that the 
> operation is done is never sent. This is easiest to see by using nodetool 
> from 2.1 against a 3.x cluster.
> {noformat}
> [Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 OpsCenter
> [2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
> [Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 system_distributed
> ...
> ...
> {noformat}
> (I added the ellipses)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11529) Checking if an unlogged batch is local is inefficient

2016-04-07 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11529:

Priority: Critical  (was: Major)

> Checking if an unlogged batch is local is inefficient
> -
>
> Key: CASSANDRA-11529
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11529
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Paulo Motta
>Priority: Critical
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> Based on CASSANDRA-11363 report I noticed that on CASSANDRA-9303 we 
> introduced the following check to avoid printing a {{WARN}} in case an 
> unlogged batch statement is local:
> {noformat}
>      for (IMutation im : mutations)
>      {
>          keySet.add(im.key());
>          for (ColumnFamily cf : im.getColumnFamilies())
>              ksCfPairs.add(String.format("%s.%s", cf.metadata().ksName, cf.metadata().cfName));
> +
> +        if (localMutationsOnly)
> +            localMutationsOnly &= isMutationLocal(localTokensByKs, im);
>      }
>  
> +    // CASSANDRA-9303: If we only have local mutations we do not warn
> +    if (localMutationsOnly)
> +        return;
> +
>      NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, TimeUnit.MINUTES, unloggedBatchWarning,
>                       keySet.size(), keySet.size() == 1 ? "" : "s",
>                       ksCfPairs.size() == 1 ? "" : "s", ksCfPairs);
> {noformat}
> The {{isMutationLocal}} check uses 
> {{StorageService.instance.getLocalRanges(mutation.getKeyspaceName())}}, which 
> in turn uses {{AbstractReplicationStrategy.getAddressRanges}} to calculate local 
> ranges. 
> Recalculating this for every unlogged batch can be pretty inefficient, so we 
> should at the very least cache the result and recompute it only when the ring changes.
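
The caching idea in the description could look roughly like the following. This is only a sketch under assumptions, not the fix that was committed: {{RingVersionedCache}}, {{onRingChange}} and the demo loader are hypothetical names, and the loader stands in for the expensive local-range calculation. The cache keeps the last computed value together with the ring version it was computed for, and recomputes only when that version has been bumped.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical helper, not Cassandra code: caches an expensive computation and
// recomputes it only after the ring version has changed.
final class RingVersionedCache<T>
{
    private static final class Entry<T>
    {
        final long version;
        final T value;

        Entry(long version, T value)
        {
            this.version = version;
            this.value = value;
        }
    }

    private final AtomicLong ringVersion = new AtomicLong();
    private final Supplier<T> loader;
    private volatile Entry<T> cached;

    RingVersionedCache(Supplier<T> loader)
    {
        this.loader = loader;
    }

    // Assumed to be called by whatever hook observes token or topology changes.
    void onRingChange()
    {
        ringVersion.incrementAndGet();
    }

    T get()
    {
        // Read the version before loading, so a change that happens during the
        // load leaves a stale-versioned entry that the next call will replace.
        long current = ringVersion.get();
        Entry<T> entry = cached;
        if (entry == null || entry.version != current)
        {
            entry = new Entry<>(current, loader.get());
            cached = entry;
        }
        return entry.value;
    }
}

final class RingVersionedCacheDemo
{
    static final AtomicInteger computations = new AtomicInteger();

    // Stand-in for the expensive per-keyspace range calculation.
    static List<String> expensiveLocalRanges()
    {
        computations.incrementAndGet();
        return Collections.singletonList("(0,100]");
    }

    public static void main(String[] args)
    {
        RingVersionedCache<List<String>> cache =
            new RingVersionedCache<>(RingVersionedCacheDemo::expensiveLocalRanges);
        for (int i = 0; i < 1000; i++)
            cache.get();
        System.out.println(computations.get()); // 1: a thousand batches, one calculation
        cache.onRingChange();
        cache.get();
        System.out.println(computations.get()); // 2: recomputed once after the ring change
    }
}
```

Concurrent callers may occasionally compute the value twice right after a ring change; that costs some work but does not return a value older than the version read at the start of the call.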





[jira] [Updated] (CASSANDRA-11529) Checking if an unlogged batch is local is inefficient

2016-04-07 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11529:

Fix Version/s: 3.x
   3.0.x
   2.2.x
   2.1.x



[jira] [Commented] (CASSANDRA-11529) Checking if an unlogged batch is local is inefficient

2016-04-07 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231202#comment-15231202
 ] 

Russell Bradberry commented on CASSANDRA-11529:
---

This is a critical issue for us as our cluster is in a mixed-version state 
where we have coordinator-only nodes running an older version to compensate for 
this issue.  The impact on a 50 node (8 cores, 256 vnodes) cluster with a few 
thousand batch inserts per second sends the average load to above 120.



[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231171#comment-15231171
 ] 

Nate McCall commented on CASSANDRA-11363:
-

CASSANDRA-11529 makes sense for [~devdazed], but the numbers from [~arodrime] 
are on a *non-vnode* cluster. 

Keep in mind: 
- we actually get this metric to go down a bit by increasing native transport 
threads
- we are not CPU bound or spiking (noticeably at least)

If this was an inefficiency in a hot code path per CASSANDRA-11529, I feel like 
increasing parallelism would exacerbate the issue. 

Good thoughts though - thanks for digging in!

> Blocked NTR When Connecting Causing Excessive Load
> --
>
> Key: CASSANDRA-11363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11363
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Russell Bradberry
>Priority: Critical
> Attachments: cassandra-102-cms.stack, cassandra-102-g1gc.stack
>
>
> When upgrading from 2.1.9 to 2.1.13, we are witnessing an issue where the 
> machine load increases to very high levels (> 120 on an 8 core machine) and 
> native transport requests get blocked in tpstats.
> I was able to reproduce this in both CMS and G1GC as well as on JVM 7 and 8.
> The issue does not seem to affect the nodes running 2.1.9.
> The issue seems to coincide with the number of connections OR the number of 
> total requests being processed at a given time (as the latter increases with 
> the former in our system)
> Currently there are between 600 and 800 client connections on each machine and 
> each machine is handling roughly 2000-3000 client requests per second.
> Disabling the binary protocol fixes the issue for this node but isn't a 
> viable option cluster-wide.
> Here is the output from tpstats:
> {code}
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 88387821 0
>  0
> ReadStage 0 0 355860 0
>  0
> RequestResponseStage  0 72532457 0
>  0
> ReadRepairStage   0 0150 0
>  0
> CounterMutationStage 32   104 897560 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 65 0
>  0
> GossipStage   0 0   2338 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor2   190474 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0 10 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0310 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   110 94 0
>  0
> MemtablePostFlush 134257 0
>  0
> MemtableReclaimMemory 0 0 94 0
>  0
> Native-Transport-Requests   128   156 38795716
> 278451
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {code}
> Attached is the jstack output for both CMS and G1GC.
> Flight recordings are here:
> https://s3.amazonaws.com/simple-logs/cassandra-102-cms.jfr
> https://s3.amazonaws.com/simple-logs/cassandra-102-g1gc.jfr
> It is interesting to note that while the flight recording was taking place, 
> the load on the machine went back to healthy, and when the flight recording 
> finished the load went back to > 100.





[jira] [Created] (CASSANDRA-11530) Remove deprecated repair method in 4.0

2016-04-07 Thread Yuki Morishita (JIRA)
Yuki Morishita created CASSANDRA-11530:
--

 Summary: Remove deprecated repair method in 4.0
 Key: CASSANDRA-11530
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11530
 Project: Cassandra
  Issue Type: Task
Reporter: Yuki Morishita
Priority: Minor
 Fix For: 4.x


Once we hit 4.0, we should remove all deprecated repair JMX APIs.
(nodetool has used only the new API since it was introduced.)





[jira] [Commented] (CASSANDRA-11430) Add legacy notifications backward-support on deprecated repair methods

2016-04-07 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231162#comment-15231162
 ] 

Yuki Morishita commented on CASSANDRA-11430:


Overall this looks good, but in 2.2 you cannot use Java 8 features like {{Optional}} 
yet. In 3.0+ it's fine.
That's why your dtest is failing on 2.2.

Can you create a different patch for 2.2?
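
For reference, a null check is one way to express the same intent without {{java.util.Optional}}. The sketch below is hypothetical and not taken from the patch: the class, method and values are made up for illustration, and it assumes the value in question is simply nullable. Guava's {{Optional}}, which other Cassandra code uses via {{Optional.absent()}}, may be another possibility.

```java
// Hypothetical example: the names below are illustrative and not from the patch.
final class LegacyNotificationConfig
{
    // Java 8 style, unavailable on a branch that cannot use Java 8 features:
    //     return Optional.ofNullable(configured).orElse(fallback);
    static String tagOrDefault(String configured, String fallback)
    {
        return configured != null ? configured : fallback;
    }

    public static void main(String[] args)
    {
        System.out.println(tagOrDefault(null, "repair"));       // prints "repair"
        System.out.println(tagOrDefault("repair:1", "repair")); // prints "repair:1"
    }
}
```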



[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231130#comment-15231130
 ] 

Russell Bradberry commented on CASSANDRA-11363:
---

[~pauloricardomg] 11529 makes sense because CASSANDRA-9303 was backported to 
2.1.12 in DSE 4.8.4. Hence why we see it in that version vs only in 2.1.13.



[jira] [Comment Edited] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231130#comment-15231130
 ] 

Russell Bradberry edited comment on CASSANDRA-11363 at 4/7/16 9:42 PM:
---

[~pauloricardomg] CASSANDRA-11529 makes sense because CASSANDRA-9303 was 
backported to 2.1.12 in DSE 4.8.4. Hence why we see it in that version vs only 
in 2.1.13.


was (Author: devdazed):
[~pauloricardomg] 11529 makes sense because CASSANDRA-9303 was backported to 
2.1.12 in DSE 4.8.4. Hence why we see it in that version vs only in 2.1.13.



[jira] [Comment Edited] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231123#comment-15231123
 ] 

Paulo Motta edited comment on CASSANDRA-11363 at 4/7/16 9:41 PM:
-

yes, cluster with more vnodes will be more affected by that.


was (Author: pauloricardomg):
yes, cluster with more vnodes will be more affected by this.



[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231123#comment-15231123
 ] 

Paulo Motta commented on CASSANDRA-11363:
-

yes, cluster with more vnodes will be more affected by this.



[jira] [Comment Edited] (CASSANDRA-8844) Change Data Capture (CDC)

2016-04-07 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224998#comment-15224998
 ] 

Joshua McKenzie edited comment on CASSANDRA-8844 at 4/7/16 9:40 PM:


v1 is ready for review.

h5. General outline of changes in the patch
* CQL syntax changes to support CDC:
** CREATE KEYSPACE ks WITH replication... AND cdc_datacenters={'dc1','dc2'...}
** ALTER KEYSPACE ks DROP CDCLOG;
*** Cannot drop keyspaces w/CDC enabled without first disabling CDC.
* Changes to Parser.g to support sets being converted into maps. Reference 
normalizeSetOrMapLiteral, cleanMap, cleanSet
* Statement changes to support new keyspace param for Option.CDC_DATACENTERS
* Refactored {{CommitLogReplayer}} into {{CommitLogReplayer}}, 
{{CommitLogReader}}, and {{ICommitLogReadHandler}} in preparation for having a 
CDC consumer that needs to read commit log segments.
* Refactored commit log versioned deltas from various read* methods into 
{{CommitLogReader.CommitLogFormat}}
* Renamed {{ReplayPosition}} to {{CommitLogSegmentPosition}} (this is 
responsible for quite a bit of noise in the diff - sorry)
* Refactored {{CommitLogSegmentManager}} into:
** {{AbstractCommitLogSegmentManager}}
** {{CommitLogSegmentManagerStandard}}
*** Old logic for alloc (always succeed, block on allocate)
*** discard (delete if true)
*** unusedCapacity check (CL directory only)
** {{CommitLogSegmentManagerCDC}}
*** Fail alloc if atCapacity. We have an extra couple of atomic checks on the 
critical path for CDC-enabled writes (size + cdc overflow) and fail allocation if 
we're at the limit. CommitLog now throws WriteTimeoutException when an allocation 
returns null, which the standard manager should never do as it loops 
indefinitely in {{advanceAllocatingFrom}}.
*** Move files to cdc overflow folder as configured in yaml on discard
*** unusedCapacity includes lazy calculated size of CDC overflow as well. See 
DirectorySizerBench.java for why I went w/separate thread to lazy calculate 
size of overflow instead of doing it sync on failed allocation
*** Separate size limit configured in cassandra.yaml for CDC and CommitLog so 
they each have their own unusedCapacity checks. Went with 1/8th disk or 4096 on 
CDC as default, putting it at 1/2 the size of CommitLog.
* Refactored buffer management portions of {{FileDirectSegment}} into 
{{SimpleCachedBufferPool}}, owned by a {{CommitLogSegmentManager}} instance
** There's considerable logical overlap between this and BufferPool in general, 
though this is considerably simpler and purpose-built. I'm personally ok 
leaving it separate for now given its simplicity.
* Some other various changes and movements around the code-base related to this 
patch ({{DirectorySizeCalculator}}, some javadoccing, typos I came across in 
comments or variable names while working on this, etc)
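
The CDC allocation rule described above (fail the allocation once active segment size plus overflow size reaches the limit) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the patch: {{CdcCapacityGate}} and all of its method names are hypothetical, and the overflow size is assumed to be refreshed by a separate sizing thread.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not a class from the Cassandra code base. */
final class CdcCapacityGate
{
    private final long limitBytes;
    // Bytes currently reserved in active CDC commit log segments.
    private final AtomicLong segmentBytes = new AtomicLong();
    // Last known size of the CDC overflow folder, set by a background sizer.
    private final AtomicLong overflowBytes = new AtomicLong();

    CdcCapacityGate(long limitBytes)
    {
        this.limitBytes = limitBytes;
    }

    /** Called by the (assumed) background directory sizer. */
    void updateOverflowSize(long bytes)
    {
        overflowBytes.set(bytes);
    }

    /** Called when reserved segment space is given back. */
    void release(long bytes)
    {
        segmentBytes.addAndGet(-bytes);
    }

    /** Reserves space, or returns false if that would exceed the limit. */
    boolean tryAllocate(long bytes)
    {
        while (true)
        {
            long used = segmentBytes.get();
            if (used + bytes + overflowBytes.get() > limitBytes)
                return false; // caller decides how to report the failed allocation
            if (segmentBytes.compareAndSet(used, used + bytes))
                return true;
            // Another thread changed the counter; retry with the new value.
        }
    }
}
```

A caller in the position of CommitLog would translate a {{false}} result into the WriteTimeoutException mentioned above; the compare-and-set loop keeps the check lock-free on the write path.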

h5. What's not yet done:
* Consider running all / relevant CommitLog related unit tests against a 
CDC-based keyspace
* Performance testing (want to confirm that added determination of which 
{{CommitLogSegmentManager}} during write path has negligible impact along w/2 
atomic checks on CDC write-path)
* dtests specific to CDC
* fallout testing on CDC
* Any code-changes to specifically target supporting a consumer following a CDC 
log as it's being written in CommitLogReader / ICommitLogReader. A requester 
should be able to trivially handle that with the 
{{CommitLogReader.readCommitLogSegment}} signature supporting 
{{CommitLogSegmentPosition}} and {{mutationLimit}}, however, so I'm leaning 
towards not further polluting CommitLogReader / C* and keeping that in the 
scope of a consumption daemon

h5. Special point of concern:
* This patch changes us from an implicit singleton view of 
{{CommitLogSegmentManager}} to having multiple CommitLogSegmentManagers managed 
under the CommitLog. There have been quite a few places where I've come across 
undocumented assumptions that we only ever have 1 logical object allocating 
segments (the latest being FileDirectSegment uncovered by 
CommitLogSegmentManagerTest). I plan on again checking the code to make sure 
the new "calculate off multiple segment managers" view of some of the things 
exposed in the CommitLog interface don't violate their contract now that 
there's no longer single CLSM-atomicity on those results.

h5. Known issues:
* dtest is showing a pretty consistent error w/an inability to find a cdc 
CommitLogSegment during recovery that looks to be unique to the dtest env
* a few failures left in testall
* intermittent failure in the new {{CommitLogSegmentManagerCDCTest}} (3/150 
runs - on Windows, so I haven't yet ruled out an env. issue w/the testing)

[~blambov]: while [~carlyeks] is primary reviewer on this and quite familiar 
with the changes as he worked w/me on the design process, I'd also appreciate 
it if you could provide a backup pair of eyes and look over the 

[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231115#comment-15231115
 ] 

Russell Bradberry commented on CASSANDRA-11363:
---

any chance the number of vnodes in a cluster affects how bad this issue is?

> Blocked NTR When Connecting Causing Excessive Load
> --
>
> Key: CASSANDRA-11363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11363
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Russell Bradberry
>Priority: Critical
> Attachments: cassandra-102-cms.stack, cassandra-102-g1gc.stack
>
>
> When upgrading from 2.1.9 to 2.1.13, we are witnessing an issue where the 
> machine load increases to very high levels (> 120 on an 8 core machine) and 
> native transport requests get blocked in tpstats.
> I was able to reproduce this in both CMS and G1GC as well as on JVM 7 and 8.
> The issue does not seem to affect the nodes running 2.1.9.
> The issue seems to coincide with the number of connections OR the number of 
> total requests being processed at a given time (as the latter increases with 
> the former in our system)
> Currently there are between 600 and 800 client connections on each machine and 
> each machine is handling roughly 2000-3000 client requests per second.
> Disabling the binary protocol fixes the issue for this node but isn't a 
> viable option cluster-wide.
> Here is the output from tpstats:
> {code}
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 88387821 0
>  0
> ReadStage 0 0 355860 0
>  0
> RequestResponseStage  0 72532457 0
>  0
> ReadRepairStage   0 0150 0
>  0
> CounterMutationStage 32   104 897560 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 65 0
>  0
> GossipStage   0 0   2338 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor2   190474 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0 10 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0310 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   110 94 0
>  0
> MemtablePostFlush 134257 0
>  0
> MemtableReclaimMemory 0 0 94 0
>  0
> Native-Transport-Requests   128   156 38795716
> 278451
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {code}
> Attached is the jstack output for both CMS and G1GC.
> Flight recordings are here:
> https://s3.amazonaws.com/simple-logs/cassandra-102-cms.jfr
> https://s3.amazonaws.com/simple-logs/cassandra-102-g1gc.jfr
> It is interesting to note that while the flight recording was taking place, 
> the load on the machine went back to healthy, and when the flight recording 
> finished the load went back to > 100.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231113#comment-15231113
 ] 

Paulo Motta edited comment on CASSANDRA-11363 at 4/7/16 9:35 PM:
-

[~devdazed] Ok, then it's very likely that you're hit by CASSANDRA-11529. I 
created another ticket in case this one is a different issue.

[~zznate] [~CRolo] can you double check you're not hitting CASSANDRA-11529?


was (Author: pauloricardomg):
Ok, then it's very likely that you're hit by CASSANDRA-11529. I created another 
ticket in case this one is a different issue.

[~zznate] [~CRolo] can you double check you're not hitting CASSANDRA-11529?



[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231113#comment-15231113
 ] 

Paulo Motta commented on CASSANDRA-11363:
-

Ok, then it's very likely that you're hit by CASSANDRA-11529. I created another 
ticket in case this one is a different issue.

[~zznate] [~CRolo] can you double check you're not hitting CASSANDRA-11529?



[jira] [Created] (CASSANDRA-11529) Checking if an unlogged batch is local is inefficient

2016-04-07 Thread Paulo Motta (JIRA)
Paulo Motta created CASSANDRA-11529:
---

 Summary: Checking if an unlogged batch is local is inefficient
 Key: CASSANDRA-11529
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11529
 Project: Cassandra
  Issue Type: Bug
  Components: Coordination
Reporter: Paulo Motta


Based on the CASSANDRA-11363 report, I noticed that in CASSANDRA-9303 we 
introduced the following check to avoid printing a {{WARN}} in case an unlogged 
batch statement is local:

{noformat}
 for (IMutation im : mutations)
 {
     keySet.add(im.key());
     for (ColumnFamily cf : im.getColumnFamilies())
         ksCfPairs.add(String.format("%s.%s", cf.metadata().ksName, cf.metadata().cfName));
+
+    if (localMutationsOnly)
+        localMutationsOnly &= isMutationLocal(localTokensByKs, im);
 }
 
+// CASSANDRA-9303: If we only have local mutations we do not warn
+if (localMutationsOnly)
+    return;
+
 NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, TimeUnit.MINUTES, unloggedBatchWarning,
                  keySet.size(), keySet.size() == 1 ? "" : "s",
                  ksCfPairs.size() == 1 ? "" : "s", ksCfPairs);
{noformat}

The {{isMutationLocal}} check uses 
{{StorageService.instance.getLocalRanges(mutation.getKeyspaceName())}}, which 
underneath uses {{AbstractReplication.getAddressRanges}} to calculate local 
ranges. 

Recalculating this for every unlogged batch can be pretty inefficient, so we 
should at the very least cache it and recompute it only when the ring changes.
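
One way the caching idea could look is sketched below. It is only an illustration under assumptions: {{LocalRangeCache}}, {{onRingChange}} and the loader function are hypothetical names, not existing Cassandra APIs, and the loader stands in for the expensive local-range computation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical per-keyspace cache; not part of Cassandra. */
final class LocalRangeCache<R>
{
    private static final class Entry<R>
    {
        final long ringVersion;
        final List<R> ranges;

        Entry(long ringVersion, List<R> ranges)
        {
            this.ringVersion = ringVersion;
            this.ranges = ranges;
        }
    }

    // Bumped on every ring change; entries tagged with an older value are stale.
    private final AtomicLong ringVersion = new AtomicLong();
    private final Map<String, Entry<R>> byKeyspace = new ConcurrentHashMap<>();
    // Stand-in for the expensive computation of local ranges for a keyspace.
    private final Function<String, List<R>> loader;

    LocalRangeCache(Function<String, List<R>> loader)
    {
        this.loader = loader;
    }

    /** To be called from whatever hook observes ring changes (assumed to exist). */
    void onRingChange()
    {
        ringVersion.incrementAndGet();
    }

    List<R> get(String keyspace)
    {
        // Read the version before loading, so a concurrent ring change
        // leaves the stored entry stale instead of wrongly fresh.
        long current = ringVersion.get();
        Entry<R> cached = byKeyspace.get(keyspace);
        if (cached != null && cached.ringVersion == current)
            return cached.ranges;

        List<R> fresh = loader.apply(keyspace);
        byKeyspace.put(keyspace, new Entry<>(current, fresh));
        return fresh;
    }
}
```

With something like this in place, the per-batch locality check pays for a map lookup in the common case, and the full range calculation runs at most once per keyspace per ring change (plus occasional redundant loads under contention).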





[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231108#comment-15231108
 ] 

Russell Bradberry commented on CASSANDRA-11363:
---

[~pauloricardomg] yes, we are using unlogged batches that cross partitions



[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231095#comment-15231095
 ] 

Paulo Motta commented on CASSANDRA-11363:
-

[~devdazed] are you using unlogged batches by any chance?



[jira] [Commented] (CASSANDRA-11295) Make custom filtering more extensible via custom classes

2016-04-07 Thread Henry Manasseh (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231089#comment-15231089
 ] 

Henry Manasseh commented on CASSANDRA-11295:


This looks useful for a query I am implementing. I looked at the code for 
TestQueryHandler and have some questions.

1. How do you access the RowFilter from within the QueryHandler so that I could 
call rowFilter.addUserExpression(UserExpression e) to register my custom 
UserExpression?
2. Is a custom payload the only way to pass inputs to the user expression at 
the moment (until there is a CQL syntax)?

If you have a sample you used for development/testing... would it be possible 
to post to either Jira, github or via email? Thank you for any tips.


> Make custom filtering more extensible via custom classes 
> -
>
> Key: CASSANDRA-11295
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11295
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 3.6
>
>
> At the moment, the implementation of {{RowFilter.CustomExpression}} is 
> tightly bound to the syntax designed to support non-CQL search syntax for 
> custom 2i implementations. It might be interesting to decouple the two things 
> by making the custom expression implementation and serialization a bit more 
> pluggable. This would allow users to add their own custom expression 
> implementations to experiment with custom filtering strategies without having 
> to patch the C* source. As a minimally invasive first step, custom 
> expressions could be added programmatically via {{QueryHandler}}. Further 
> down the line, if this proves useful and we can figure out some reasonable 
> syntax we could think about adding the capability in CQL in a separate 
> ticket. 





[jira] [Created] (CASSANDRA-11528) Server Crash when select returns more than a few hundred rows

2016-04-07 Thread Mattias W (JIRA)
Mattias W created CASSANDRA-11528:
-

 Summary: Server Crash when select returns more than a few hundred 
rows
 Key: CASSANDRA-11528
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11528
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: windows 7, 8 GB machine
Reporter: Mattias W
 Fix For: 3.3
 Attachments: datastax_ddc_server-stdout.2016-04-07.log

While implementing a dump procedure that ran "select * from" on one table, I 
instantly killed the server. A simple "select count(*) from" also kills it. For 
a while, I thought the size of the blobs was the cause.

I also tried using only a unique id as the partition key, since I was afraid a 
single partition had grown too big, but that didn't change anything.

It happens every time, both from Java/Clojure and from DevCenter.

I looked at the logs at C:\Program Files\DataStax-DDC\logs, but the crash is so 
quick that nothing is recorded there.

There is a Java-out-of-memory in the logs, but that isn't from the time of the 
crash.

It only happens for one table. It has only 15000 entries, but blobs and byte[] 
values are stored there, each between 100 KB and 4 MB in size. Total size for 
that table is about 6.5 GB on disk.

I made a workaround by doing many small selects instead, each only fetching 100 
rows.
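
The workaround amounts to reading the table in fixed-size pages rather than in one result set. A rough sketch of that loop follows; {{PageFetcher}} and {{PagedDump.dumpInPages}} are hypothetical names standing in for whatever small select the client actually issues, not driver APIs.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a small "select ... limit N" issued by the client. */
interface PageFetcher<K>
{
    // Returns at most 'limit' keys that come after 'lastKey' (null means start).
    List<K> fetchAfter(K lastKey, int limit);
}

final class PagedDump
{
    /** Visits every key via many small fetches; returns the number visited. */
    static <K> int dumpInPages(PageFetcher<K> fetcher, int pageSize, Consumer<K> sink)
    {
        int total = 0;
        K last = null;
        while (true)
        {
            List<K> page = fetcher.fetchAfter(last, pageSize);
            if (page.isEmpty())
                return total; // nothing left to read
            for (K key : page)
                sink.accept(key);
            total += page.size();
            // Resume the next fetch after the last key seen so far.
            last = page.get(page.size() - 1);
        }
    }
}
```

Only one page of rows is held by the client at a time, which is the property the 100-row selects rely on.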

Is there a setting I can set to make the system log more eagerly, in order to 
at least get a stack trace or similar that might help you?

It is the prun_srv that dies. Restarting the NT service makes Cassandra run 
again.





[jira] [Commented] (CASSANDRA-11295) Make custom filtering more extensible via custom classes

2016-04-07 Thread Henry Manasseh (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231071#comment-15231071
 ] 

Henry Manasseh commented on CASSANDRA-11295:


For the CQL syntax, how about using a syntax similar to a UDF?

SELECT * FROM ks.t1 WHERE my_custom_filter('some arbitrary constraint to be 
applied by custom filter');






[jira] [Comment Edited] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231063#comment-15231063
 ] 

Nate McCall edited comment on CASSANDRA-11363 at 4/7/16 9:06 PM:
-

Yeah, that is quite different than what [~arodrime] and I have seen recently. 
Nodes in our case were otherwise well within utilization thresholds.

[~devdazed] Your issue looks like it would be addressed by [~cnlwsu]'s 
reference to CASSANDRA-10200 (lean on DSE folks for a patch). 


was (Author: zznate):
Yeah, that is quite different than what [~arodrime] and I have seen recently. 
Nodes in our case were otherwise well within utilization thresholds.

[~devdazed] Your issue looks like it would be addressed by [~cnlwsu]'s 
reference to CASSANDRA-10200 (lean on them for a patch). 


[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231066#comment-15231066
 ] 

Russell Bradberry commented on CASSANDRA-11363:
---

well, it may be possible that this is the same issue, just very much 
exacerbated by the batches



[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231063#comment-15231063
 ] 

Nate McCall commented on CASSANDRA-11363:
-

Yeah, that is quite different than what [~arodrime] and I have seen recently. 
Nodes in our case were otherwise well within utilization thresholds.

[~devdazed] Your issue looks like it would be addressed by [~cnlwsu]'s 
reference to CASSANDRA-10200 (lean on them for a patch). 

> Blocked NTR When Connecting Causing Excessive Load
> --
>
> Key: CASSANDRA-11363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11363
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Russell Bradberry
>Priority: Critical
> Attachments: cassandra-102-cms.stack, cassandra-102-g1gc.stack
>
>
> When upgrading from 2.1.9 to 2.1.13, we are witnessing an issue where the 
> machine load increases to very high levels (> 120 on an 8 core machine) and 
> native transport requests get blocked in tpstats.
> I was able to reproduce this in both CMS and G1GC as well as on JVM 7 and 8.
> The issue does not seem to affect the nodes running 2.1.9.
> The issue seems to coincide with the number of connections OR the number of 
> total requests being processed at a given time (as the latter increases with 
> the former in our system).
> Currently there are between 600 and 800 client connections on each machine and 
> each machine is handling roughly 2000-3000 client requests per second.
> Disabling the binary protocol fixes the issue for this node but isn't a 
> viable option cluster-wide.
> Here is the output from tpstats:
> {code}
> Pool Name                  Active Pending Completed Blocked All time blocked
> MutationStage              0 88387821 0 0
> ReadStage                  0 0 355860 0 0
> RequestResponseStage       0 72532457 0 0
> ReadRepairStage            0 0150 0 0
> CounterMutationStage       32 104 897560 0 0
> MiscStage                  0 0 0 0 0
> HintedHandoff              0 0 65 0 0
> GossipStage                0 0 2338 0 0
> CacheCleanupExecutor       0 0 0 0 0
> InternalResponseStage      0 0 0 0 0
> CommitLogArchiver          0 0 0 0 0
> CompactionExecutor         2 190474 0 0
> ValidationExecutor         0 0 0 0 0
> MigrationStage             0 0 10 0 0
> AntiEntropyStage           0 0 0 0 0
> PendingRangeCalculator     0 0310 0 0
> Sampler                    0 0 0 0 0
> MemtableFlushWriter        110 94 0 0
> MemtablePostFlush          134257 0 0
> MemtableReclaimMemory      0 0 94 0 0
> Native-Transport-Requests  128 156 38795716 278451
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {code}
> Attached is the jstack output for both CMS and G1GC.
> Flight recordings are here:
> https://s3.amazonaws.com/simple-logs/cassandra-102-cms.jfr
> https://s3.amazonaws.com/simple-logs/cassandra-102-g1gc.jfr
> It is interesting to note that while the flight recording was taking place, 
> the load on the machine went back to healthy, and when the flight recording 
> finished the load went back to > 100.
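
The report above reads the Blocked and "All time blocked" columns of tpstats to spot the stalled Native-Transport-Requests pool. As a rough sketch of automating that check, the snippet below scans tpstats-style text and lists pools with non-zero blocked counts. It assumes each pool row arrives on a single line with exactly six whitespace-separated columns (name, active, pending, completed, blocked, all time blocked); the class name, method name, and sample rows are hypothetical and are not the values from this ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: flags thread pools that report blocked tasks in tpstats-style text.
// Assumption: one pool per line, six whitespace-separated columns:
// name, active, pending, completed, blocked, all-time-blocked.
final class TpstatsBlockedCheck {

    /** Returns "pool=allTimeBlocked" for every pool with a non-zero blocked or all-time-blocked count. */
    static List<String> blockedPools(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length != 6) {
                continue; // header rows and dropped-message rows have a different token count
            }
            try {
                long blocked = Long.parseLong(cols[4]);
                long allTimeBlocked = Long.parseLong(cols[5]);
                if (blocked > 0 || allTimeBlocked > 0) {
                    result.add(cols[0] + "=" + allTimeBlocked);
                }
            } catch (NumberFormatException e) {
                // Not a numeric pool row; skip it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical, well-formed sample rows (illustrative values only).
        List<String> sample = Arrays.asList(
                "Pool Name Active Pending Completed Blocked All time blocked",
                "ReadStage 0 0 1000 0 0",
                "Native-Transport-Requests 128 156 5000 3 42",
                "Message type Dropped",
                "READ 0");
        System.out.println(blockedPools(sample));
    }
}
```

A check like this only makes sense on versions where the counters are populated, and it would need adjusting if the column layout differs between releases.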





[jira] [Resolved] (CASSANDRA-11458) Complete support for CustomExpression

2016-04-07 Thread Henry Manasseh (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Manasseh resolved CASSANDRA-11458.

Resolution: Won't Fix

CASSANDRA-11295 provides new support for UserExpressions.

> Complete support for CustomExpression
> -
>
> Key: CASSANDRA-11458
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11458
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Henry Manasseh
>Priority: Minor
> Attachments: Custom-expression-Change.png, addCustomIndexExpression 
> change.png
>
>
> This is a proposal to complete the CustomExpression support first introduced 
> as part of https://issues.apache.org/jira/browse/CASSANDRA-10217.
> The current support for custom expressions is partial. There is no clean way 
> to implement queries making use of the "exp('index', 'value')" syntax because 
> CustomExpression is declared as final, so there is no way for developers to 
> cleanly plug in their own expressions.
> https://github.com/apache/cassandra/blob/6e69c75900f3640195130085ad69daa1659184eb/src/java/org/apache/cassandra/db/filter/RowFilter.java#L869
> The proposal is to make CustomExpression not final so that developers can 
> extend and create their own subclass and provide their own isSatisfiedBy 
> operation (which currently always returns true).
> Introducing a new custom expression would be done as follows:
> 1. Developer would create a subclass of CustomExpression and override 
> isSatisfiedBy method with their logic (public boolean 
> isSatisfiedBy(CFMetaData metadata, DecoratedKey partitionKey, Row row))
> 2. This class would be packaged in a jar and copied to the cassandra lib 
> directory along with a secondary index class which overrides 
> Index.customExpressionValueType
> 3. Create the custom index with an option which identifies the 
> CustomExpression subclass (custom_expression_class).
> CREATE CUSTOM INDEX ON keyspace.my_table(my_indexed_column) USING 
> 'org.custom.MyCustomIndex'
> WITH OPTIONS = { 'custom_expression_class': 'org.custom.MyCustomExpression' };
> I have prototyped the change and it works as designed. In my case I do a type of 
> "IN" query filter which will simplify my client logic significantly.
> The default behavior of using the CustomExpression class would be maintained 
> if the developer does not provide a custom class in the create index options.
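
To illustrate the shape of the proposal above, here is a minimal, self-contained sketch of a non-final expression class with an overridable isSatisfiedBy and an "IN"-style subclass. Everything in it is a hypothetical stand-in: the class names are invented, and isSatisfiedBy takes a plain String instead of the (CFMetaData, DecoratedKey, Row) parameters quoted in the description, so this is not Cassandra's RowFilter API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Stand-in for the proposed non-final CustomExpression; NOT the real Cassandra class.
class CustomExpressionSketch {
    protected final String value;

    CustomExpressionSketch(String value) {
        this.value = value;
    }

    // Mirrors the default behavior described in the proposal: every row is accepted.
    boolean isSatisfiedBy(String columnValue) {
        return true;
    }
}

// Hypothetical subclass implementing an "IN"-style filter over a comma-separated list.
class InExpressionSketch extends CustomExpressionSketch {
    private final Set<String> allowed;

    InExpressionSketch(String commaSeparated) {
        super(commaSeparated);
        this.allowed = new HashSet<>(Arrays.asList(commaSeparated.split(",")));
    }

    @Override
    boolean isSatisfiedBy(String columnValue) {
        return allowed.contains(columnValue);
    }
}

class CustomExpressionDemo {
    public static void main(String[] args) {
        CustomExpressionSketch filter = new InExpressionSketch("a,b,c");
        System.out.println(filter.isSatisfiedBy("b"));
        System.out.println(filter.isSatisfiedBy("z"));
    }
}
```

The point of the sketch is only the extension mechanism: callers hold the base type, and the subclass decides which rows pass.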





[jira] [Comment Edited] (CASSANDRA-11458) Complete support for CustomExpression

2016-04-07 Thread Henry Manasseh (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230988#comment-15230988
 ] 

Henry Manasseh edited comment on CASSANDRA-11458 at 4/7/16 8:57 PM:


I think your new feature CASSANDRA-11295 provides what I need. I already have a 
dummy custom index and a hack which removes the custom expression and injects 
my own expression subclass which actually works... but I'll plan on 
re-implementing based on the new UserExpression class.



was (Author: henryman):
I think your new feature CASSANDRA-11295 provides what I need. I already have a 
dummy custom index and a hack which removes the custom expression and injects 
my own expression subclass which actually works... but I'll plan on 
re-implementing based on the new UserExpression class.

I just need to figure out how to add my user expression subclass to the  
RowFilter programmatically via a QueryHandler implementation. [~beobal] Do you 
have any pointers to code for a sample UserExpression and QueryHandler 
registration code I could use as a starting point? Thank you for any tips.



[jira] [Comment Edited] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231045#comment-15231045
 ] 

Russell Bradberry edited comment on CASSANDRA-11363 at 4/7/16 8:56 PM:
---

We may have two separate issues here. In mine, the issue is 100% CPU 
utilization and ultra-high load when using batches. According to the JFR, all 
of the hot threads are spinning on 
{code}
org.apache.cassandra.locator.NetworkTopologyStrategy.hasSufficientReplicas(String, Map, Multimap)
-> org.apache.cassandra.locator.NetworkTopologyStrategy.hasSufficientReplicas(Map, Multimap)
-> org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(Token, TokenMetadata)
{code}


was (Author: devdazed):
We may have two separate issues here. In mine, the issue is 100% CPU 
utilization and ultra-high load when using batches. According to the JFR, all 
of the hot threads are putting their resources in 
{code}
org.apache.cassandra.locator.NetworkTopologyStrategy.hasSufficientReplicas(String, Map, Multimap)
-> org.apache.cassandra.locator.NetworkTopologyStrategy.hasSufficientReplicas(Map, Multimap)
-> org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(Token, TokenMetadata)
{code}


[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231045#comment-15231045
 ] 

Russell Bradberry commented on CASSANDRA-11363:
---

We may have two separate issues here. In mine, the issue is 100% CPU 
utilization and ultra-high load when using batches. According to the JFR, all 
of the hot threads are putting their resources in 
{code}
org.apache.cassandra.locator.NetworkTopologyStrategy.hasSufficientReplicas(String, Map, Multimap)
-> org.apache.cassandra.locator.NetworkTopologyStrategy.hasSufficientReplicas(Map, Multimap)
-> org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(Token, TokenMetadata)
{code}
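
One way to cross-check a hot path like the one above against the attached jstack output is to count how many threads have a given frame on their stack. The following is a minimal sketch, assuming the usual jstack text layout (thread headers start with a quoted thread name, frames start with "at "); the class and method names are hypothetical, and the sample dump in main is invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: counts threads in jstack-style text whose stack contains a given frame substring.
final class HotFrameCounter {

    static int threadsWithFrame(List<String> dumpLines, String frame) {
        int count = 0;
        boolean inThread = false;
        boolean matched = false;
        for (String line : dumpLines) {
            if (line.startsWith("\"")) {
                // A new thread header: reset the per-thread match flag.
                inThread = true;
                matched = false;
            } else if (inThread && !matched
                    && line.trim().startsWith("at ") && line.contains(frame)) {
                // Count each thread at most once, even if the frame repeats.
                matched = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Invented sample dump; thread names and frames are illustrative only.
        List<String> dump = Arrays.asList(
                "\"SharedPool-Worker-1\" #101 daemon prio=5",
                "\tat org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java)",
                "",
                "\"GossipStage:1\" #55 daemon prio=5",
                "\tat java.lang.Thread.run(Thread.java)");
        System.out.println(threadsWithFrame(dump, "NetworkTopologyStrategy.calculateNaturalEndpoints"));
    }
}
```

Running it over several dumps taken a few seconds apart would give a crude picture of whether the same frames stay hot.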



[jira] [Comment Edited] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231025#comment-15231025
 ] 

Paulo Motta edited comment on CASSANDRA-11363 at 4/7/16 8:44 PM:
-

That's right, that was just a wild guess. I focused my quick investigation on 
the 2.1.12 and 2.1.13 commits, but some deeper investigation is probably needed.

It's probably a good idea to backport CASSANDRA-10044 to versions < 2.1.12, 
set up a coordinator-only node, and check where the problem started happening, 
to narrow down the scope.


was (Author: pauloricardomg):
That's right, that was just a wild guess. I focused my quick investigation on 
the 2.1.12 and 2.1.13 commits, but some deeper investigation is probably needed.

It's probably a good idea to backport CASSANDRA-10044 to versions <= 2.1.12, 
set up a coordinator-only node, and check where the problem started happening, 
to narrow down the scope.


[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231025#comment-15231025
 ] 

Paulo Motta commented on CASSANDRA-11363:
-

That's right, that was just a wild guess. I focused my quick investigation on 
the 2.1.12 and 2.1.13 commits, but some deeper investigation is probably needed.

It's probably a good idea to backport CASSANDRA-10044 to versions <= 2.1.12, 
set up a coordinator-only node, and check where the problem started happening, 
to narrow down the scope.



[jira] [Commented] (CASSANDRA-11458) Complete support for CustomExpression

2016-04-07 Thread Henry Manasseh (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230988#comment-15230988
 ] 

Henry Manasseh commented on CASSANDRA-11458:


I think your new feature CASSANDRA-11295 provides what I need. I already have a 
dummy custom index and a hack which removes the custom expression and injects 
my own expression subclass which actually works... but I'll plan on 
re-implementing based on the new UserExpression class.

I just need to figure out how to add my user expression subclass to the  
RowFilter programmatically via a QueryHandler implementation. [~beobal] Do you 
have any pointers to code for a sample UserExpression and QueryHandler 
registration code I could use as a starting point? Thank you for any tips.



[jira] [Commented] (CASSANDRA-11458) Complete support for CustomExpression

2016-04-07 Thread Henry Manasseh (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230960#comment-15230960
 ] 

Henry Manasseh commented on CASSANDRA-11458:


Sorry for the delayed response. I was out of town.

Yes, what I really need is a custom filter (and not a secondary index) but I 
was only able to add it by creating a dummy secondary index. I will check out 
the changes from CASSANDRA-11295 later this week. It seems like it may have 
what I need. Thank you [~beobal]





[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230947#comment-15230947
 ] 

Nate McCall commented on CASSANDRA-11363:
-

[~pauloricardomg] Unfortunately, this may have been latent for some time. 
CASSANDRA-10044 re-introduced the tpstats counters as they had been missing 
(which is most likely why this has not been noticed until recently). 



[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Russell Bradberry (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230926#comment-15230926
 ] 

Russell Bradberry commented on CASSANDRA-11363:
---

Unfortunately we are on DSE, so I can't try the revert patch.

> Blocked NTR When Connecting Causing Excessive Load
> --
>
> Key: CASSANDRA-11363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11363
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Russell Bradberry
>Priority: Critical
> Attachments: cassandra-102-cms.stack, cassandra-102-g1gc.stack
>
>
> When upgrading from 2.1.9 to 2.1.13, we are witnessing an issue where the 
> machine load increases to very high levels (> 120 on an 8 core machine) and 
> native transport requests get blocked in tpstats.
> I was able to reproduce this in both CMS and G1GC as well as on JVM 7 and 8.
> The issue does not seem to affect the nodes running 2.1.9.
> The issue seems to coincide with the number of connections OR the number of 
> total requests being processed at a given time (as the latter increases with 
> the former in our system)
> Currently there are between 600 and 800 client connections on each machine and 
> each machine is handling roughly 2000-3000 client requests per second.
> Disabling the binary protocol fixes the issue for this node but isn't a 
> viable option cluster-wide.
> Here is the output from tpstats:
> {code}
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 88387821 0
>  0
> ReadStage 0 0 355860 0
>  0
> RequestResponseStage  0 72532457 0
>  0
> ReadRepairStage   0 0150 0
>  0
> CounterMutationStage 32   104 897560 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 65 0
>  0
> GossipStage   0 0   2338 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor2   190474 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0 10 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0310 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   110 94 0
>  0
> MemtablePostFlush 134257 0
>  0
> MemtableReclaimMemory 0 0 94 0
>  0
> Native-Transport-Requests   128   156 38795716
> 278451
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {code}
> Attached is the jstack output for both CMS and G1GC.
> Flight recordings are here:
> https://s3.amazonaws.com/simple-logs/cassandra-102-cms.jfr
> https://s3.amazonaws.com/simple-logs/cassandra-102-g1gc.jfr
> It is interesting to note that while the flight recording was taking place, 
> the load on the machine went back to healthy, and when the flight recording 
> finished the load went back to > 100.





[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230924#comment-15230924
 ] 

Paulo Motta commented on CASSANDRA-11363:
-

I went through the 2.1.12 changes and didn't find anything suspicious. In 2.1.13 
and 3.0.3, though, we changed the {{ServerConnection}} query state map from a 
{{NonBlockingHashMap}} to a {{ConcurrentHashMap}} in CASSANDRA-10938, which 
might be misbehaving for some reason.

Is anyone willing to try the revert patch below on 2.1.13 or 3.0.3 and check if 
that changes anything?

{noformat}
diff --git a/src/java/org/apache/cassandra/transport/ServerConnection.java b/src/java/org/apache/cassandra/transport/ServerConnection.java
index ce4d164..5991b33 100644
--- a/src/java/org/apache/cassandra/transport/ServerConnection.java
+++ b/src/java/org/apache/cassandra/transport/ServerConnection.java
@@ -17,7 +17,6 @@
  */
 package org.apache.cassandra.transport;
 
-import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentMap;
 
 import io.netty.channel.Channel;
@@ -29,6 +28,8 @@ import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.service.ClientState;
 import org.apache.cassandra.service.QueryState;
 
+import org.cliffc.high_scale_lib.NonBlockingHashMap;
+
 public class ServerConnection extends Connection
 {
     private enum State { UNINITIALIZED, AUTHENTICATION, READY }
@@ -37,7 +38,7 @@ public class ServerConnection extends Connection
     private final ClientState clientState;
     private volatile State state;
 
-    private final ConcurrentMap<Integer, QueryState> queryStates = new ConcurrentHashMap<>();
+    private final ConcurrentMap<Integer, QueryState> queryStates = new NonBlockingHashMap<>();
 
     public ServerConnection(Channel channel, int version, Connection.Tracker tracker)
     {
{noformat}
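
For anyone unfamiliar with the code path, here is a minimal sketch of the kind of get-or-create access such a map sees on every request. It assumes the map is keyed by stream id; the class and method names ({{QueryStateCache}}, {{State}}) are hypothetical stand-ins and not the actual {{ServerConnection}} code. It uses only {{java.util.concurrent}}, so {{NonBlockingHashMap}} (from the third-party high-scale-lib) does not appear, but any {{ConcurrentMap}} implementation could be passed in to compare behaviour.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a per-connection query-state cache (not Cassandra code).
public class QueryStateCache
{
    // Placeholder for the real per-stream state object.
    public static final class State
    {
        public final int streamId;

        public State(int streamId)
        {
            this.streamId = streamId;
        }
    }

    private final ConcurrentMap<Integer, State> states;

    // The backing map is injected so that different ConcurrentMap
    // implementations can be swapped in without touching the logic.
    public QueryStateCache(ConcurrentMap<Integer, State> backingMap)
    {
        this.states = backingMap;
    }

    // Get-or-create: plain read first, putIfAbsent only on a miss.
    public State get(int streamId)
    {
        State existing = states.get(streamId);
        if (existing != null)
            return existing;

        State created = new State(streamId);
        State raced = states.putIfAbsent(streamId, created);
        // If another thread won the race, use its value.
        return raced == null ? created : raced;
    }

    public int size()
    {
        return states.size();
    }

    public static void main(String[] args)
    {
        QueryStateCache cache = new QueryStateCache(new ConcurrentHashMap<>());
        State first = cache.get(7);
        State second = cache.get(7);
        System.out.println(first == second); // true: same instance is reused
        System.out.println(cache.size());    // 1
    }
}
```

Injecting the map keeps the experiment down to a one-line change, which is essentially what the revert patch above does.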


[jira] [Commented] (CASSANDRA-11514) trunk compaction performance regression

2016-04-07 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230918#comment-15230918
 ] 

Michael Shuler commented on CASSANDRA-11514:


Larger run on DTCS 
[trunk_3.5_3.0-compaction_DTCS|http://cstar.datastax.com/tests/artifacts/c9dd18fc-fcd7-11e5-8f8b-0256e416528f/graph]
 seems to support that trunk and 3.5 both have this commit. The differences 
seem pretty small, so it could be difficult to determine a difference between 2 
commits.

> trunk compaction performance regression
> ---
>
> Key: CASSANDRA-11514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: cstar_perf
>Reporter: Michael Shuler
>  Labels: performance
> Fix For: 3.x
>
> Attachments: trunk-compaction_dtcs-op_rate.png, 
> trunk-compaction_lcs-op_rate.png
>
>
> It appears that a commit between Mar 29-30 has resulted in a drop in 
> compaction performance. I attempted to get a log list of commits to post 
> here, but
> {noformat}
> git log trunk@{2016-03-29}..trunk@{2016-03-31}
> {noformat}
> appears to be incomplete, since reading through {{git log}} I see netty and 
> ohc were upgraded during this time period.
> !trunk-compaction_dtcs-op_rate.png!
> !trunk-compaction_lcs-op_rate.png!





[jira] [Commented] (CASSANDRA-11310) Allow filtering on clustering columns for queries without secondary indexes

2016-04-07 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230915#comment-15230915
 ] 

Alex Petrov commented on CASSANDRA-11310:
-

And the test results:

|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11310-trunk-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11310-trunk-dtest/]|
 

> Allow filtering on clustering columns for queries without secondary indexes
> ---
>
> Key: CASSANDRA-11310
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11310
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: Benjamin Lerer
>Assignee: Alex Petrov
>  Labels: doc-impacting
> Fix For: 3.x
>
>
> Since CASSANDRA-6377, queries that filter on non-primary-key columns without 
> an index are fully supported.
> It makes sense to also support filtering on clustering columns.
> {code}
> CREATE TABLE emp_table2 (
> empID int,
> firstname text,
> lastname text,
> b_mon text,
> b_day text,
> b_yr text,
> PRIMARY KEY (empID, b_yr, b_mon, b_day));
> SELECT b_mon,b_day,b_yr,firstname,lastname FROM emp_table2
> WHERE b_mon='oct' ALLOW FILTERING;
> {code}
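
A rough in-memory analogy of why the query above needs {{ALLOW FILTERING}}: rows in a partition are ordered by ({{b_yr}}, {{b_mon}}, {{b_day}}), so a restriction on {{b_mon}} alone does not map to one contiguous slice and every row has to be examined. The {{Row}} type and method names below are hypothetical and only model the idea; they are not Cassandra internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: a partition as a list of rows sorted by (yr, mon, day).
public class ClusteringFilterSketch
{
    public static final class Row
    {
        public final String yr, mon, day, firstname;

        public Row(String yr, String mon, String day, String firstname)
        {
            this.yr = yr;
            this.mon = mon;
            this.day = day;
            this.firstname = firstname;
        }
    }

    // Without a restriction on yr (the first clustering column), matching
    // rows are scattered, so the whole partition is scanned and filtered.
    public static List<String> firstnamesBornIn(List<Row> sortedRows, String mon)
    {
        List<String> out = new ArrayList<>();
        for (Row r : sortedRows)
            if (r.mon.equals(mon))
                out.add(r.firstname);
        return out;
    }

    public static void main(String[] args)
    {
        List<Row> rows = Arrays.asList(
            new Row("1980", "oct", "02", "Ann"),
            new Row("1980", "sep", "14", "Bob"),
            new Row("1991", "oct", "30", "Cy"));
        System.out.println(firstnamesBornIn(rows, "oct")); // [Ann, Cy]
    }
}
```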





[jira] [Comment Edited] (CASSANDRA-11310) Allow filtering on clustering columns for queries without secondary indexes

2016-04-07 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230915#comment-15230915
 ] 

Alex Petrov edited comment on CASSANDRA-11310 at 4/7/16 7:32 PM:
-

And the test results:

|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11310-trunk-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11310-trunk-dtest/]|
 


was (Author: ifesdjeen):
And the test results:

|[testall|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11310-trunk-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-11310-trunk-dtest/]|
 






[jira] [Commented] (CASSANDRA-8844) Change Data Capture (CDC)

2016-04-07 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230845#comment-15230845
 ] 

Joshua McKenzie commented on CASSANDRA-8844:


Added link to PR on ccm to fix cdc directory pathing on nodes in ccm cluster. 
Going to re-run dtests w/that branch shortly.

> Change Data Capture (CDC)
> -
>
> Key: CASSANDRA-8844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Coordination, Local Write-Read Paths
>Reporter: Tupshin Harper
>Assignee: Joshua McKenzie
>Priority: Critical
> Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns 
> used to determine (and track) the data that has changed so that action can be 
> taken using the changed data. Also, Change data capture (CDC) is an approach 
> to data integration that is based on the identification, capture and delivery 
> of the changes made to enterprise data sources."
> -Wikipedia
> As Cassandra is increasingly being used as the Source of Record (SoR) for 
> mission critical data in large enterprises, it is increasingly being called 
> upon to act as the central hub of traffic and data flow to other systems. In 
> order to try to address the general need, we (cc [~brianmhess]), propose 
> implementing a simple data logging mechanism to enable per-table CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its 
> Consistency Level semantics, and in order to treat it as the single 
> reliable/durable SoR for the data.
> # To provide a mechanism for implementing good and reliable 
> (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) 
> continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To eliminate the developmental and operational burden of users so that they 
> don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, 
> give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similarly to a commitlog, 
> with the following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies 
> are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do 
> any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an 
> easy enhancement, but most likely use cases would prefer to only implement 
> CDC logging in one (or a subset) of the DCs that are being replicated to
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the 
> commitlog, failure to write to the CDC log should fail that node's write. If 
> that means the requested consistency level was not met, then clients *should* 
> experience UnavailableExceptions.
> - Be written in a Row-centric manner such that it is easy for consumers to 
> reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons 
> written in non-JVM languages.
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also 
> believe that they can be deferred to a subsequent release, which would also 
> let us gauge actual interest.
> - Multiple logs per table. This would make it easy to have multiple 
> "subscribers" to a single table's changes. A workaround would be to create a 
> forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters, 
> would make Cassandra a much more versatile feeder into other systems and, 
> again, reduce complexity that would otherwise need to be built into the 
> daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it. 
> - Cleaning up consumed logfiles would be the client daemon's responsibility.
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming scheme, making it 
> trivial to process them in order.
> - Daemons should be able to checkpoint their work, and resume from where they 
> left off. This means they would have to leave some file artifact in the CDC 
> log's directory.
> - A sophisticated daemon should be able to be written that could 
> -- Catch up, in written-order, even when it is multiple logfiles behind in 
> processing
> -- Be able to continuously "tail" the most recent logfile and get 
> low-latency(ms?) access to the data as it is written.
> h2. Alternate approach
> In order to make consuming a change log easy and efficient to do with low 
> latency, the following could supplement the approach outlined above

[jira] [Updated] (CASSANDRA-10134) Always require replace_address to replace existing address

2016-04-07 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-10134:

Status: Patch Available  (was: In Progress)

Sorry this has dragged on, I got bogged down with a few other things...

The linked branch modifies {{StorageService::prepareToJoin}} to always perform 
a collision check when not replacing and adds the ability to replace without 
bootstrap. To summarise the points from the comments above, and how they're 
addressed in the branch:
* Seed nodes will perform a shadow round, which they're unable to exit when 
all seeds are started concurrently. The patch addresses this by modifying the 
response to a shadow digest syn. If a node receiving a shadow syn is itself in 
a shadow round, rather than ignoring it, it responds with a minimal ack with 
both the digest list and state map empty. This indicates to the node sending the 
syn that the seed is in its shadow round. The syn-sending node is permitted to 
exit its shadow round if it receives a regular ack (current behaviour) or if 
all seeds are found to be in a shadow round.
* Whilst this is beneficial when all seeds are started concurrently, making 
that a mandatory requirement is, I feel, going to be overly burdensome for 
operators. In a cluster with > 1 seed, when all seeds are down the only options 
are to restart all seeds concurrently or to modify the seed list on one, 
restart that, then bring the others up before finally restoring the first 
seed's list and restarting it again. To fix this, the patch uses [~Stefania]'s 
approach and ultimately makes the shadow round best-effort. Note that this 
applies only to nodes which appear in their own seed list; any node which doesn't 
consider itself a seed will fail to start up if it cannot complete a shadow 
round. I've also added a property, {{cassandra.allow_unsafe_join}}, to skip the 
shadow round completely for use in testing. 
* The collision check itself needs to be extended, as it's no longer 
performed only for bootstrapping nodes. When a node is not bootstrapping, we 
need to verify that its address is not already associated with another host id. 
When the node is bootstrapping, behaviour remains the same as before. That is, 
any previous status for the endpoint is retrieved from the shadow round & 
tested against a blacklist of disallowed previous states. 
* As [~pauloricardomg] noted, a side effect is to disallow the unsafe, 
ghetto-replace approach to dealing with scenarios such as a JBOD disk failure. 
With that in mind, the patch decouples bootstrap and replacement, so that the 
combination of {{replace_address(_first_boot)}} and {{auto_bootstrap=false}} is 
permitted. As this is a "genuine (but unsafe)" scenario, I've added a startup 
flag to ensure that operators are cognizant of the risk involved in doing so. A 
benefit of this over the documented approach of manually setting initial tokens 
in yaml is that the replacement tokens are retrieved from gossip, reducing the 
chance of operator error. Performing this non-bootstrapping replace requires 
{{-Dcassandra.allow_unsafe_replace=true}} at startup.
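
To make the shadow-round exit rule from the first bullet concrete, here is a small illustrative model. It is only a sketch: the class and method names are hypothetical and do not correspond to the gossip code in the branch, and it assumes the seed set passed in contains just the seeds the node is waiting on. A node leaves its shadow round either on the first regular ack, or once every seed has answered with an empty ack (meaning that seed is itself in a shadow round).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative model of the exit condition only; not the actual gossip code.
public class ShadowRoundSketch
{
    private final Set<String> seeds;
    private final Set<String> seedsInShadowRound = new HashSet<>();
    private boolean inShadowRound = true;

    public ShadowRoundSketch(Set<String> seeds)
    {
        this.seeds = new HashSet<>(seeds);
    }

    // empty == true models the minimal ack (empty digest list and state map)
    // sent by a node that is itself in a shadow round.
    public void onAck(String from, boolean empty)
    {
        if (!inShadowRound)
            return;

        if (!empty)
        {
            // Regular ack: exit immediately (the pre-existing behaviour).
            inShadowRound = false;
            return;
        }

        if (seeds.contains(from))
            seedsInShadowRound.add(from);

        // New rule: exit once all seeds are known to be in a shadow round.
        if (seedsInShadowRound.containsAll(seeds))
            inShadowRound = false;
    }

    public boolean inShadowRound()
    {
        return inShadowRound;
    }

    public static void main(String[] args)
    {
        ShadowRoundSketch node = new ShadowRoundSketch(new HashSet<>(Arrays.asList("a", "b")));
        node.onAck("a", true);
        System.out.println(node.inShadowRound()); // true: seed b has not answered yet
        node.onAck("b", true);
        System.out.println(node.inShadowRound()); // false: all seeds are in a shadow round
    }
}
```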

The best-effort-only approach to the shadow round for seed nodes clearly 
undermines the safety benefits of all this somewhat, raising the question of 
whether it's actually worth making the modifications to Syn & Ack handling. In 
my opinion, those changes are fairly minimal & don't make things significantly 
harder to reason about, so I'm ok with including those but I won't argue too 
strongly if dropping them is suggested. 

[One of the utest 
failures|http://cassci.datastax.com/job/beobal-10134-trunk-testall/2/testReport/org.apache.cassandra.service/RemoveTest/testLocalHostId_compression/]
 looks particularly suspect. My suspicion is that something is already racy 
with the test setup, as I can find the exact same failure in [another recent-ish 
run|http://cassci.datastax.com/job/trunk_testall/806/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/].
 So far I've been unable to figure out exactly what the problem is by 
inspection & I haven't been able to repro the failure in > 100 test runs 
locally. 

There were also some dtest failures on the earlier runs, caused by a bug which 
I've since fixed & verified locally (CI is pending).

I've pushed a dtest branch containing [~thobbs]' original new test, plus 
another for the unsafe replace operation 
[here|https://github.com/beobal/cassandra-dtest/tree/10134]. A follow up will 
be to identify which tests are adversely affected by the mandatory shadow round 
(I know that {{ttl_test}} definitely is), then have them skip it if appropriate.

||branch||testall||dtest||

[jira] [Commented] (CASSANDRA-11514) trunk compaction performance regression

2016-04-07 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230795#comment-15230795
 ] 

Michael Shuler commented on CASSANDRA-11514:


Bumped up from 10M to 25M writes/reads and this round on 
[trunk_3.5_3.0-compaction_LCS|http://cstar.datastax.com/tests/artifacts/aa6dee6a-fcd7-11e5-8f8b-0256e416528f/graph]
 looks like 3.5 is pretty evenly split performance-wise between 3.0 and trunk.

(Waiting on DTCS to finish)






[jira] [Created] (CASSANDRA-11527) Improve ColumnFilter

2016-04-07 Thread Robert Stupp (JIRA)
Robert Stupp created CASSANDRA-11527:


 Summary: Improve ColumnFilter
 Key: CASSANDRA-11527
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11527
 Project: Cassandra
  Issue Type: Improvement
Reporter: Robert Stupp
Priority: Minor
 Fix For: 4.x


While working on CASSANDRA-7396, it turned out that it could be beneficial to 
modify the {{ColumnFilter}} class:
* Allow multiple single element + slice filters for a single column
* At the moment we fetch all cell paths for a single column and just skip the 
values. For a subselection it feels more convenient to just return the selected 
cells and skip the filtered cell-paths.
* Remove Thrift related code.

This requires a change in the serialized format of {{ColumnFilter}} and is thus 
proposed for 4.x.





[jira] [Commented] (CASSANDRA-7826) support non-frozen, nested collections

2016-04-07 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230770#comment-15230770
 ] 

Robert Stupp commented on CASSANDRA-7826:
-

bq. We'll use as many component in the CellPath as there is level of nestedness
+1 - also because it feels easier to implement

> support non-frozen, nested collections
> --
>
> Key: CASSANDRA-7826
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7826
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
>Reporter: Tupshin Harper
>Assignee: Alex Petrov
>  Labels: ponies
> Fix For: 3.x
>
>
> The inability to nest collections is one of the bigger data modelling 
> limitations we have right now.





[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230742#comment-15230742
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

/cc [~jrwest]

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When running the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
>  

[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230714#comment-15230714
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

Also, it looks like the actual key from the index file might be read 
successfully, so maybe alongside printing the token you could also print the 
actual data from the key; that might help figure out whether it's a real key or 
just a random set of 32k bytes.


[jira] [Updated] (CASSANDRA-11430) Add legacy notifications backward-support on deprecated repair methods

2016-04-07 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11430:

Fix Version/s: 3.0.x
   2.2.x

> Add legacy notifications backward-support on deprecated repair methods
> --
>
> Key: CASSANDRA-11430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
>Assignee: Paulo Motta
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available 
> for older clients though. Unfortunately it sometimes hangs when you call it. 
> It looks like it completes fine but the notification to the client that the 
> operation is done is never sent. This is easiest to see by using nodetool 
> from 2.1 against a 3.x cluster.
> {noformat}
> [Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 OpsCenter
> [2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
> [Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 system_distributed
> ...
> ...
> {noformat}
> (I added the ellipses)
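
The hang described above can be modelled with a small sketch: a client that blocks until a completion notification arrives will wait forever if that notification is never sent, unless it also applies a timeout. This is a toy model in Python, not nodetool's or Cassandra's actual JMX code; `RepairClient`, `on_notification`, and the timeout values are hypothetical names and numbers chosen for illustration.

```python
import threading


class RepairClient:
    """Toy client that waits for a 'repair finished' notification."""

    def __init__(self):
        self._done = threading.Event()

    def on_notification(self, message):
        # Called by the (hypothetical) server side when the operation completes.
        if message == "finished":
            self._done.set()

    def wait_for_completion(self, timeout_seconds):
        # Returns True if the notification arrived, False if we gave up waiting.
        return self._done.wait(timeout_seconds)


# Case 1: the server sends the completion notification shortly after the call.
ok_client = RepairClient()
threading.Timer(0.05, ok_client.on_notification, args=("finished",)).start()
completed = ok_client.wait_for_completion(timeout_seconds=2.0)

# Case 2: the repair finishes server-side but no notification is ever sent.
silent_client = RepairClient()
timed_out = not silent_client.wait_for_completion(timeout_seconds=0.2)
```

Without the timeout, the second case would block indefinitely, which resembles the symptom reported for the older client.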



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230670#comment-15230670
 ] 

Chris Lohfink edited comment on CASSANDRA-11363 at 4/7/16 5:54 PM:
---

fixed by CASSANDRA-10200 ?


was (Author: cnlwsu):
caused by CASSANDRA-10200 ?

> Blocked NTR When Connecting Causing Excessive Load
> --
>
> Key: CASSANDRA-11363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11363
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Russell Bradberry
>Priority: Critical
> Attachments: cassandra-102-cms.stack, cassandra-102-g1gc.stack
>
>
> When upgrading from 2.1.9 to 2.1.13, we are witnessing an issue where the 
> machine load increases to very high levels (> 120 on an 8 core machine) and 
> native transport requests get blocked in tpstats.
> I was able to reproduce this in both CMS and G1GC as well as on JVM 7 and 8.
> The issue does not seem to affect the nodes running 2.1.9.
> The issue seems to coincide with the number of connections OR the number of 
> total requests being processed at a given time (as the latter increases with 
> the former in our system)
> Currently there are between 600 and 800 client connections on each machine and 
> each machine is handling roughly 2000-3000 client requests per second.
> Disabling the binary protocol fixes the issue for this node but isn't a 
> viable option cluster-wide.
> Here is the output from tpstats:
> {code}
> Pool Name  Active  Pending  Completed  Blocked  All time blocked
> MutationStage 0 88387821 0 0
> ReadStage 0 0 355860 0 0
> RequestResponseStage 0 72532457 0 0
> ReadRepairStage 0 0 150 0 0
> CounterMutationStage 32 104 897560 0 0
> MiscStage 0 0 0 0 0
> HintedHandoff 0 0 65 0 0
> GossipStage 0 0 2338 0 0
> CacheCleanupExecutor 0 0 0 0 0
> InternalResponseStage 0 0 0 0 0
> CommitLogArchiver 0 0 0 0 0
> CompactionExecutor 2 190474 0 0
> ValidationExecutor 0 0 0 0 0
> MigrationStage 0 0 10 0 0
> AntiEntropyStage 0 0 0 0 0
> PendingRangeCalculator 0 0 310 0 0
> Sampler 0 0 0 0 0
> MemtableFlushWriter 110 94 0 0
> MemtablePostFlush 134257 0 0
> MemtableReclaimMemory 0 0 94 0 0
> Native-Transport-Requests 128 156 38795716 278451
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {code}
> Attached is the jstack output for both CMS and G1GC.
> Flight recordings are here:
> https://s3.amazonaws.com/simple-logs/cassandra-102-cms.jfr
> https://s3.amazonaws.com/simple-logs/cassandra-102-g1gc.jfr
> It is interesting to note that while the flight recording was taking place, 
> the load on the machine went back to healthy, and when the flight recording 
> finished the load went back to > 100.
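
To spot an affected pool quickly, the thread-pool table can be parsed and filtered on the blocked columns. The sketch below assumes the usual `nodetool tpstats` layout of a pool name followed by five integer columns (active, pending, completed, blocked, all time blocked); the function name `blocked_pools` and the sample numbers are made up for illustration and are not taken from the report.

```python
def blocked_pools(tpstats_text):
    """Return {pool: (blocked, all_time_blocked)} for pools with any blocking."""
    result = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        # A pool row is a name followed by exactly five integer columns;
        # the header and the "Message type" section do not match and are skipped.
        if len(parts) != 6 or not all(p.isdigit() for p in parts[1:]):
            continue
        name = parts[0]
        blocked, all_time_blocked = int(parts[4]), int(parts[5])
        if blocked or all_time_blocked:
            result[name] = (blocked, all_time_blocked)
    return result


# Hypothetical sample, not the numbers from the report above.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         3           1200         0                 0
ReadStage                         0         0            450         0                 0
Native-Transport-Requests       128       150          90000        12              3400
"""

flagged = blocked_pools(SAMPLE)
```

In this made-up sample only `Native-Transport-Requests` is flagged, which is the kind of row the description draws attention to.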





[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230687#comment-15230687
 ] 

Pavel Yaskevich commented on CASSANDRA-11525:
-

I think we are simply using the wrong serializer in some situations, e.g. when 
{{clustering order by}} is used; that is the problem, because it can't 
properly deserialize the index entry.

[~doanduyhai] It would be great if you could share the sstables again so I can 
reproduce locally.

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)

[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230670#comment-15230670
 ] 

Chris Lohfink commented on CASSANDRA-11363:
---

caused by CASSANDRA-10200 ?



[jira] [Commented] (CASSANDRA-11470) dtest failure in materialized_views_test.TestMaterializedViews.base_replica_repair_test

2016-04-07 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230643#comment-15230643
 ] 

Philip Thompson commented on CASSANDRA-11470:
-

Done so here:
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/62/console

It will run 450 times. Hopefully that's enough.
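
Whether 450 runs is enough depends on how often the test fails. As a rough guide, if each run fails independently with probability p, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below evaluates that formula; the per-run failure rates are hypothetical, chosen only to show how quickly the chance drops for rarer failures, since the real rate for this test is not stated here.

```python
def chance_of_reproducing(p_fail, runs):
    """Probability of at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - p_fail) ** runs


# Hypothetical per-run failure rates, evaluated for 450 runs.
for p_fail in (0.01, 0.005, 0.001):
    print(p_fail, round(chance_of_reproducing(p_fail, 450), 3))
```

Under these assumptions a failure that occurs once in a hundred runs is very likely to show up, while one that occurs once in a thousand runs is more likely than not to be missed.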

> dtest failure in 
> materialized_views_test.TestMaterializedViews.base_replica_repair_test
> ---
>
> Key: CASSANDRA-11470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11470
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Stefania
>  Labels: dtest
> Fix For: 3.x
>
> Attachments: node1.log, node2.log, node2_debug.log, node3.log, 
> node3_debug.log
>
>
> base_replica_repair_test has failed on trunk with the following exception in 
> the log of node2:
> {code}
> ERROR [main] 2016-03-31 08:48:46,949 CassandraDaemon.java:708 - Exception 
> encountered during startup
> java.lang.RuntimeException: Failed to list files in 
> /mnt/tmp/dtest-du964e/test/node2/data0/system_schema/views-9786ac1cdd583201a7cdad556410c985
> at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:725)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:690) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:567)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:555)
>  ~[main/:na]
> at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:383) 
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:320) 
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:130) 
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:107) 
> ~[main/:na]
> at 
> org.apache.cassandra.cql3.restrictions.StatementRestrictions.<init>(StatementRestrictions.java:139)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepareRestrictions(SelectStatement.java:864)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:811)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:799)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.getStatement(QueryProcessor.java:505)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:242)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.prepareInternal(QueryProcessor.java:286)
>  ~[main/:na]
> at 
> org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:294)
>  ~[main/:na]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.query(SchemaKeyspace.java:1246) 
> ~[main/:na]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:875)
>  ~[main/:na]
> at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:867)
>  ~[main/:na]
> at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:134) 
> ~[main/:na]
> at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:124) 
> ~[main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:229) 
> [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
>  [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691) 
> [main/:na]
> Caused by: java.lang.RuntimeException: Failed to list directory files in 
> /mnt/tmp/dtest-du964e/test/node2/data0/system_schema/views-9786ac1cdd583201a7cdad556410c985,
>  inconsistent disk state for transaction 
> [ma_txn_flush_58db56b0-f71d-11e5-bf68-03a01adb9f11.log in 
> /mnt/tmp/dtest-du964e/test/node2/data0/system_schema/views-9786ac1cdd583201a7cdad556410c985]
> at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:149)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:103)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.lifecycle.LogAwareFileLister$$Lambda$48/35984028.accept(Unknown
>  Source) ~[na:na]

[jira] [Commented] (CASSANDRA-10988) ClassCastException in SelectStatement

2016-04-07 Thread Vadim TSes'ko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230612#comment-15230612
 ] 

Vadim TSes'ko commented on CASSANDRA-10988:
---

We managed to reproduce the bug using Cassandra 2.2.5 with the following table 
schema (unset parameters use default values):
{code:sql}
CREATE TABLE mytable (
u text,
t timeuuid,
PRIMARY KEY (u, t)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (t DESC)
AND compaction = {'min_threshold': '2', 'class': 
'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 
'base_time_seconds': '1'}
AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 10
AND gc_grace_seconds = 0
AND read_repair_chance = 0.0;
{code}
And the following query:
{code:sql}
SELECT COUNT(1) as cnt
FROM mytable
WHERE u = :user AND t > :timestamp
ORDER BY t DESC
LIMIT 110;
{code}
Removing {{COMPACT STORAGE}} definitely helps, so it is somehow connected.

> ClassCastException in SelectStatement
> -
>
> Key: CASSANDRA-10988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10988
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Vassil Hristov
>
> After we upgraded our cluster to version 2.1.11, we started getting the 
> exceptions below for some of our queries. The issue seems to be very similar to 
> CASSANDRA-7284.
> {code:java}
> java.lang.ClassCastException: 
> org.apache.cassandra.db.composites.Composites$EmptyComposite cannot be cast 
> to org.apache.cassandra.db.composites.CellName
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType.cellFromByteBuffer(AbstractCellNameType.java:188)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.db.composites.AbstractSimpleCellNameType.makeCellName(AbstractSimpleCellNameType.java:125)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType.makeCellName(AbstractCellNameType.java:254)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.makeExclusiveSliceBound(SelectStatement.java:1197)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.applySliceRestriction(SelectStatement.java:1205)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1283)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1250)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:299)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:276)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:67)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:238)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:493)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:138)
>  ~[apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
>  [apache-cassandra-2.1.11.jar:2.1.11]
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_66]
> at 
> 

[jira] [Commented] (CASSANDRA-11505) dtest failure in cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_reading_max_parse_errors

2016-04-07 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230594#comment-15230594
 ] 

Michael Shuler commented on CASSANDRA-11505:


This test is now hanging on my local machine, as well as test boxes. I'm 
working on debugging in [CSTAR-479 
(private)|https://datastax.jira.com/browse/CSTAR-479]

> dtest failure in 
> cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_reading_max_parse_errors
> -
>
> Key: CASSANDRA-11505
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11505
> Project: Cassandra
>  Issue Type: Test
>Reporter: Michael Shuler
>Assignee: DS Test Eng
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.0_novnode_dtest/197/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_reading_max_parse_errors
> Failed on CassCI build cassandra-3.0_novnode_dtest #197
> {noformat}
> Error Message
> False is not true
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-c2AJlu
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Importing csv file /mnt/tmp/tmp2O43PH with 10 max parse errors
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", 
> line 943, in test_reading_max_parse_errors
> self.assertTrue(num_rows_imported < (num_rows / 2))  # less than the 
> maximum number of valid rows in the csv
>   File "/usr/lib/python2.7/unittest/case.py", line 422, in assertTrue
> raise self.failureException(msg)
> "False is not true\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-c2AJlu\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None,\n'phi_convict_threshold': 5,\n
> 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': 
> 1,\n'request_timeout_in_ms': 1,\n
> 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\ndtest: DEBUG: Importing csv file /mnt/tmp/tmp2O43PH with 10 max parse 
> errors\n- >> end captured logging << 
> -"
> Standard Output
> (EE)  Using CQL driver:  '/home/automaton/cassandra/bin/../lib/cassandra-driver-internal-only-3.0.0-6af642d.zip/cassandra-driver-3.0.0-6af642d/cassandra/__init__.py'>(EE)
>   Using connect timeout: 5 seconds(EE)  Using 'utf-8' encoding(EE)  
> :2:Failed to import 2500 rows: ParseError - could not convert string 
> to float: abc,  given up without retries(EE)  :2:Exceeded maximum 
> number of parse errors 10(EE)  :2:Failed to process 2500 rows; failed 
> rows written to import_ks_testmaxparseerrors.err(EE)  :2:Exceeded 
> maximum number of parse errors 10(EE)  
> {noformat}
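
The assertion that fails here checks that an import stops early once the parse-error limit is exceeded, so that fewer than half of the rows end up imported. A toy model of that behaviour, not cqlsh's actual COPY implementation, might look like the following; `import_rows`, the float-only row format, and the sample data are assumptions made for illustration.

```python
def import_rows(raw_values, max_parse_errors):
    """Parse values as floats; stop once parse errors exceed the limit."""
    imported, errors = [], 0
    for raw in raw_values:
        try:
            imported.append(float(raw))
        except ValueError:
            errors += 1
            if errors > max_parse_errors:
                # Give up: the remaining rows are never imported.
                break
    return imported, errors


# Hypothetical input: every other value is unparseable.
num_rows = 100
values = [str(i) if i % 2 == 0 else "abc" for i in range(num_rows)]
rows, errors = import_rows(values, max_parse_errors=10)
```

With this input the loop stops at the eleventh bad value, so far fewer than `num_rows / 2` rows are imported, which is the condition the test expects to hold.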





[jira] [Commented] (CASSANDRA-11526) Make ResultSetBuilder.rowToJson public

2016-04-07 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230552#comment-15230552
 ] 

Aleksey Yeschenko commented on CASSANDRA-11526:
---

LGTM, conditional on CI passing:

||branch||testall||dtest||
|[11526-3.6|https://github.com/iamaleksey/cassandra/tree/11526-3.6]|[testall|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-11526-3.6-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-11526-3.6-dtest]|

> Make ResultSetBuilder.rowToJson public
> --
>
> Key: CASSANDRA-11526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11526
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Berenguer Blasi
> Fix For: 3.x
>
> Attachments: CASSANDRA-11526.txt
>
>
> Make ResultSetBuilder.rowToJson public.





[jira] [Updated] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Nate McCall (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate McCall updated CASSANDRA-11363:

Reproduced In: 3.0.3, 2.1.13, 2.1.12  (was: 2.1.12, 2.1.13)



[jira] [Commented] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230546#comment-15230546
 ] 

Nate McCall commented on CASSANDRA-11363:
-

Raised this to critical. 

You have four long-time users with multiple large deployments who are seeing a 
quantifiable percentage of client errors on non-resource constrained clusters 
across multiple versions. 

> Blocked NTR When Connecting Causing Excessive Load
> --
>
> Key: CASSANDRA-11363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11363
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Russell Bradberry
>Priority: Critical
> Attachments: cassandra-102-cms.stack, cassandra-102-g1gc.stack
>
>
> When upgrading from 2.1.9 to 2.1.13, we are witnessing an issue where the 
> machine load increases to very high levels (> 120 on an 8 core machine) and 
> native transport requests get blocked in tpstats.
> I was able to reproduce this in both CMS and G1GC as well as on JVM 7 and 8.
> The issue does not seem to affect the nodes running 2.1.9.
> The issue seems to coincide with the number of connections OR the number of 
> total requests being processed at a given time (as the latter increases with 
> the former in our system)
> Currently there are between 600 and 800 client connections on each machine and 
> each machine is handling roughly 2000-3000 client requests per second.
> Disabling the binary protocol fixes the issue for this node but isn't a 
> viable option cluster-wide.
> Here is the output from tpstats:
> {code}
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 88387821 0
>  0
> ReadStage 0 0 355860 0
>  0
> RequestResponseStage  0 72532457 0
>  0
> ReadRepairStage   0 0150 0
>  0
> CounterMutationStage 32   104 897560 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 65 0
>  0
> GossipStage   0 0   2338 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor2   190474 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0 10 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0310 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   110 94 0
>  0
> MemtablePostFlush 134257 0
>  0
> MemtableReclaimMemory 0 0 94 0
>  0
> Native-Transport-Requests   128   156 38795716
> 278451
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {code}
> Attached is the jstack output for both CMS and G1GC.
> Flight recordings are here:
> https://s3.amazonaws.com/simple-logs/cassandra-102-cms.jfr
> https://s3.amazonaws.com/simple-logs/cassandra-102-g1gc.jfr
> It is interesting to note that while the flight recording was taking place, 
> the load on the machine went back to healthy, and when the flight recording 
> finished the load went back to > 100.





[jira] [Updated] (CASSANDRA-11363) Blocked NTR When Connecting Causing Excessive Load

2016-04-07 Thread Nate McCall (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate McCall updated CASSANDRA-11363:

Priority: Critical  (was: Major)

> Blocked NTR When Connecting Causing Excessive Load
> --
>
> Key: CASSANDRA-11363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11363
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Russell Bradberry
>Priority: Critical
> Attachments: cassandra-102-cms.stack, cassandra-102-g1gc.stack
>
>
> When upgrading from 2.1.9 to 2.1.13, we are witnessing an issue where the 
> machine load increases to very high levels (> 120 on an 8 core machine) and 
> native transport requests get blocked in tpstats.
> I was able to reproduce this in both CMS and G1GC as well as on JVM 7 and 8.
> The issue does not seem to affect the nodes running 2.1.9.
> The issue seems to coincide with the number of connections OR the number of 
> total requests being processed at a given time (as the latter increases with 
> the former in our system)
> Currently there are between 600 and 800 client connections on each machine and 
> each machine is handling roughly 2000-3000 client requests per second.
> Disabling the binary protocol fixes the issue for this node but isn't a 
> viable option cluster-wide.
> Here is the output from tpstats:
> {code}
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 88387821 0
>  0
> ReadStage 0 0 355860 0
>  0
> RequestResponseStage  0 72532457 0
>  0
> ReadRepairStage   0 0150 0
>  0
> CounterMutationStage 32   104 897560 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 65 0
>  0
> GossipStage   0 0   2338 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor2   190474 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0 10 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0310 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   110 94 0
>  0
> MemtablePostFlush 134257 0
>  0
> MemtableReclaimMemory 0 0 94 0
>  0
> Native-Transport-Requests   128   156 38795716
> 278451
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> {code}
> Attached is the jstack output for both CMS and G1GC.
> Flight recordings are here:
> https://s3.amazonaws.com/simple-logs/cassandra-102-cms.jfr
> https://s3.amazonaws.com/simple-logs/cassandra-102-g1gc.jfr
> It is interesting to note that while the flight recording was taking place, 
> the load on the machine went back to healthy, and when the flight recording 
> finished the load went back to > 100.





[jira] [Updated] (CASSANDRA-11518) o.a.c.utils.UUIDGen clock generation is not very high in entropy

2016-04-07 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-11518:
---
Status: Patch Available  (was: Open)

> o.a.c.utils.UUIDGen clock generation is not very high in entropy
> 
>
> Key: CASSANDRA-11518
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11518
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Trivial
> Fix For: 3.0.x, 3.x
>
>
> makeClockSeqAndNode uses {{java.util.Random}} to generate the clock. 
> {{Random}} only has 48 bits of internal state, so it's not going to generate 
> the best bits for the clock. In addition, it uses a collision-prone 
> seed, which largely defeats the purpose of the clock sequence.
> A better approach to get the most out of those 14 bits would be to use 
> {{SecureRandom}} with something like SHA1PRNG.
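
As an illustration of the suggestion above, here is a minimal sketch of drawing the 14-bit clock sequence from {{SecureRandom}}. It is not the attached patch: the class and method names are hypothetical, and it uses the platform default {{SecureRandom}} rather than the SHA1PRNG algorithm mentioned in the description. The variant packing assumes the standard RFC 4122 layout.

```java
import java.security.SecureRandom;

// Hypothetical sketch, not the code from the CASSANDRA-11518 patch.
class ClockSeqSketch
{
    // Platform default SecureRandom; the issue suggests SHA1PRNG, which
    // would be obtained with SecureRandom.getInstance("SHA1PRNG").
    private static final SecureRandom RANDOM = new SecureRandom();

    // Draws a 14-bit clock sequence, i.e. a value in [0, 16383].
    static int randomClockSeq()
    {
        return RANDOM.nextInt(1 << 14);
    }

    // Combines the clock sequence with the RFC 4122 variant bits (binary 10)
    // and shifts the 16-bit result into the top of the least significant long.
    static long clockSeqAndVariant(int clockSeq)
    {
        return (0x8000L | (clockSeq & 0x3FFFL)) << 48;
    }
}
```

The node part of the least significant bits is left out here; only the clock sequence is affected by the choice of random source.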





[jira] [Updated] (CASSANDRA-11517) o.a.c.utils.UUIDGen could handle contention better

2016-04-07 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-11517:
---
Status: Patch Available  (was: Open)

> o.a.c.utils.UUIDGen could handle contention better
> --
>
> Key: CASSANDRA-11517
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11517
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.x
>
>
> I noticed this profiling a query handler implementation that uses UUIDGen to 
> get handles to track queries for logging purposes.
> Under contention threads are being unscheduled instead of spinning until the 
> lock is available. I would have expected intrinsic locks to be able to adapt 
> to this based on profiling information.
> Either way, it seems pretty straightforward to rewrite this to use a CAS 
> loop and test that it generally produces unique values.
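
To make the CAS-loop idea concrete, here is a minimal sketch of handing out unique, strictly increasing timestamps without an intrinsic lock. It is not the code from the linked branch: the class name, method name, and the choice to pass the current time in as a parameter are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the code from the CASSANDRA-11517 branch.
class CasTimestampSketch
{
    // Last value handed out; starts at 0.
    private final AtomicLong last = new AtomicLong();

    // Returns a value strictly greater than any value returned before.
    // 'now' is the caller's current clock reading. If the clock has not
    // advanced past the last value, the last value plus one is used instead.
    long nextUnique(long now)
    {
        while (true)
        {
            long prev = last.get();
            long candidate = now > prev ? now : prev + 1;
            // Succeeds only if no other thread updated 'last' in between;
            // otherwise retry with the fresh value instead of blocking.
            if (last.compareAndSet(prev, candidate))
                return candidate;
        }
    }
}
```

A thread that loses the race retries immediately rather than being descheduled, which is the behaviour the description asks for under contention.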





[jira] [Updated] (CASSANDRA-11517) o.a.c.utils.UUIDGen could handle contention better

2016-04-07 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-11517:
---
Fix Version/s: (was: 3.0.x)

> o.a.c.utils.UUIDGen could handle contention better
> --
>
> Key: CASSANDRA-11517
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11517
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.x
>
>
> I noticed this profiling a query handler implementation that uses UUIDGen to 
> get handles to track queries for logging purposes.
> Under contention threads are being unscheduled instead of spinning until the 
> lock is available. I would have expected intrinsic locks to be able to adapt 
> to this based on profiling information.
> Either way, it seems pretty straightforward to rewrite this to use a CAS 
> loop and test that it generally produces unique values.





[jira] [Comment Edited] (CASSANDRA-11517) o.a.c.utils.UUIDGen could handle contention better

2016-04-07 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228936#comment-15228936
 ] 

Ariel Weisberg edited comment on CASSANDRA-11517 at 4/7/16 4:44 PM:


|[trunk 
code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-11517-trunk?expand=1]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-11517-trunk-testall/3/]|[dtests|https://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-11517-trunk-dtest/3/]|

Not proof of any real performance benefit in context, but the unit test runs in 
250 milliseconds with the CAS loop and 1.4 seconds without the CAS loop.


was (Author: aweisberg):
|[trunk 
code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-11517-trunk?expand=1]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-11517-trunk-testall/3/]|[dtests|https://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-11517-trunk-dtest/3/]|
|[3.0 
code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-11517-3.0?expand=1]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-11517-3.0-testall/3/]|[dtests|https://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-11517-3.0-dtest/3/]|

Not proof of any real performance benefit in context, but the unit test runs in 
250 milliseconds with the CAS loop and 1.4 seconds without the CAS loop.

> o.a.c.utils.UUIDGen could handle contention better
> --
>
> Key: CASSANDRA-11517
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11517
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 3.0.x, 3.x
>
>
> I noticed this profiling a query handler implementation that uses UUIDGen to 
> get handles to track queries for logging purposes.
> Under contention threads are being unscheduled instead of spinning until the 
> lock is available. I would have expected intrinsic locks to be able to adapt 
> to this based on profiling information.
> Either way, it seems pretty straightforward to rewrite this to use a CAS 
> loop and test that it generally produces unique values.





[jira] [Updated] (CASSANDRA-11502) Fix denseness and column metadata updates coming from Thrift

2016-04-07 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-11502:
--
 Reviewer: Sylvain Lebresne
Reproduced In: 2.2.5, 2.1.13  (was: 2.1.13, 2.2.5)
   Status: Patch Available  (was: Open)

> Fix denseness and column metadata updates coming from Thrift
> 
>
> Key: CASSANDRA-11502
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11502
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> It was 
> [decided|https://issues.apache.org/jira/browse/CASSANDRA-7744?focusedCommentId=14095472=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14095472]
>  that we'd be recalculating {{is_dense}} for table updates coming from Thrift 
> on every change. However, due to some oversight, {{is_dense}} can only go 
> from {{false}} to {{true}}. Once dense, even adding a {{REGULAR}} column will 
> not reset {{is_dense}} back to {{false}}.
> The recalculation fails because no matter what happens, we never remove the 
> auto-generated {{CLUSTERING}} and {{COMPACT_VALUE}} columns of a dense table.
> Which ultimately leads to the issue on 2.2 to 3.0 upgrade (see 
> CASSANDRA-11315).
> What we should do is remove the special-case for Thrift in 
> {{LegacySchemaTables::makeUpdateTableMutation}} and correct the logic in 
> {{ThriftConversion::internalFromThrift}} to remove those columns when going 
> from dense to sparse.
> This is not enough to fix CASSANDRA-11315, however, as we need to handle 
> pre-patch upgrades, and upgrades from 2.1. Fixing it in 2.2 means a) getting 
> proper schema from {{DESCRIBE}} now and b) using the more efficient 
> {{SparseCellNameType}} when you add columns.





[jira] [Commented] (CASSANDRA-11502) Fix denseness and column metadata updates coming from Thrift

2016-04-07 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230525#comment-15230525
 ] 

Aleksey Yeschenko commented on CASSANDRA-11502:
---

||branch||testall||dtest||
|[11502-2.2|https://github.com/iamaleksey/cassandra/tree/11502-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-11502-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-11502-2.2-dtest]|
|[11502-3.0|https://github.com/iamaleksey/cassandra/tree/11502-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-11502-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/iamaleksey/job/iamaleksey-11502-3.0-dtest]|

The 2.2 version makes sure we recalculate and remove the redundant column. The 3.0 version 
preserves denseness, but cleans up unnecessary Thrift special casing from 
{{SchemaKeyspace::makeUpdateTableMutation}}. It's unnecessary because despite 
Thrift not knowing about non regular/static columns, 
{{ThriftConversion::internalFromThrift}} will include them from the current 
CFM, so we are not going to lose anything.

> Fix denseness and column metadata updates coming from Thrift
> 
>
> Key: CASSANDRA-11502
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11502
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> It was 
> [decided|https://issues.apache.org/jira/browse/CASSANDRA-7744?focusedCommentId=14095472=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14095472]
>  that we'd be recalculating {{is_dense}} for table updates coming from Thrift 
> on every change. However, due to some oversight, {{is_dense}} can only go 
> from {{false}} to {{true}}. Once dense, even adding a {{REGULAR}} column will 
> not reset {{is_dense}} back to {{false}}.
> The recalculation fails because no matter what happens, we never remove the 
> auto-generated {{CLUSTERING}} and {{COMPACT_VALUE}} columns of a dense table.
> Which ultimately leads to the issue on 2.2 to 3.0 upgrade (see 
> CASSANDRA-11315).
> What we should do is remove the special-case for Thrift in 
> {{LegacySchemaTables::makeUpdateTableMutation}} and correct the logic in 
> {{ThriftConversion::internalFromThrift}} to remove those columns when going 
> from dense to sparse.
> This is not enough to fix CASSANDRA-11315, however, as we need to handle 
> pre-patch upgrades, and upgrades from 2.1. Fixing it in 2.2 means a) getting 
> proper schema from {{DESCRIBE}} now and b) using the more efficient 
> {{SparseCellNameType}} when you add columns.





[jira] [Updated] (CASSANDRA-11503) Need a tool to detect what percentage of SSTables on a node have been repaired when using incremental repairs.

2016-04-07 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-11503:
--
Status: Patch Available  (was: Open)

> Need a tool to detect what percentage of SSTables on a node have been 
> repaired when using incremental repairs.
> --
>
> Key: CASSANDRA-11503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11503
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Sean Usher
>Assignee: Chris Lohfink
>Priority: Minor
> Attachments: CASSANDRA-11503.patch
>
>
> When using incremental repair, we should be able to look at SSTables and 
> understand how many sstables are in the repaired and unrepaired buckets on 
> each machine. This can help us track the repair progress and if we are 
> hitting any issues.





[jira] [Updated] (CASSANDRA-11503) Need a tool to detect what percentage of SSTables on a node have been repaired when using incremental repairs.

2016-04-07 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-11503:
--
Attachment: CASSANDRA-11503.patch

> Need a tool to detect what percentage of SSTables on a node have been 
> repaired when using incremental repairs.
> --
>
> Key: CASSANDRA-11503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11503
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Sean Usher
>Assignee: Chris Lohfink
>Priority: Minor
> Attachments: CASSANDRA-11503.patch
>
>
> When using incremental repair, we should be able to look at SSTables and 
> understand how many sstables are in the repaired and unrepaired buckets on 
> each machine. This can help us track the repair progress and if we are 
> hitting any issues.





[jira] [Updated] (CASSANDRA-11526) Make ResultSetBuilder.rowToJson public

2016-04-07 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-11526:
--
Reviewer: Aleksey Yeschenko

> Make ResultSetBuilder.rowToJson public
> --
>
> Key: CASSANDRA-11526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11526
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Berenguer Blasi
> Fix For: 3.x
>
> Attachments: CASSANDRA-11526.txt
>
>
> Make ResultSetBuilder.rowToJson public.





[jira] [Updated] (CASSANDRA-11526) Make ResultSetBuilder.rowToJson public

2016-04-07 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11526:

Status: Patch Available  (was: Open)

> Make ResultSetBuilder.rowToJson public
> --
>
> Key: CASSANDRA-11526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11526
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Berenguer Blasi
> Fix For: 3.x
>
> Attachments: CASSANDRA-11526.txt
>
>
> Make ResultSetBuilder.rowToJson public.





[jira] [Updated] (CASSANDRA-11526) Make ResultSetBuilder.rowToJson public

2016-04-07 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11526:

Attachment: CASSANDRA-11526.txt

> Make ResultSetBuilder.rowToJson public
> --
>
> Key: CASSANDRA-11526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11526
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Berenguer Blasi
> Fix For: 3.x
>
> Attachments: CASSANDRA-11526.txt
>
>
> Make ResultSetBuilder.rowToJson public.





[jira] [Created] (CASSANDRA-11526) Make ResultSetBuilder.rowToJson public

2016-04-07 Thread Jeremiah Jordan (JIRA)
Jeremiah Jordan created CASSANDRA-11526:
---

 Summary: Make ResultSetBuilder.rowToJson public
 Key: CASSANDRA-11526
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11526
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jeremiah Jordan
Assignee: Berenguer Blasi
 Fix For: 3.x


Make ResultSetBuilder.rowToJson public.





[jira] [Commented] (CASSANDRA-11430) Add legacy notifications backward-support on deprecated repair methods

2016-04-07 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230454#comment-15230454
 ] 

Nick Bailey commented on CASSANDRA-11430:
-

Yeah a quick test with nodetool from 2.1 appears to be working with this patch.

> Add legacy notifications backward-support on deprecated repair methods
> --
>
> Key: CASSANDRA-11430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
>Assignee: Paulo Motta
> Fix For: 3.x
>
>
> forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available 
> for older clients though. Unfortunately it sometimes hangs when you call it. 
> It looks like it completes fine but the notification to the client that the 
> operation is done is never sent. This is easiest to see by using nodetool 
> from 2.1 against a 3.x cluster.
> {noformat}
> [Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 OpsCenter
> [2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
> [Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 system_distributed
> ...
> ...
> {noformat}
> (I added the ellipses)





[jira] [Commented] (CASSANDRA-11525) SASI index corruption

2016-04-07 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230453#comment-15230453
 ] 

DOAN DuyHai commented on CASSANDRA-11525:
-

If necessary I can upload the base SSTables and corresponding SASI index files

> SASI index corruption
> -
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
>Reporter: DOAN DuyHai
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench ( 
>  dsr_id uuid,
>  rel_seq bigint,
>  seq bigint,
>  dsp_code varchar,
>  model_code varchar,
>  media_code varchar,
>  transfer_code varchar,
>  commercial_offer_code varchar,
>  territory_code varchar,
>  period_end_month_int int,
>  authorized_societies_txt text,
>  rel_type text,
>  status text,
>  dsp_release_code text,
>  title text,
>  contributors_name list,
>  unic_work text,
>  paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC); 
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench 
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH 
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When running the request {{SELECT dsp_code, unic_work, paying_net_qty FROM 
> test.resource_bench WHERE period_end_month_int = 201401}} using server-side 
> paging, I bumped into this stack trace:
> {noformat}
> WARN  [SharedPool-Worker-1] 2016-04-06 00:00:30,825 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
>   at 
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
>   at 
> 
