[jira] [Updated] (CASSANDRA-15989) Provide easy copypasta config formatting for nodetool get commands

2020-07-27 Thread Jonathan Shook (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-15989:
---
Description: 
Allow all nodetool commands which print out the state of the node or cluster to 
do so in a way that makes the output easy to reuse on other nodes or to paste 
into config files.

For example, the command getcompactionthroughput formats its output like this:
{noformat}
[jshook@cass4 bin]$ ./nodetool getcompactionthroughput  
Current compaction throughput: 64 MB/s
{noformat}
But with an --as-yaml option, it could do this instead:
{noformat}
[jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-yaml
compaction_throughput_mb_per_sec: 64{noformat}
and with an --as-cli option, it could do this:
{noformat}
[jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-cli
./nodetool setcompactionthroughput 64{noformat}
Any other nodetool standard options should simply be carried along to the 
--as-cli form, with the exception of -pw.

Any -pw options should be elided with a warning in comments, but -pwf options 
should be allowed. This would let users who rely on -pw append a password at 
their discretion, while -pwf continues to work as usual.

In the absence of either of the options above (--as-yaml or --as-cli), the 
formatting should not change, so that existing tool integrations are not broken.
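
As a rough sketch of how the three output forms could be produced from the same 
value (names such as OutputFormat and renderCompactionThroughput are illustrative 
only, not proposed code):
{code:java}
// Illustrative only: OutputFormat and renderCompactionThroughput are hypothetical
// names, not part of the nodetool codebase.
import java.util.Locale;

public class GetOutputFormatSketch {
    enum OutputFormat { DEFAULT, YAML, CLI }

    static String renderCompactionThroughput(int mbPerSec, OutputFormat fmt) {
        switch (fmt) {
            case YAML:
                // copy/paste-able into cassandra.yaml
                return "compaction_throughput_mb_per_sec: " + mbPerSec;
            case CLI:
                // copy/paste-able as the matching "set" command
                return "./nodetool setcompactionthroughput " + mbPerSec;
            default:
                // current human-readable form, left unchanged for existing integrations
                return String.format(Locale.ROOT, "Current compaction throughput: %d MB/s", mbPerSec);
        }
    }

    public static void main(String[] args) {
        for (OutputFormat fmt : OutputFormat.values())
            System.out.println(renderCompactionThroughput(64, fmt));
    }
}
{code}
The default branch is deliberately byte-for-byte what the command prints today, 
so nothing changes unless one of the new flags is passed.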

 

  was:
Allow all nodetool commands which print out the state of the node or cluster to 
do so in a way that makes it easy to re-use or paste on other nodes or config 
files.

For example, the command getcompactionthroughput formats its output like this:
{noformat}
[jshook@cass4 bin]$ ./nodetool getcompactionthroughput  
Current compaction throughput: 64 MB/s
{noformat}
But with an --as-yaml option, it could do this instead:
{noformat}
[jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-yaml
compaction_throughput_mb_per_sec: 64{noformat}
and with an --as-cli option, it could do this:
{noformat}
[jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-cli
./nodetool setcompactionthroughput 64{noformat}
Any other nodetool standard options should simply be carried along to the 
--as-cli form, with the exception of -pw.

Any -pw options should be elided with a warning in comments, but -pwf options 
should be allowed. This would allow users using -pw to append a password at 
their discretion, but would allow -pwf to work as usual.

 


> Provide easy copypasta config formatting for nodetool get commands
> --
>
> Key: CASSANDRA-15989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15989
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Shook
>Priority: Normal
>
> Allow all nodetool commands which print out the state of the node or cluster 
> to do so in a way that makes the output easy to reuse on other nodes or to 
> paste into config files.
> For example, the command getcompactionthroughput formats its output like this:
> {noformat}
> [jshook@cass4 bin]$ ./nodetool getcompactionthroughput  
> Current compaction throughput: 64 MB/s
> {noformat}
> But with an --as-yaml option, it could do this instead:
> {noformat}
> [jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-yaml
> compaction_throughput_mb_per_sec: 64{noformat}
> and with an --as-cli option, it could do this:
> {noformat}
> [jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-cli
> ./nodetool setcompactionthroughput 64{noformat}
> Any other nodetool standard options should simply be carried along to the 
> --as-cli form, with the exception of -pw.
> Any -pw options should be elided with a warning in comments, but -pwf options 
> should be allowed. This would let users who rely on -pw append a password at 
> their discretion, while -pwf continues to work as usual.
> In the absence of either of the options above (--as-yaml or --as-cli), the 
> formatting should not change, so that existing tool integrations are not 
> broken.
>  






[jira] [Created] (CASSANDRA-15989) Provide easy copypasta config formatting for nodetool get commands

2020-07-27 Thread Jonathan Shook (Jira)
Jonathan Shook created CASSANDRA-15989:
--

 Summary: Provide easy copypasta config formatting for nodetool get 
commands
 Key: CASSANDRA-15989
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15989
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Shook


Allow all nodetool commands which print out the state of the node or cluster to 
do so in a way that makes the output easy to reuse on other nodes or to paste 
into config files.

For example, the command getcompactionthroughput formats its output like this:
{noformat}
[jshook@cass4 bin]$ ./nodetool getcompactionthroughput  
Current compaction throughput: 64 MB/s
{noformat}
But with an --as-yaml option, it could do this instead:
{noformat}
[jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-yaml
compaction_throughput_mb_per_sec: 64{noformat}
and with an --as-cli option, it could do this:
{noformat}
[jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-cli
./nodetool setcompactionthroughput 64{noformat}
Any other nodetool standard options should simply be carried along to the 
--as-cli form, with the exception of -pw.

Any -pw options should be elided with a warning in comments, but -pwf options 
should be allowed. This would let users who rely on -pw append a password at 
their discretion, while -pwf continues to work as usual.

 






[jira] [Commented] (CASSANDRA-15988) Add nodetool getfullquerylog

2020-07-27 Thread Jonathan Shook (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165937#comment-17165937
 ] 

Jonathan Shook commented on CASSANDRA-15988:


My take on this:

I think it is reasonable to show the user whether FQL is enabled or not as the 
first item. Additionally, the configuration of FQL should be dumped to stdout in 
the same formatting convention as the other nodetool get... commands.

In terms of whether it goes into 4.0 or 4.1, I think this is obviously missing 
functionality. Not being able to query the state of the service without 
changing it is a problem. Consider a scenario where multiple users are managing 
a system together and need to double-check the state of things before 
proceeding to the next step in their process. As manual as this sounds, many 
teams still do this type of ops work and need visibility into the operational 
state of the system.
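
A minimal sketch of the kind of read-only status/config dump being suggested 
(FqlStatus and its fields are hypothetical stand-ins for whatever state the 
server exposes; this is not the actual command):
{code:java}
// Illustrative only: FqlStatus is a hypothetical stand-in, not the real nodetool command.
import java.util.LinkedHashMap;
import java.util.Map;

public class GetFullQueryLogSketch {
    static class FqlStatus {
        boolean enabled;
        Map<String, Object> options = new LinkedHashMap<>();
    }

    static void print(FqlStatus status) {
        // First line: enabled/disabled, then the config in the same key: value
        // convention the other nodetool get... commands use.
        System.out.println("enabled: " + status.enabled);
        status.options.forEach((k, v) -> System.out.println(k + ": " + v));
    }

    public static void main(String[] args) {
        FqlStatus s = new FqlStatus();
        s.enabled = true;
        s.options.put("log_dir", "/var/lib/cassandra/fullquerylogs");
        s.options.put("roll_cycle", "HOURLY");
        s.options.put("block", true);
        print(s);
    }
}
{code}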

 

> Add nodetool getfullquerylog 
> -
>
> Key: CASSANDRA-15988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15988
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Ekaterina Dimitrova
>Priority: Normal
>
> This ticket is raised based on CASSANDRA-15791 and valuable feedback provided 
> by [~jshook].
> There are two outstanding questions:
>  * forming the exact shape of such a command and how it can benefit the 
> users; to be discussed in detail in this ticket
>  * Is this a thing we as a project can add to 4.0 beta, or should it be 
> considered in 4.1, for example?
>  






[jira] [Commented] (CASSANDRA-15971) full query log needs improvement

2020-07-23 Thread Jonathan Shook (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163905#comment-17163905
 ] 

Jonathan Shook commented on CASSANDRA-15971:


I was able to get the server to log FQL data with both Java 8 and Java 11. This 
appears to be a docs issue, as I had read that you could configure fql logging 
either in yaml or via nodetool. The official docs seem correct, so no issue 
there.

However, the other usability issues still apply in my view, and we might want 
to triage them separately.

> full query log needs improvement
> 
>
> Key: CASSANDRA-15971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15971
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/fql
>Reporter: Jonathan Shook
>Priority: Normal
> Attachments: st1.txt
>
>
> When trying out full query logging as a possible integration for nosqlbench 
> usage, I ran across many issues which would make it painful for users. Since 
> there were several, they will be added to a single issue for now. This issue 
> can be broken up if needed.
> 
> FQL doesn't work on my system, even though it says it is logging queries. 
> With the following configuration in cassandra.yaml:
>  
> {noformat}
> full_query_logging_options:
>     log_dir: /REDACTED/fullquerylogs
>     roll_cycle: HOURLY
>     block: true
>     max_queue_weight: 268435456 # 256 MiB
>     max_log_size: 17179869184 # 16 GiB
>     ## archive command is "/path/to/script.sh %path" where %path is replaced 
> with the file being rolled:
>     # archive_command:
>     # max_archive_retries: 10
> {noformat}
> which appears to be the minimal configuration needed to enable fql, only a 
> single file `directory-listing.cq4t` is created, which is a 64K sized file of 
> zeroes.
>  
> 
> Calling bin/nodetool enablefullquerylog throws an error initially.
> [jshook@cass4 bin]$ ./nodetool enablefullquerylog
>  
> {noformat}
> error: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
> -- StackTrace -- 
> java.lang.NoSuchMethodException: 
> sun.nio.ch.FileChannelImpl.map0(int,long,long) 
>     at java.base/java.lang.Class.getDeclaredMethod(Class.java:2553) 
>     at net.openhft.chronicle.core.OS.lambda$static$0(OS.java:51) 
>     at 
> net.openhft.chronicle.core.ClassLocal.computeValue(ClassLocal.java:53){noformat}
> (full stack trace attached to this ticket)
>  
> Subsequent calls produce normal output:
>  
> {noformat}
> [jshook@cass4 c4b1]$ bin/nodetool enablefullquerylog 
> nodetool: Already logging to /home/jshook/c4b1/data/fullquerylogs 
> See 'nodetool help' or 'nodetool help <command>'.{noformat}
>  
> 
> The lack of a nodetool getfullquerylog command makes it difficult to verify 
> the current full query log state without changing it. The conventions for 
> nodetool commands should be followed to avoid confusing users.
> 
> (maybe)
> {noformat}
> tools/bin/fqltool help{noformat}
> should print out help for all fqltool commands rather than simply repeating 
> the default "The most commonly used fqltool commands are..." output.
> 
> [https://cassandra.apache.org/doc/latest/new/fqllogging.html] is 
> malformatted, mixing the appearance of configuration with comments, which is 
> confusing at best.
>  






[jira] [Commented] (CASSANDRA-15971) full query log needs improvement

2020-07-23 Thread Jonathan Shook (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163718#comment-17163718
 ] 

Jonathan Shook commented on CASSANDRA-15971:


My issue is with the actual fql logging on the server not writing logs. I'll 
try to look into it today.

> full query log needs improvement
> 
>
> Key: CASSANDRA-15971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15971
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/fql
>Reporter: Jonathan Shook
>Priority: Normal
> Attachments: st1.txt
>
>
> When trying out full query logging as a possible integration for nosqlbench 
> usage, I ran across many issues which would make it painful for users. Since 
> there were several, they will be added to a single issue for now. This issue 
> can be broken up if needed.
> 
> FQL doesn't work on my system, even though it says it is logging queries. 
> With the following configuration in cassandra.yaml:
>  
> {noformat}
> full_query_logging_options:
>     log_dir: /REDACTED/fullquerylogs
>     roll_cycle: HOURLY
>     block: true
>     max_queue_weight: 268435456 # 256 MiB
>     max_log_size: 17179869184 # 16 GiB
>     ## archive command is "/path/to/script.sh %path" where %path is replaced 
> with the file being rolled:
>     # archive_command:
>     # max_archive_retries: 10
> {noformat}
> which appears to be the minimal configuration needed to enable fql, only a 
> single file `directory-listing.cq4t` is created, which is a 64K sized file of 
> zeroes.
>  
> 
> Calling bin/nodetool enablefullquerylog throws an error initially.
> [jshook@cass4 bin]$ ./nodetool enablefullquerylog
>  
> {noformat}
> error: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
> -- StackTrace -- 
> java.lang.NoSuchMethodException: 
> sun.nio.ch.FileChannelImpl.map0(int,long,long) 
>     at java.base/java.lang.Class.getDeclaredMethod(Class.java:2553) 
>     at net.openhft.chronicle.core.OS.lambda$static$0(OS.java:51) 
>     at 
> net.openhft.chronicle.core.ClassLocal.computeValue(ClassLocal.java:53){noformat}
> (full stack trace attached to this ticket)
>  
> Subsequent calls produce normal output:
>  
> {noformat}
> [jshook@cass4 c4b1]$ bin/nodetool enablefullquerylog 
> nodetool: Already logging to /home/jshook/c4b1/data/fullquerylogs 
> See 'nodetool help' or 'nodetool help <command>'.{noformat}
>  
> 
> The lack of a nodetool getfullquerylog command makes it difficult to verify 
> the current full query log state without changing it. The conventions for 
> nodetool commands should be followed to avoid confusing users.
> 
> (maybe)
> {noformat}
> tools/bin/fqltool help{noformat}
> should print out help for all fqltool commands rather than simply repeating 
> the default "The most commonly used fqltool commands are..." output.
> 
> [https://cassandra.apache.org/doc/latest/new/fqllogging.html] is 
> malformatted, mixing the appearance of configuration with comments, which is 
> confusing at best.
>  






[jira] [Commented] (CASSANDRA-15971) full query log needs improvement

2020-07-22 Thread Jonathan Shook (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163072#comment-17163072
 ] 

Jonathan Shook commented on CASSANDRA-15971:


This was with Java 11.

> full query log needs improvement
> 
>
> Key: CASSANDRA-15971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15971
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/fql
>Reporter: Jonathan Shook
>Priority: Normal
> Attachments: st1.txt
>
>
> When trying out full query logging as a possible integration for nosqlbench 
> usage, I ran across many issues which would make it painful for users. Since 
> there were several, they will be added to a single issue for now. This issue 
> can be broken up if needed.
> 
> FQL doesn't work on my system, even though it says it is logging queries. 
> With the following configuration in cassandra.yaml:
>  
> {noformat}
> full_query_logging_options:
>     log_dir: /REDACTED/fullquerylogs
>     roll_cycle: HOURLY
>     block: true
>     max_queue_weight: 268435456 # 256 MiB
>     max_log_size: 17179869184 # 16 GiB
>     ## archive command is "/path/to/script.sh %path" where %path is replaced 
> with the file being rolled:
>     # archive_command:
>     # max_archive_retries: 10
> {noformat}
> which appears to be the minimal configuration needed to enable fql, only a 
> single file `directory-listing.cq4t` is created, which is a 64K sized file of 
> zeroes.
>  
> 
> Calling bin/nodetool enablefullquerylog throws an error initially.
> [jshook@cass4 bin]$ ./nodetool enablefullquerylog
>  
> {noformat}
> error: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
> -- StackTrace -- 
> java.lang.NoSuchMethodException: 
> sun.nio.ch.FileChannelImpl.map0(int,long,long) 
>     at java.base/java.lang.Class.getDeclaredMethod(Class.java:2553) 
>     at net.openhft.chronicle.core.OS.lambda$static$0(OS.java:51) 
>     at 
> net.openhft.chronicle.core.ClassLocal.computeValue(ClassLocal.java:53){noformat}
> (full stack trace attached to this ticket)
>  
> Subsequent calls produce normal output:
>  
> {noformat}
> [jshook@cass4 c4b1]$ bin/nodetool enablefullquerylog 
> nodetool: Already logging to /home/jshook/c4b1/data/fullquerylogs 
> See 'nodetool help' or 'nodetool help <command>'.{noformat}
>  
> 
> The lack of a nodetool getfullquerylog command makes it difficult to verify 
> the current full query log state without changing it. The conventions for 
> nodetool commands should be followed to avoid confusing users.
> 
> (maybe)
> {noformat}
> tools/bin/fqltool help{noformat}
> should print out help for all fqltool commands rather than simply repeating 
> the default "The most commonly used fqltool commands are..." output.
> 
> [https://cassandra.apache.org/doc/latest/new/fqllogging.html] is 
> malformatted, mixing the appearance of configuration with comments, which is 
> confusing at best.
>  






[jira] [Updated] (CASSANDRA-15971) full query log needs improvement

2020-07-22 Thread Jonathan Shook (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-15971:
---
Description: 
When trying out full query logging as a possible integration for nosqlbench 
usage, I ran across many issues which would make it painful for users. Since 
there were several, they will be added to a single issue for now. This issue 
can be broken up if needed.

FQL doesn't work on my system, even though it says it is logging queries. 

With the following configuration in cassandra.yaml:

 
{noformat}
full_query_logging_options:
    log_dir: /REDACTED/fullquerylogs
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    ## archive command is "/path/to/script.sh %path" where %path is replaced 
with the file being rolled:
    # archive_command:
    # max_archive_retries: 10
{noformat}
which appears to be the minimal configuration needed to enable fql, only a 
single file `directory-listing.cq4t` is created, which is a 64K sized file of 
zeroes.

 

Calling bin/nodetool enablefullquerylog throws an error initially.

[jshook@cass4 bin]$ ./nodetool enablefullquerylog

 
{noformat}
error: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
-- StackTrace -- 
java.lang.NoSuchMethodException: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
    at java.base/java.lang.Class.getDeclaredMethod(Class.java:2553) 
    at net.openhft.chronicle.core.OS.lambda$static$0(OS.java:51) 
    at 
net.openhft.chronicle.core.ClassLocal.computeValue(ClassLocal.java:53){noformat}
(full stack trace attached to this ticket)

 

Subsequent calls produce normal output:

 
{noformat}
[jshook@cass4 c4b1]$ bin/nodetool enablefullquerylog 
nodetool: Already logging to /home/jshook/c4b1/data/fullquerylogs 
See 'nodetool help' or 'nodetool help <command>'.{noformat}
 

The lack of a nodetool getfullquerylog command makes it difficult to verify the 
current full query log state without changing it. The conventions for nodetool 
commands should be followed to avoid confusing users.

(maybe)
{noformat}
tools/bin/fqltool help{noformat}
should print out help for all fqltool commands rather than simply repeating the 
default "The most commonly used fqltool commands are..." output.

1. [https://cassandra.apache.org/doc/latest/new/fqllogging.html] is 
malformatted, mixing the appearance of configuration with comments, which is 
confusing at best.

 

  was:
When trying out full query logging as a possible integration for nosqlbench 
usage, I ran across many issues which would make it painful for users. Since 
there were several, they will be added to a single issue for now. This issue 
can be broken up if needed.

FQL doesn't work on my system, even though it says it is logging queries. 

With the following configuration in cassandra.yaml:

 
{noformat}
full_query_logging_options:
    log_dir: /REDACTED/fullquerylogs
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    ## archive command is "/path/to/script.sh %path" where %path is replaced 
with the file being rolled:
    # archive_command:
    # max_archive_retries: 10
{noformat}
which appears to be the minimal configuration needed to enable fql, only a 
single file `directory-listing.cq4t` is created, which is a 64K sized file of 
zeroes.

 

Calling bin/nodetool enablefullquerylog throws an error initially.

[jshook@cass4 bin]$ ./nodetool enablefullquerylog 

 
{noformat}
error: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
-- StackTrace -- 
java.lang.NoSuchMethodException: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
    at java.base/java.lang.Class.getDeclaredMethod(Class.java:2553) 
    at net.openhft.chronicle.core.OS.lambda$static$0(OS.java:51) 
    at 
net.openhft.chronicle.core.ClassLocal.computeValue(ClassLocal.java:53){noformat}

 (full stack trace attached to this ticket)

 

Subsequent calls produce normal output:

 
{noformat}
[jshook@cass4 c4b1]$ bin/nodetool enablefullquerylog 
nodetool: Already logging to /home/jshook/c4b1/data/fullquerylogs 
See 'nodetool help' or 'nodetool help '.{noformat}
 

nodetool missing getfullquerylogging makes it difficult to verify current 
fullquerylog state without changing it. The conventions for nodetool commands 
should be followed to avoid confusing users.

(maybe)
{noformat}
tools/bin/fqltool help{noformat}
should print out help for all fqltool commands rather than simply repeating the 
default The most commonly used fqltool commands are..

1. [https://cassandra.apache.org/doc/latest/new/fqllogging.html] is 
malformatted, mixing the appearance of configuration with comments, which is 
confusing at best.

 


> full query log needs improvement
> 
>
> Key: CASSANDRA-15971
> URL: 

[jira] [Updated] (CASSANDRA-15971) full query log needs improvement

2020-07-22 Thread Jonathan Shook (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-15971:
---
Description: 
When trying out full query logging as a possible integration for nosqlbench 
usage, I ran across many issues which would make it painful for users. Since 
there were several, they will be added to a single issue for now. This issue 
can be broken up if needed.

FQL doesn't work on my system, even though it says it is logging queries. 

With the following configuration in cassandra.yaml:

 
{noformat}
full_query_logging_options:
    log_dir: /REDACTED/fullquerylogs
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    ## archive command is "/path/to/script.sh %path" where %path is replaced 
with the file being rolled:
    # archive_command:
    # max_archive_retries: 10
{noformat}
which appears to be the minimal configuration needed to enable fql, only a 
single file `directory-listing.cq4t` is created, which is a 64K sized file of 
zeroes.

 

Calling bin/nodetool enablefullquerylog throws an error initially.

[jshook@cass4 bin]$ ./nodetool enablefullquerylog

 
{noformat}
error: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
-- StackTrace -- 
java.lang.NoSuchMethodException: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
    at java.base/java.lang.Class.getDeclaredMethod(Class.java:2553) 
    at net.openhft.chronicle.core.OS.lambda$static$0(OS.java:51) 
    at 
net.openhft.chronicle.core.ClassLocal.computeValue(ClassLocal.java:53){noformat}
(full stack trace attached to this ticket)

 

Subsequent calls produce normal output:

 
{noformat}
[jshook@cass4 c4b1]$ bin/nodetool enablefullquerylog 
nodetool: Already logging to /home/jshook/c4b1/data/fullquerylogs 
See 'nodetool help' or 'nodetool help <command>'.{noformat}
 

The lack of a nodetool getfullquerylog command makes it difficult to verify the 
current full query log state without changing it. The conventions for nodetool 
commands should be followed to avoid confusing users.

(maybe)
{noformat}
tools/bin/fqltool help{noformat}
should print out help for all fqltool commands rather than simply repeating the 
default "The most commonly used fqltool commands are..." output.

[https://cassandra.apache.org/doc/latest/new/fqllogging.html] is malformatted, 
mixing the appearance of configuration with comments, which is confusing at 
best.

 

  was:
When trying out full query logging as a possible integration for nosqlbench 
usage, I ran across many issues which would make it painful for users. Since 
there were several, they will be added to a single issue for now. This issue 
can be broken up if needed.

FQL doesn't work on my system, even though it says it is logging queries. 

With the following configuration in cassandra.yaml:

 
{noformat}
full_query_logging_options:
    log_dir: /REDACTED/fullquerylogs
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    ## archive command is "/path/to/script.sh %path" where %path is replaced 
with the file being rolled:
    # archive_command:
    # max_archive_retries: 10
{noformat}
which appears to be the minimal configuration needed to enable fql, only a 
single file `directory-listing.cq4t` is created, which is a 64K sized file of 
zeroes.

 

Calling bin/nodetool enablefullquerylog throws an error initially.

[jshook@cass4 bin]$ ./nodetool enablefullquerylog

 
{noformat}
error: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
-- StackTrace -- 
java.lang.NoSuchMethodException: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
    at java.base/java.lang.Class.getDeclaredMethod(Class.java:2553) 
    at net.openhft.chronicle.core.OS.lambda$static$0(OS.java:51) 
    at 
net.openhft.chronicle.core.ClassLocal.computeValue(ClassLocal.java:53){noformat}
(full stack trace attached to this ticket)

 

Subsequent calls produce normal output:

 
{noformat}
[jshook@cass4 c4b1]$ bin/nodetool enablefullquerylog 
nodetool: Already logging to /home/jshook/c4b1/data/fullquerylogs 
See 'nodetool help' or 'nodetool help '.{noformat}
 

nodetool missing getfullquerylog makes it difficult to verify current 
fullquerylog state without changing it. The conventions for nodetool commands 
should be followed to avoid confusing users.

(maybe)
{noformat}
tools/bin/fqltool help{noformat}
should print out help for all fqltool commands rather than simply repeating the 
default The most commonly used fqltool commands are..

1. [https://cassandra.apache.org/doc/latest/new/fqllogging.html] is 
malformatted, mixing the appearance of configuration with comments, which is 
confusing at best.

 


> full query log needs improvement
> 
>
> Key: CASSANDRA-15971
> URL: 

[jira] [Created] (CASSANDRA-15971) full query log needs improvement

2020-07-22 Thread Jonathan Shook (Jira)
Jonathan Shook created CASSANDRA-15971:
--

 Summary: full query log needs improvement
 Key: CASSANDRA-15971
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15971
 Project: Cassandra
  Issue Type: Improvement
  Components: Tool/fql
Reporter: Jonathan Shook
 Attachments: st1.txt

When trying out full query logging as a possible integration for nosqlbench 
usage, I ran across many issues which would make it painful for users. Since 
there were several, they will be added to a single issue for now. This issue 
can be broken up if needed.

FQL doesn't work on my system, even though it says it is logging queries. 

With the following configuration in cassandra.yaml:

 
{noformat}
full_query_logging_options:
    log_dir: /REDACTED/fullquerylogs
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    ## archive command is "/path/to/script.sh %path" where %path is replaced 
with the file being rolled:
    # archive_command:
    # max_archive_retries: 10
{noformat}
which appears to be the minimal configuration needed to enable fql, only a 
single file `directory-listing.cq4t` is created, which is a 64K sized file of 
zeroes.

 

Calling bin/nodetool enablefullquerylog throws an error initially.

[jshook@cass4 bin]$ ./nodetool enablefullquerylog 

 
{noformat}
error: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
-- StackTrace -- 
java.lang.NoSuchMethodException: sun.nio.ch.FileChannelImpl.map0(int,long,long) 
    at java.base/java.lang.Class.getDeclaredMethod(Class.java:2553) 
    at net.openhft.chronicle.core.OS.lambda$static$0(OS.java:51) 
    at 
net.openhft.chronicle.core.ClassLocal.computeValue(ClassLocal.java:53){noformat}

 (full stack trace attached to this ticket)

 

Subsequent calls produce normal output:

 
{noformat}
[jshook@cass4 c4b1]$ bin/nodetool enablefullquerylog 
nodetool: Already logging to /home/jshook/c4b1/data/fullquerylogs 
See 'nodetool help' or 'nodetool help <command>'.{noformat}
 

The lack of a nodetool getfullquerylogging command makes it difficult to verify 
the current full query log state without changing it. The conventions for 
nodetool commands should be followed to avoid confusing users.

(maybe)
{noformat}
tools/bin/fqltool help{noformat}
should print out help for all fqltool commands rather than simply repeating the 
default "The most commonly used fqltool commands are..." output.

1. [https://cassandra.apache.org/doc/latest/new/fqllogging.html] is 
malformatted, mixing the appearance of configuration with comments, which is 
confusing at best.

 






[jira] [Updated] (CASSANDRA-12268) Make MV Index creation robust for wide referent rows

2016-07-21 Thread Jonathan Shook (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-12268:
---
Description: 
When creating an index for a materialized view for extant data, heap pressure 
is very dependent on the cardinality of rows associated with each index 
value. With the way that per-index value rows are created within the index, 
this can cause unbounded heap pressure, which can cause OOM. This appears to be 
a side-effect of how each index row is applied atomically as with batches.

The commit logs can accumulate enough during the process to prevent the node 
from being restarted. Given that this occurs during global index creation, this 
can happen on multiple nodes, making stable recovery of a node set difficult, 
as co-replicas become unavailable to assist in back-filling data from 
commitlogs.

While it is understandable that you want to avoid having relatively wide rows  
even in materialized views, this represents a particularly difficult scenario 
for triage.

The basic recommendation for improving this is to sub-group the index creation 
into smaller chunks internally, providing a maximal bound against the heap 
pressure when it is needed.
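
A rough sketch of the sub-grouping idea in generic Java (applyBatch and the 
chunk size are placeholders, not the actual view-build code):
{code:java}
// Conceptual sketch: applyBatch stands in for whatever unit of work the view
// build currently applies atomically. Bounding the batch size bounds heap use.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedIndexBuildSketch {
    static <T> void buildInChunks(Iterator<T> referentRows, int maxChunk, Consumer<List<T>> applyBatch) {
        List<T> chunk = new ArrayList<>(maxChunk);
        while (referentRows.hasNext()) {
            chunk.add(referentRows.next());
            if (chunk.size() >= maxChunk) {  // cap the in-memory working set
                applyBatch.accept(chunk);
                chunk = new ArrayList<>(maxChunk);
            }
        }
        if (!chunk.isEmpty())
            applyBatch.accept(chunk);        // apply the final partial chunk
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++)
            rows.add(i);
        buildInChunks(rows.iterator(), 3, batch -> System.out.println("apply " + batch));
    }
}
{code}
Bounding the chunk size bounds the in-memory working set, which is the point of 
the recommendation above.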

  was:
When creating an index for a materialized view for extant data, heap pressure 
is very dependent on the cardinality of of rows associated with each index 
value. With the way that per-index value rows are created within the index, 
this can cause unbounded heap pressure, which can cause OOM. This appears to be 
a side-effect of how each index row is applied atomically as with batches.

The commit logs can accumulate enough during the process to prevent the node 
from being restarted. Given that this occurs during global index creation, this 
can happen on multiple nodes, making stable recovery of a node set difficult, 
as co-replicas become unavailable to assist in back-filling data from 
commitlogs.

While it is understandable that you want to avoid having relatively wide rows  
even in materialized views, this scenario represent a particularly difficult 
scenario for triage.

The basic recommendation for improving this is to sub-group the index creation 
into smaller chunks internally, providing a maximal bound against the heap 
pressure when it is needed.


> Make MV Index creation robust for wide referent rows
> 
>
> Key: CASSANDRA-12268
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12268
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jonathan Shook
>
> When creating an index for a materialized view for extant data, heap pressure 
> is very dependent on the cardinality of rows associated with each index 
> value. With the way that per-index value rows are created within the index, 
> this can cause unbounded heap pressure, which can cause OOM. This appears to 
> be a side-effect of how each index row is applied atomically as with batches.
> The commit logs can accumulate enough during the process to prevent the node 
> from being restarted. Given that this occurs during global index creation, 
> this can happen on multiple nodes, making stable recovery of a node set 
> difficult, as co-replicas become unavailable to assist in back-filling data 
> from commitlogs.
> While it is understandable that you want to avoid having relatively wide rows 
>  even in materialized views, this represents a particularly difficult 
> scenario for triage.
> The basic recommendation for improving this is to sub-group the index 
> creation into smaller chunks internally, providing a maximal bound against 
> the heap pressure when it is needed.





[jira] [Created] (CASSANDRA-12268) Make MV Index creation robust for wide referent rows

2016-07-21 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-12268:
--

 Summary: Make MV Index creation robust for wide referent rows
 Key: CASSANDRA-12268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12268
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Shook


When creating an index for a materialized view for extant data, heap pressure 
is very dependent on the cardinality of rows associated with each index 
value. With the way that per-index value rows are created within the index, 
this can cause unbounded heap pressure, which can cause OOM. This appears to be 
a side-effect of how each index row is applied atomically as with batches.

The commit logs can accumulate enough during the process to prevent the node 
from being restarted. Given that this occurs during global index creation, this 
can happen on multiple nodes, making stable recovery of a node set difficult, 
as co-replicas become unavailable to assist in back-filling data from 
commitlogs.

While it is understandable that you want to avoid having relatively wide rows 
even in materialized views, this represents a particularly difficult scenario 
for triage.

The basic recommendation for improving this is to sub-group the index creation 
into smaller chunks internally, providing a maximal bound against the heap 
pressure when it is needed.





[jira] [Created] (CASSANDRA-11753) cqlsh show sessions truncates time_elapsed values > 999999

2016-05-11 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-11753:
--

 Summary: cqlsh show sessions truncates time_elapsed values > 999999
 Key: CASSANDRA-11753
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11753
 Project: Cassandra
  Issue Type: Bug
  Components: CQL, Observability, Testing, Tools
Reporter: Jonathan Shook


Output from show session in cqlsh:
{quote}
Submit hint for /10.255.227.20 [EXPIRING-MAP-REAPER:1] | 2016-05-11 15:57:53.73 | 10.255.226.163 | 283246
{quote}
Output from select * from trace_events where session_id=(same as above):
{quote}
1bbce5c0-1791-11e6-9598-3b9ec975a2e6 | 1ee37a20-1791-11e6-9598-3b9ec975a2e6 | Submit hint for /10.255.227.20 | 10.255.226.163 | 5283246 | EXPIRING-MAP-REAPER:1
{quote}
Notice that the 5 (seconds) part is being truncated in the output.
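
For illustration only (this is not cqlsh's actual rendering code), a fixed-width 
column that keeps the trailing characters produces exactly this kind of silent 
loss of the leading digit:
{code:java}
// Hypothetical rendering bug of the same shape: a 6-character column that keeps
// only the last 6 characters displays 5283246 as 283246.
public class ColumnTruncationSketch {
    static String fitRight(String value, int width) {
        return value.length() <= width ? value : value.substring(value.length() - width);
    }

    public static void main(String[] args) {
        System.out.println(fitRight(Long.toString(5283246L), 6)); // prints 283246
    }
}
{code}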






[jira] [Updated] (CASSANDRA-11688) Replace_address should sanity check prior node state before migrating tokens

2016-04-29 Thread Jonathan Shook (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-11688:
---
Description: 
During a node replacement, a replace_address was used which was associated with 
a different node than the intended one. The result was that both nodes remained 
active after the node came up. This caused several other issues which were 
difficult to diagnose, including invalid gossip state, etc.

Replace_address should be more robust in this scenario. It would be much more 
user friendly if the replace_address logic would first do some basic sanity 
checks, possibly to include:

- Pinging the other node to see if it is indeed “down”, if the address is 
different than all local interface addresses
- Checking gossip state of the node to verify that it is not known to peers.

It may even be safest to require, by default, that both address reachability 
and gossip state show the replace_address as down before allowing any token 
migration or other replace_address actions to occur.

In the case that the replace_address is not ready to be replaced, the log 
should indicate that you are trying to replace an active node, and cassandra 
should refuse to start.
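
A sketch of the kind of pre-flight check described above (isKnownToPeers is a 
placeholder for a gossip-state lookup, and the reachability test is a crude 
stand-in for a real liveness check):
{code:java}
// Illustrative only: not actual startup code; both checks are placeholders.
import java.io.IOException;
import java.net.InetAddress;

public class ReplaceAddressPreflightSketch {
    static boolean isKnownToPeers(InetAddress addr) {
        return false; // placeholder: would consult gossip state
    }

    static void checkReplaceable(InetAddress replaceAddress) throws IOException {
        // Require the node to look down by both measures before migrating tokens.
        boolean reachable = replaceAddress.isReachable(2000);
        if (reachable || isKnownToPeers(replaceAddress))
            throw new IllegalStateException("Refusing to start: " + replaceAddress
                                            + " still appears to be an active node");
    }

    public static void main(String[] args) throws IOException {
        checkReplaceable(InetAddress.getByName("203.0.113.10")); // documentation address
    }
}
{code}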

  was:
During a node replacement, a customer used an ip address associated with a 
different node than the intended one. The result was that both nodes remained 
active after the node came up. This caused several other issues which were 
difficult to diagnose, including invalid gossip state, etc.

Replace_address should be more robust in this scenario. It would be much more 
user friendly if the replace_address logic would first do some basic sanity 
checks, possibly to include:

- Pinging the other node to see if it is indeed “down”, if the address is 
different than all local interface addresses
- Checking gossip state of the node to verify that it is not known to peers.

It may even be safest to require that both address reachability and gossip 
state are required to show the replace_address as down by default before 
allowing any token migration or other replace_address actions to occur.

In the case that the replace_address is not ready to be replaced, the log 
should indicate that you are trying to replace an active node, and cassandra 
should refuse to start.


> Replace_address should sanity check prior node state before migrating tokens
> 
>
> Key: CASSANDRA-11688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11688
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Shook
>
> During a node replacement, a replace_address was used which was associated 
> with a different node than the intended one. The result was that both nodes 
> remained active after the node came up. This caused several other issues 
> which were difficult to diagnose, including invalid gossip state, etc.
> Replace_address should be more robust in this scenario. It would be much more 
> user friendly if the replace_address logic would first do some basic sanity 
> checks, possibly to include:
> - Pinging the other node to see if it is indeed “down”, if the address is 
> different than all local interface addresses
> - Checking gossip state of the node to verify that it is not known to peers.
> It may even be safest to require, by default, that both address reachability 
> and gossip state show the replace_address as down before allowing any token 
> migration or other replace_address actions to occur.
> In the case that the replace_address is not ready to be replaced, the log 
> should indicate that you are trying to replace an active node, and cassandra 
> should refuse to start.





[jira] [Created] (CASSANDRA-11688) Replace_address should sanity check prior node state before migrating tokens

2016-04-29 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-11688:
--

 Summary: Replace_address should sanity check prior node state 
before migrating tokens
 Key: CASSANDRA-11688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11688
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Shook


During a node replacement, a customer used an ip address associated with a 
different node than the intended one. The result was that both nodes remained 
active after the node came up. This caused several other issues which were 
difficult to diagnose, including invalid gossip state, etc.

Replace_address should be more robust in this scenario. It would be much more 
user friendly if the replace_address logic would first do some basic sanity 
checks, possibly to include:

- Pinging the other node to see if it is indeed “down”, if the address is 
different than all local interface addresses
- Checking gossip state of the node to verify that it is not known to peers.

It may even be safest to require, by default, that both address reachability 
and gossip state show the replace_address as down before allowing any token 
migration or other replace_address actions to occur.

In the case that the replace_address is not ready to be replaced, the log 
should indicate that you are trying to replace an active node, and cassandra 
should refuse to start.





[jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-29 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216688#comment-15216688
 ] 

Jonathan Shook commented on CASSANDRA-9666:
---

There are two areas of concern that we should discuss more directly..

1. The pacing of memtable flushing on a given system can be matched up with the 
base window size with DTCS, avoiding logical write amplification that can occur 
before the scheduling discipline kicks in. This is not so easy when  you water 
down the configuration and remove the ability to manage the fresh sstables. The 
benefits from time-series friendly compaction can be had for both the newest 
and the oldest tables, and both are relevant here.

2. The window placement. From what I've seen, the anchoring point for whether a 
cell goes into a bucket or not is different between the two approaches. To me 
this is fairly arbitrary in terms of processing overhead comparisons, all else 
assumed close enough. However, when trying to reconcile, shifting all of your 
data to a different bucket will not be a welcome event for most users. This 
makes "graceful" reconciliation difficult at best.

Can we simply try to make DTCS as easy to use, perceptually, for the default 
case as TWCS? To me, this is more about the user entry point and understanding 
the behavior as designed than about the machinery that makes it happen.

The basic design between them has so much in common that reconciling them 
completely would be mostly a shell game of parameter names, as well as lopping 
off some functionality that can be completely bypassed, given the right settings.

Can we identify the functionally equivalent settings for TWCS that DTCS needs 
to emulate, given proper settings (possibly including anchoring point), and 
then simply provide the same simple configuration to users, without having to 
maintain two separate sibling compaction strategies?

One sticking point I've had raised on this suggestion in conversation is that 
the bucketing logic is too difficult to think about. If we were able to provide 
the self-same behavior for TWCS-like configuration, the bucketing logic could 
be used only when the parameters require non-uniform windows. Would that make 
everyone happy?
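
To make the window discussion concrete, here is a minimal sketch of uniform 
time-window bucketing of the kind TWCS describes (an illustration only, not 
either strategy's actual code):
{code:java}
// Illustration only: a write timestamp maps to the lower bound of its window,
// so a 6-hour window yields roughly 4 buckets per day, all uniform in size.
import java.util.concurrent.TimeUnit;

public class WindowBucketSketch {
    static long windowLowerBound(long timestampMillis, int windowSize, TimeUnit windowUnit) {
        long windowMillis = windowUnit.toMillis(windowSize);
        return timestampMillis - (timestampMillis % windowMillis);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(windowLowerBound(now, 6, TimeUnit.HOURS));
    }
}
{code}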






> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressive compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-29 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216688#comment-15216688
 ] 

Jonathan Shook edited comment on CASSANDRA-9666 at 3/29/16 7:21 PM:


There are two areas of concern that we should discuss more directly..

1. The pacing of memtable flushing on a given system can be matched up with the 
base window size with DTCS, avoiding logical write amplification that can occur 
before the scheduling discipline kicks in. This is not so easy when  you water 
down the configuration and remove the ability to manage the fresh sstables. The 
benefits from time-series friendly compaction can be had for both the newest 
and the oldest tables, and both are relevant here.

2. The window placement. From what I've seen, the anchoring point for whether a 
cell goes into a bucket or not is different between the two approaches. To me 
this is fairly arbitrary in terms of processing overhead comparisons, all else 
assumed close enough. However, when trying to reconcile, shifting all of your 
data to a different bucket will not be a welcome event for most users. This 
makes "graceful" reconciliation difficult at best.

Can we simply try to make DTCS as easy to use, perceptually, for the default 
case as TWCS? To me, this is more about the user entry point and understanding 
the behavior as designed than about the machinery that makes it happen.

The basic design between them has so much in common that reconciling them 
completely would be mostly a shell game of parameter names, as well as lopping 
off some functionality that can be completely bypassed, given the right 
settings.

Can we identify the functionally equivalent settings for TWCS that DTCS needs 
to emulate, given proper settings (possibly including anchoring point), and 
then simply provide the same simple configuration to users, without having to 
maintain two separate sibling compaction strategies?

One sticking point I've had raised on this suggestion in conversation is that 
the bucketing logic is too difficult to think about. If we were able to provide 
the self-same behavior for TWCS-like configuration, the bucketing logic could 
be used only when the parameters require non-uniform windows. Would that make 
everyone happy?







was (Author: jshook):
There are two areas of concern that we should discuss more directly..

1. The pacing of memtable flushing on a given system can be matched up with the 
base window size with DTCS, avoiding logical write amplification that can occur 
before the scheduling discipline kicks in. This is not so easy when  you water 
down the configuration and remove the ability to manage the fresh sstables. The 
benefits from time-series friendly compaction can be had for both the newest 
and the oldest tables, and both are relevant here.

2. The window placement. From what I've seen, the anchoring point for whether a 
cell goes into a bucket or not is different between the two approaches. To me 
this is fairly arbitrary in terms of processing overhead comparisons, all else 
assumed close enough. However, when trying to reconcile, shifting all of your 
data to a different bucket will not be a welcome event for most users. This 
makes "graceful" reconciliation difficult at best.

Can we simply try to make DTCS as (perceptually) easy to use for the default 
case as TWCS (perceptually) ? To me, this is more about the user entry point 
and understanding behavior as designed than it is about the machinery that 
makes it happen.

The basic design between them has so much in common that reconciling them 
completely would be mostly a shell game of parameter names as well as lobbing 
off some functionality that can be complete bypassed, given the right settings.

Can we identify the functionally equivalent settings for TWCS that DTCS needs 
to emulate, given proper settings (possibly including anchoring point), and 
then simply provide the same simple configuration to users, without having to 
maintain two separate sibling compaction strategies?

One sticking point that I've had on this suggesting in conversation is the 
bucketing logic being too difficult to think about. If we were able to provide 
the self-same behavior for TWCS-like configuration, the bucketing logic could 
be used only when the parameters require non-uniform windows. Would that make 
everyone happy?






> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> 

[jira] [Updated] (CASSANDRA-11408) simple compaction defaults for common scenarios

2016-03-22 Thread Jonathan Shook (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-11408:
---
Description: 
As compaction strategies get more flexible over time, some users might prefer 
to have a simple named profile for their settings.

{code:title=example, syntax variant|borderStyle=solid}
alter table foo.bar with compaction = 'timeseries-hourly-for-a-week';
{code}

{code:title=example, syntax variant |borderStyle=solid}
alter table foo.bar with compaction = { 'profile' : 'key-value-balanced-ops' };
{code}

These would simply be a map into sets of well-tested and documented defaults 
across any of the core compaction strategies.

This would simplify setting up compaction for well-understood workloads, but 
still allow for customization where desired.
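
A minimal sketch of the profile-name-to-options mapping (the profile names and 
the options they expand to are made up for illustration; the real defaults would 
be chosen, tested, and documented per strategy):
{code:java}
// Illustrative only: hypothetical profiles and option values.
import java.util.Map;

public class CompactionProfileSketch {
    static final Map<String, Map<String, String>> PROFILES = Map.of(
        "timeseries-hourly-for-a-week", Map.of(
            "class", "TimeWindowCompactionStrategy",
            "compaction_window_unit", "HOURS",
            "compaction_window_size", "1"),
        "key-value-balanced-ops", Map.of(
            "class", "SizeTieredCompactionStrategy"));

    static Map<String, String> expand(String profile) {
        Map<String, String> options = PROFILES.get(profile);
        if (options == null)
            throw new IllegalArgumentException("Unknown compaction profile: " + profile);
        return options;
    }

    public static void main(String[] args) {
        System.out.println(expand("timeseries-hourly-for-a-week"));
    }
}
{code}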



  was:
As compaction strategies get more flexible over time, some users might prefer 
to have a simple named profile for their settings.

For example,
alter table foo.bar with compaction = 'timeseries-hourly-for-a-week';
or, with slightly different syntax:
alter table foo.bar with compaction = { 'profile' : 'key-value-balanced-ops' };

These would simply be a map into sets of well-tested and documented defaults 
across any of the core compaction strategies.

This would simplify setting up compaction for well-understood workloads, but 
still allow for customization where desired.




> simple compaction defaults for common scenarios
> ---
>
> Key: CASSANDRA-11408
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11408
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jonathan Shook
>
> As compaction strategies get more flexible over time, some users might prefer 
> to have a simple named profile for their settings.
> {code:title=example, syntax variant|borderStyle=solid}
> alter table foo.bar with compaction = 'timeseries-hourly-for-a-week';
> {code}
> {code:title=example, syntax variant |borderStyle=solid}
> alter table foo.bar with compaction = { 'profile' : 'key-value-balanced-ops' 
> };
> {code}
> These would simply be a map into sets of well-tested and documented defaults 
> across any of the core compaction strategies.
> This would simplify setting up compaction for well-understood workloads, but 
> still allow for customization where desired.





[jira] [Created] (CASSANDRA-11408) simple compaction defaults for common scenarios

2016-03-22 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-11408:
--

 Summary: simple compaction defaults for common scenarios
 Key: CASSANDRA-11408
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11408
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Shook


As compaction strategies get more flexible over time, some users might prefer 
to have a simple named profile for their settings.

For example,
alter table foo.bar with compaction = 'timeseries-hourly-for-a-week';
or, with slightly different syntax:
alter table foo.bar with compaction = { 'profile' : 'key-value-balanced-ops' };

These would simply be a map into sets of well-tested and documented defaults 
across any of the core compaction strategies.

This would simplify setting up compaction for well-understood workloads, but 
still allow for customization where desired.







[jira] [Comment Edited] (CASSANDRA-10425) Autoselect GC settings depending on system memory

2016-01-12 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095515#comment-15095515
 ] 

Jonathan Shook edited comment on CASSANDRA-10425 at 1/13/16 3:05 AM:
-

I think we should try to come up with a way of handling settings which one 
would choose differently for a new install. Settings like this will live 
forever without a better approach. I agree entirely with the principle of least 
surprise. However, with this default in place, there will be new systems 
deployed in 2020 running CMS. There has to be a better way.

If we were able to have an install mode which would honor previous settings or 
take new defaults that are more desirable for current code and systems, perhaps 
we can avoid  the CMS in 2020 problem. Installers may require a user to specify 
a mode in order to make this truly unsurprising. If I were installing a new 
cluster in 2020, I would be quite surprised to find it running CMS.

Also, the point of having the settings be size-specific is to avoid surprising 
performance deficiencies. This is the kind of change that I would expect to go 
with a major version upgrade. 

So, to follow the principle of least surprise, perhaps we need to consider 
making this possible for those who expect to be able to use more than 32GB with 
G1 to address GC bandwidth and pause issues for heavy workloads, as we've come 
to expect through field experience. Otherwise, we'll be manually rewiring this 
from now on for all but historic pizza-boxen.



was (Author: jshook):
I think we should try to come up with a way of handling settings which one 
would choose differently for a new install. Settings like this will live 
forever without a better approach. I agree entirely with the principle of least 
surprise. However, according to this default, there will be new systems 
deployed in 2020 with CMS. There has to be a better way.

If we were able to have an install mode which would honor previous settings or 
take new defaults that are more desirable for current code and systems, perhaps 
we can avoid  the CMS in 2020 problem. Installers may require a user to specify 
a mode in order to make this truly unsurprising. If I were installing a new 
cluster in 2020, I would be quite surprised to find it running CMS.

Also, the point of having the settings be size-specific is to avoid surprising 
performance deficiencies. This is the kind of change that I would expect to go 
with a major version. 

So, to follow the principle of least surprise, perhaps we need to consider 
making this possible for those who expect to be able to use more than 32GB with 
G1 to address GC bandwidth and pause issues for heavy workloads, as we've come 
to expect through field experience. Otherwise, we'll be manually rewiring this 
from now on for all but historic pizza-boxen.


> Autoselect GC settings depending on system memory
> -
>
> Key: CASSANDRA-10425
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10425
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Jonathan Shook
>
> 1) Make GC modular within cassandra-env
> 2) For systems with 32GB or less of ram, use the classic CMS with the 
> established default settings.
> 3) For systems with 48GB or more of ram, use 1/2 or up to 32GB of heap with 
> G1, whichever is lower.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11006) Allow upgrades and installs to take modern defaults

2016-01-12 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-11006:
--

 Summary: Allow upgrades and installs to take modern defaults
 Key: CASSANDRA-11006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11006
 Project: Cassandra
  Issue Type: Improvement
  Components: Configuration, Lifecycle, Packaging, Tools
Reporter: Jonathan Shook


See CASSANDRA-10425 for background.

We simply need to provide a way to install or upgrade C* on a system with 
modern settings. Keeping the previous defaults has been the standard rule of 
thumb to avoid surprises. This is a reasonable approach, but we haven't yet 
provided an alternative for full upgrades with new defaults nor for more 
appropriate installs of new systems. The number of previous defaults which may 
need to be modified for a saner deployment has become a form of technical 
baggage. Often, users will have to micro-manage basic settings to more 
reasonable defaults for every single deployment, upgrade or not. This is 
surprising.

For newer settings that would be more appropriate, we could force the user to 
make a choice. If you are installing a new cluster or node, you may want the 
modern defaults. If you are upgrading an existing node, you may still want the 
modern defaults. If you are upgrading an existing node and have some very 
carefully selected tunings for your hardware, then you may want to keep them. 
Even then, they may be worse than the modern defaults, given version changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11006) Allow upgrades and installs to take modern defaults

2016-01-12 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095532#comment-15095532
 ] 

Jonathan Shook edited comment on CASSANDRA-11006 at 1/13/16 3:34 AM:
-

The difference in the original ticket CASSANDRA-10425 was not that we were 
opting into auto-tuning. The difference was simply that we could take into 
consideration more contemporary hardware that is being deployed, including the 
trending size of RAM. I would generally expect that auto-tuning settings like 
this could be adapted for major versions, and added to the release notes like 
other potentially surprising, yet generally useful changes. If this is not the 
case for GC settings, then how do we allow for the change for CMS to G1 as 
average RAM sizing continues to change?



was (Author: jshook):
The difference in the original ticket CASSANDRA-10425 was not that we were 
opting into auto-tuning. The difference was simply that we could take account 
of more contemporary hardware that is being deployed presently, including the 
trending size of RAM. I would generally expect that auto-tuning settings like 
this could be adapted for major versions, and added to the release notes like 
other potentially surprising, yet generally useful changes. If this is not the 
case for GC settings, then how do we allow for the change for CMS to G1 as 
average RAM sizing continues to change?


> Allow upgrades and installs to take modern defaults
> ---
>
> Key: CASSANDRA-11006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11006
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration, Lifecycle, Packaging, Tools
>Reporter: Jonathan Shook
>
> See CASSANDRA-10425 for background.
> We simply need to provide a way to install or upgrade C* on a system with 
> modern settings. Keeping the previous defaults has been the standard rule of 
> thumb to avoid surprises. This is a reasonable approach, but we haven't yet 
> provided an alternative for full upgrades with new defaults nor for more 
> appropriate installs of new systems. The number of previous defaults which 
> may need to be modified for a saner deployment has become a form of technical 
> baggage. Often, users will have to micro-manage basic settings to more 
> reasonable defaults for every single deployment, upgrade or not. This is 
> surprising.
> For newer settings that would be more appropriate, we could force the user to 
> make a choice. If you are installing a new cluster or node, you may want the 
> modern defaults. If you are upgrading an existing node, you may still want 
> the modern defaults. If you are upgrading an existing node and have some very 
> carefully selected tunings for your hardware, then you may want to keep them. 
> Even then, they may be worse than the modern defaults, given version changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10425) Autoselect GC settings depending on system memory

2016-01-12 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095515#comment-15095515
 ] 

Jonathan Shook commented on CASSANDRA-10425:


I think we should try to come up with a way of handling settings which one 
would choose differently for a new install. Settings like this will live 
forever without a better approach. I agree entirely with the principle of least 
surprise. However, according to this default, there will be new systems 
deployed in 2020 with CMS. There has to be a better way.

If we were able to have an install mode which would honor previous settings or 
take new defaults that are more desirable for current code and systems, perhaps 
we can avoid  the CMS in 2020 problem. Installers may require a user to specify 
a mode in order to make this truly unsurprising. If I were installing a new 
cluster in 2020, I would be quite surprised to find it running CMS.

Also, the point of having the settings be size-specific is to avoid surprising 
performance deficiencies. This is the kind of change that I would expect to go 
with a major version. 

So, to follow the principle of least surprise, perhaps we need to consider 
making this possible for those who expect to be able to use more than 32GB with 
G1 to address GC bandwidth and pause issues for heavy workloads, as we've come 
to expect through field experience. Otherwise, we'll be manually rewiring this 
from now on for all but historic pizza-boxen.


> Autoselect GC settings depending on system memory
> -
>
> Key: CASSANDRA-10425
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10425
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Jonathan Shook
>
> 1) Make GC modular within cassandra-env
> 2) For systems with 32GB or less of ram, use the classic CMS with the 
> established default settings.
> 3) For systems with 48GB or more of ram, use 1/2 or up to 32GB of heap with 
> G1, whichever is lower.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11006) Allow upgrades and installs to take modern defaults

2016-01-12 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095532#comment-15095532
 ] 

Jonathan Shook commented on CASSANDRA-11006:


The difference in the original ticket CASSANDRA-10425 was not that we were 
opting into auto-tuning. The difference was simply that we could take account 
of more contemporary hardware that is being deployed presently, including the 
trending size of RAM. I would generally expect that auto-tuning settings like 
this could be adapted for major versions, and added to the release notes like 
other potentially surprising, yet generally useful changes. If this is not the 
case for GC settings, then how do we allow for the change for CMS to G1 as 
average RAM sizing continues to change?


> Allow upgrades and installs to take modern defaults
> ---
>
> Key: CASSANDRA-11006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11006
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration, Lifecycle, Packaging, Tools
>Reporter: Jonathan Shook
>
> See CASSANDRA-10425 for background.
> We simply need to provide a way to install or upgrade C* on a system with 
> modern settings. Keeping the previous defaults has been the standard rule of 
> thumb to avoid surprises. This is a reasonable approach, but we haven't yet 
> provided an alternative for full upgrades with new defaults nor for more 
> appropriate installs of new systems. The number of previous defaults which 
> may need to be modified for a saner deployment has become a form of technical 
> baggage. Often, users will have to micro-manage basic settings to more 
> reasonable defaults for every single deployment, upgrade or not. This is 
> surprising.
> For newer settings that would be more appropriate, we could force the user to 
> make a choice. If you are installing a new cluster or node, you may want the 
> modern defaults. If you are upgrading an existing node, you may still want 
> the modern defaults. If you are upgrading an existing node and have some very 
> carefully selected tunings for your hardware, then you may want to keep them. 
> Even then, they may be worse than the modern defaults, given version changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10425) Autoselect GC settings depending on system memory

2016-01-12 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095534#comment-15095534
 ] 

Jonathan Shook commented on CASSANDRA-10425:


CASSANDRA-11006 was created to discuss possible ways of handling this.


> Autoselect GC settings depending on system memory
> -
>
> Key: CASSANDRA-10425
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10425
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Jonathan Shook
>
> 1) Make GC modular within cassandra-env
> 2) For systems with 32GB or less of ram, use the classic CMS with the 
> established default settings.
> 3) For systems with 48GB or more of ram, use 1/2 or up to 32GB of heap with 
> G1, whichever is lower.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10742) Real world DateTieredCompaction tests

2015-11-24 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15024539#comment-15024539
 ] 

Jonathan Shook commented on CASSANDRA-10742:


[~krummas],

Some notes on test setup, and some observations from data models we've seen. We 
can try to get some additional details from willing users if this doesn't get 
us close enough.

The baseline test I use is high-ingest, read-most-recent, with some read-cold 
mixed-in. The idea is to simulate the typical access patterns of time-series 
telemetry with roll-up processing, with the occasional historic query or 
reprocessing of old data. I use 90/10/1 ratio for write/recent-read/cold-read 
as a starting point. I usually back off the ingest rate from a saturating load 
in order to find a stable steady-state reference point. This still is much 
higher load per-node than you would often have in a production scenario. It 
does provide for good contrast with trade-offs, like compaction load. Often, 
you will be accumulating data over a longer period of time, so ingest rates 
that approach the reasonable saturating load are closer to stress tests than 
real-world. As such, they are still good tests. If you can run a node at 10x to 
1000x the data rates that you would expect in production, then 1) you can 
complete the test in a reasonable amount of time and 2) you're not too worried 
about the margin of error.

The data model I use is essentially ((datasource, timebucket), parametername, 
timestamp) -> value, although future testing will likely drop the timebucket 
component, relying instead on the time-based layout of sstables as a 
simplification. (Still needs supporting data from tests). parametername is just 
a variable name that is associated with a type of measurement. This is selected 
from a fixed set, as is often the case in the wild. The value can vary in type 
and size according to the type of data logging. I use a range from 1k to 5k, 
depending on the type of test. In the simplest cases, a value is an int or 
float, but it can also be a log line from a stack trace.
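
For concreteness, a sketch of that schema as CQL (wrapped in a Java constant 
here); the keyspace, table, and column names are assumptions for illustration:
{code:java}
// Hypothetical CQL rendering of the model described above:
// ((datasource, timebucket), parametername, timestamp) -> value
public class TimeseriesSchema
{
    public static final String CREATE_TABLE =
        "CREATE TABLE telemetry.readings (\n" +
        "  datasource text,\n" +
        "  timebucket text,\n" +
        "  parametername text,\n" +
        "  ts timestamp,\n" +
        "  value blob,  -- 1k to 5k in these tests; int/float or log lines in practice\n" +
        "  PRIMARY KEY ((datasource, timebucket), parametername, ts)\n" +
        ")";

    // Variant with parameters distributed by partition ("move the parenthesis
    // right by one term"): PRIMARY KEY ((datasource, timebucket, parametername), ts)
    public static void main(String[] args)
    {
        System.out.println(CREATE_TABLE);
    }
}
{code}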

The model of writes/read-most-recent/read-cold can cover lots of ground in 
terms of time-series. The ratios can be varied. Also, the number of partitions 
per node in conjunction with the number of parameters should vary. In some 
cases in the wild, time-series partitions are single-series. In other cases, 
they can have hundreds of related series by name (by cluster). In some cases, 
the parameters associated with a data source are distributed by partition to 
support asynchronously loading the cluster for responsive reads of significant data. To 
cover this, simply move the parenthesis right by one term above.

If you cover some of the permutations above for op ratios, clustering 
structure, grain of partition, and payload size, then you'll be covering lots 
of the space we see in practice.


> Real world DateTieredCompaction tests
> -
>
> Key: CASSANDRA-10742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10742
> Project: Cassandra
>  Issue Type: Test
>Reporter: Marcus Eriksson
>
> So, to be able to actually evaluate DTCS (or TWCS) we need stress profiles 
> that are similar to something that could be found in real production systems.
> We should then run these profiles for _weeks_, and do regular operational 
> tasks on the cluster - like bootstrap, decom, repair etc.
> [~jjirsa] [~jshook] (or anyone): could you describe any write/read patterns 
> you have seen people use with DTCS in production?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-10-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951568#comment-14951568
 ] 

Jonathan Shook commented on CASSANDRA-10403:


Anecdote: https://www.youtube.com/watch?v=1R-mgOcOSd4=youtu.be=24m27s

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10495) Improve the way we do streaming with vnodes

2015-10-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950983#comment-14950983
 ] 

Jonathan Shook commented on CASSANDRA-10495:


What if the streaming protocol were enhanced to allow sending nodes to provide 
an offer manifest, blocking until the receiver responded with a preferred 
ordering and grouping? Would this help address any of the planning issues better?
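
A rough sketch of the exchange being suggested, with made-up types; none of 
this corresponds to actual Cassandra streaming classes:
{code:java}
import java.util.List;

public class OfferManifestSketch
{
    // Sender describes what it could stream for a range.
    record SSTableOffer(String sstable, String range, long sizeBytes, int level) {}

    // Receiver answers with the grouping and order it prefers, e.g. grouped so
    // incoming files can be combined per range (or per level for LCS).
    record StreamPlan(List<List<SSTableOffer>> groupsInPreferredOrder) {}

    interface Receiver
    {
        // The sender blocks on this call before streaming anything.
        StreamPlan planFor(List<SSTableOffer> offers);
    }

    static void stream(List<SSTableOffer> offers, Receiver receiver)
    {
        StreamPlan plan = receiver.planFor(offers);
        for (List<SSTableOffer> group : plan.groupsInPreferredOrder())
            System.out.println("streaming a group of " + group.size() + " offers");
    }

    public static void main(String[] args)
    {
        List<SSTableOffer> offers = List.of(
            new SSTableOffer("aa-1-big-Data.db", "(0,100]", 1 << 20, 0),
            new SSTableOffer("aa-2-big-Data.db", "(0,100]", 2 << 20, 1));
        stream(offers, o -> new StreamPlan(List.of(o))); // trivial receiver: one group, given order
    }
}
{code}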

> Improve the way we do streaming with vnodes
> ---
>
> Key: CASSANDRA-10495
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10495
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
> Fix For: 3.x
>
>
> Streaming with vnodes usually creates a large amount of sstables on the 
> target node - for example if each source node has 100 sstables and we use 
> num_tokens = 256, the bootstrapping (for example) node might get 100*256 
> sstables
> One approach could be to do an on-the-fly compaction on the source node, 
> meaning we would only stream out one sstable per range. Note that we will 
> want the compaction strategy to decide how to combine the sstables, for 
> example LCS will not want to mix sstables from different levels while STCS 
> can probably just combine everything
> cc [~yukim]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10489) arbitrary order by on partitions

2015-10-08 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949492#comment-14949492
 ] 

Jonathan Shook commented on CASSANDRA-10489:


So, against a non-indexed field, the processing bound will be the size of the 
partition. If you only hold a scoreboard of limit items in memory and stream 
through the rest, replacing items, the memory requirements are lower, but the 
IO requirements could be substantial. If you do this with RF>1 and CL>1, then 
you may have semantics of result merging at the coordinator, but this should 
still be bounded to the result size and not the search space.

I would like for us to consider this operation for indexed fields and 
non-indexed fields as separate features, possibly putting the non-indexed 
version behind a warning or such. I'm sure some will absolutely try to sort 
10^9 items with limit 10. At least they should know that it has a completely 
different op cost.
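
For illustration, a minimal sketch (plain Java, not Cassandra internals) of the 
"scoreboard of limit items" idea: memory stays bounded by the limit while the 
whole partition is streamed once, so the cost is dominated by IO:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class Scoreboard<T>
{
    private final int limit;
    private final PriorityQueue<T> heap; // weakest of the current top-N sits at the head

    public Scoreboard(int limit, Comparator<T> order)
    {
        this.limit = limit;
        this.heap = new PriorityQueue<>(limit, order);
    }

    public void offer(T row)
    {
        if (heap.size() < limit)
            heap.add(row);
        else if (heap.comparator().compare(row, heap.peek()) > 0)
        {
            heap.poll(); // evict the current weakest entry
            heap.add(row);
        }
    }

    public List<T> results()
    {
        List<T> out = new ArrayList<>(heap);
        out.sort(heap.comparator().reversed()); // best first
        return out;
    }

    public static void main(String[] args)
    {
        Scoreboard<Integer> top3 = new Scoreboard<>(3, Comparator.naturalOrder());
        for (int v : new int[]{ 5, 1, 9, 7, 3, 8 })
            top3.offer(v);
        System.out.println(top3.results()); // [9, 8, 7]
    }
}
{code}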


> arbitrary order by on partitions
> 
>
> Key: CASSANDRA-10489
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10489
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jon Haddad
>Priority: Minor
>
> We've got aggregations, we might as well allow sorting rows within a 
> partition on arbitrary fields.  Currently the advice is "do it client side", 
> but when combined with a LIMIT clause it makes sense to do this server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10489) arbitrary order by on partitions

2015-10-08 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949436#comment-14949436
 ] 

Jonathan Shook commented on CASSANDRA-10489:


Would this need to be limited to indexed (in some form) fields? Without an 
index, it would be difficult for the coordinator to know the bound of sorting 
ahead of time. Or would this be for rows selected by some indexed field with 
limit, and then sorted only after limit was applied?

Essentially, should we define this as a valid goal for results for which we 
already can know the cardinality bounds without traversing the whole partition?

> arbitrary order by on partitions
> 
>
> Key: CASSANDRA-10489
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10489
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jon Haddad
>Priority: Minor
>
> We've got aggregations, we might as well allow sorting rows within a 
> partition on arbitrary fields.  Currently the advice is "do it client side", 
> but when combined with a LIMIT clause it makes sense to do this server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10490) DTCS historic compaction, possibly with major compaction

2015-10-08 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-10490:
--

 Summary: DTCS historic compaction, possibly with major compaction
 Key: CASSANDRA-10490
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10490
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Shook


Presently, it's simply painful to run a major compaction with DTCS. It doesn't 
really serve a useful purpose. Instead, a DTCS major compaction should allow 
for compaction to go back before max_sstable_age_days. We can call this a 
historic compaction, for lack of a better term.

Such a compaction should not take precedence over normal compaction work, but 
should be considered a background task. By default there should be a cap on the 
number of these tasks running. It would be nice to have a separate 
"max_historic_compaction_tasks" and possibly a 
"max_historic_compaction_throughput" in the compaction settings to allow for 
separate throttles on this. I would set these at 1 and 20% of the usual 
compaction throughput if they aren't set explicitly.

It may also be desirable to allow historic compaction to run apart from running 
a major compaction, and to simply disable major compaction altogether for DTCS.
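
A sketch of how the proposed defaults might be derived when the new settings 
are not set explicitly; the option names are the ones proposed above, not 
existing Cassandra settings:
{code:java}
// 1 background task, and 20% of the usual compaction throughput, unless the
// hypothetical settings are configured explicitly.
public class HistoricCompactionDefaults
{
    public static int maxHistoricCompactionTasks(Integer configured)
    {
        return configured != null ? configured : 1;
    }

    public static int maxHistoricCompactionThroughputMbPerSec(Integer configured,
                                                              int compactionThroughputMbPerSec)
    {
        if (configured != null)
            return configured;
        return Math.max(1, compactionThroughputMbPerSec / 5); // 20% of the usual throttle
    }

    public static void main(String[] args)
    {
        // e.g. with a 100 MB/s compaction throttle and nothing configured:
        System.out.println(maxHistoricCompactionTasks(null));                   // 1
        System.out.println(maxHistoricCompactionThroughputMbPerSec(null, 100)); // 20
    }
}
{code}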



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10489) arbitrary order by on partitions

2015-10-08 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949492#comment-14949492
 ] 

Jonathan Shook edited comment on CASSANDRA-10489 at 10/8/15 10:14 PM:
--

So, against a non-indexed field, the processing bound will be the size of the 
partition. If you only hold a scoreboard of limit items in memory and stream 
through the rest, replacing items, the memory requirements are lower, but the 
IO requirements could be substantial. If you do this with RF>1 and CL>1, then 
you may have semantics of result merging at the coordinator, but this should 
still be bounded to the result size and not the search space.

I would like for us to consider this operation for indexed fields and 
non-indexed fields as separate features, possibly putting the non-indexed 
version behind a warning or such. I'm sure some will absolutely try to sort 
10^9 *unindexed* items with limit 10. At least they should know that it has a 
completely different op cost.



was (Author: jshook):
So, against a non-indexed field, the processing bound will be the size of the 
partition. If you only hold a scoreboard of limit items in memory and stream 
through the rest, replacing items, the memory requirements are lower, but the 
IO requirements could be substantial. If you do this with RF>1 and CL>1, then 
you may have semantics of result merging at the coordinator, but this should 
still be bounded to the result size and not the search space.

I would like for us to consider this operation for indexed fields and 
non-indexed fields as separate features, possibly putting the non-indexed 
version behind a warning or such. I'm sure some will absolutely try to sort 
10^9 items with limit 10. At least they should know that it has a completely 
different op cost.


> arbitrary order by on partitions
> 
>
> Key: CASSANDRA-10489
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10489
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jon Haddad
>Priority: Minor
>
> We've got aggregations, we might as well allow sorting rows within a 
> partition on arbitrary fields.  Currently the advice is "do it client side", 
> but when combined with a LIMIT clause it makes sense to do this server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10490) DTCS historic compaction, possibly with major compaction

2015-10-08 Thread Jonathan Shook (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-10490:
---
Description: 
Presently, it's simply painful to run a major compaction with DTCS. It doesn't 
really serve a useful purpose. Instead, a DTCS major compaction should allow 
for a DTCS-style compaction to go back before max_sstable_age_days. We can call 
this a historic compaction, for lack of a better term.

Such a compaction should not take precedence over normal compaction work, but 
should be considered a background task. By default there should be a cap on the 
number of these tasks running. It would be nice to have a separate 
"max_historic_compaction_tasks" and possibly a 
"max_historic_compaction_throughput" in the compaction settings to allow for 
separate throttles on this. I would set these at 1 and 20% of the usual 
compaction throughput if they aren't set explicitly.

It may also be desirable to allow historic compaction to run apart from running 
a major compaction, and to simply disable major compaction altogether for DTCS.

  was:
Presently, it's simply painful to run a major compaction with DTCS. It doesn't 
really serve a useful purpose. Instead, a DTCS major compaction should allow 
for compaction to go back before max_sstable_age_days. We can call this a 
historic compaction, for lack of a better term.

Such a compaction should not take precedence over normal compaction work, but 
should be considered a background task. By default there should be a cap on the 
number of these tasks running. It would be nice to have a separate 
"max_historic_compaction_tasks" and possibly a 
"max_historic_compaction_throughput" in the compaction settings to allow for 
separate throttles on this. I would set these at 1 and 20% of the usual 
compaction throughput if they aren't set explicitly.

It may also be desirable to allow historic compaction to run apart from running 
a major compaction, and to simply disable major compaction altogether for DTCS.


> DTCS historic compaction, possibly with major compaction
> 
>
> Key: CASSANDRA-10490
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10490
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jonathan Shook
>
> Presently, it's simply painful to run a major compaction with DTCS. It 
> doesn't really serve a useful purpose. Instead, a DTCS major compaction 
> should allow for a DTCS-style compaction to go back before 
> max_sstable_age_days. We can call this a historic compaction, for lack of a 
> better term.
> Such a compaction should not take precedence over normal compaction work, but 
> should be considered a background task. By default there should be a cap on 
> the number of these tasks running. It would be nice to have a separate 
> "max_historic_compaction_tasks" and possibly a 
> "max_historic_compaction_throughput" in the compaction settings to allow for 
> separate throttles on this. I would set these at 1 and 20% of the usual 
> compaction throughput if they aren't set explicitly.
> It may also be desirable to allow historic compaction to run apart from 
> running a major compaction, and to simply disable major compaction altogether 
> for DTCS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10489) arbitrary order by on partitions

2015-10-08 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949613#comment-14949613
 ] 

Jonathan Shook commented on CASSANDRA-10489:


I'm totally cool with a threshold warning here. But something that is easily 
ignored is easily ignored, like log spam. Also, if it is documented clearly in 
terms of op costs, I'm ok with that too. Anywhere we have a list of "these 
things that can be expensive if you don't understand what they are doing", this 
should be on it.

> arbitrary order by on partitions
> 
>
> Key: CASSANDRA-10489
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10489
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jon Haddad
>Priority: Minor
>
> We've got aggregations, we might as well allow sorting rows within a 
> partition on arbitrary fields.  Currently the advice is "do it client side", 
> but when combined with a LIMIT clause it makes sense to do this server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10443) CQLSStableWriter example fails on 3.0rc1

2015-10-03 Thread Jonathan Shook (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-10443:
---
Summary: CQLSStableWriter example fails on 3.0rc1  (was: CQLSStableWriter 
example fails on C*3.0)

> CQLSStableWriter example fails on 3.0rc1
> 
>
> Key: CASSANDRA-10443
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10443
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, Tools
>Reporter: Jonathan Shook
>
> CQLSSTableWriter which works with 2.2.1 does not work with 3.0rc1.
> Something like https://github.com/yukim/cassandra-bulkload-example should be 
> added to the test suite.
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter.close(SSTableSimpleUnsortedWriter.java:136)
>   at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:274)
>   at com.metawiring.sandbox.BulkLoadExample.main(BulkLoadExample.java:160)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> Caused by: java.lang.ExceptionInInitializerError
>   at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:372)
>   at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:309)
>   at org.apache.cassandra.db.Keyspace.open(Keyspace.java:133)
>   at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
>   at 
> org.apache.cassandra.io.sstable.SSTableTxnWriter.create(SSTableTxnWriter.java:97)
>   at 
> org.apache.cassandra.io.sstable.AbstractSSTableSimpleWriter.createWriter(AbstractSSTableSimpleWriter.java:63)
>   at 
> org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter$DiskWriter.run(SSTableSimpleUnsortedWriter.java:206)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.getFlushWriters(DatabaseDescriptor.java:1153)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:116)
>   ... 7 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10443) CQLSStableWriter example fails on C*3.0

2015-10-03 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-10443:
--

 Summary: CQLSStableWriter example fails on C*3.0
 Key: CASSANDRA-10443
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10443
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Tools
Reporter: Jonathan Shook


CQLSSTableWriter which works with 2.2.1 does not work with 3.0rc1.
Something like https://github.com/yukim/cassandra-bulkload-example should be 
added to the test suite.

Exception in thread "main" java.lang.RuntimeException: 
java.lang.ExceptionInInitializerError
at 
org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter.close(SSTableSimpleUnsortedWriter.java:136)
at 
org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:274)
at com.metawiring.sandbox.BulkLoadExample.main(BulkLoadExample.java:160)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ExceptionInInitializerError
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:372)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:309)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:133)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at 
org.apache.cassandra.io.sstable.SSTableTxnWriter.create(SSTableTxnWriter.java:97)
at 
org.apache.cassandra.io.sstable.AbstractSSTableSimpleWriter.createWriter(AbstractSSTableSimpleWriter.java:63)
at 
org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter$DiskWriter.run(SSTableSimpleUnsortedWriter.java:206)
Caused by: java.lang.NullPointerException
at 
org.apache.cassandra.config.DatabaseDescriptor.getFlushWriters(DatabaseDescriptor.java:1153)
at 
org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:116)
... 7 more
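
For reference, a minimal usage sketch along the lines of the bulkload example; 
the schema, statements, and output directory are made up for illustration. The 
failure above occurs when close() flushes the writer:
{code:java}
import java.io.File;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class BulkLoadSketch
{
    private static final String SCHEMA =
        "CREATE TABLE ks.events (id text, ts timestamp, value double, PRIMARY KEY (id, ts))";
    private static final String INSERT =
        "INSERT INTO ks.events (id, ts, value) VALUES (?, ?, ?)";

    public static void main(String[] args) throws Exception
    {
        File outputDir = new File("/tmp/ks/events");
        outputDir.mkdirs();

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                                                  .inDirectory(outputDir)
                                                  .forTable(SCHEMA)
                                                  .using(INSERT)
                                                  .build();
        writer.addRow("sensor-1", new java.util.Date(), 42.0);
        writer.close(); // ExceptionInInitializerError is thrown from here on 3.0rc1
    }
}
{code}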




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938223#comment-14938223
 ] 

Jonathan Shook commented on CASSANDRA-10403:


[~JoshuaMcKenzie] I'd prefer not to make too many assumptions about 
confirmation or (human) memory bias on this. We will not get off this carousel 
without actual data. However, to the degree that you are right about it, it 
should encourage us to explore further, not less. CMS's pain in those cases has 
much to do with its inability to scale with hardware sizing and concurrency 
trends, which we seem to be working really hard to disregard. Until someone 
puts together a view of current and emerging system parameters, we really don't 
have the data that we need to set a default.

I posit that the general case system is much bigger in practice than in the 
past. I also posit that on those systems, G1 is an obviously better default 
than CMS. So, we are likely going to get some data on 1) what the hardware data 
looks like in the field and 2) whether or not we can demonstrate the CMS 
improvements with larger memory that we've seen with *actual workloads* on 
*current system profiles*. I'm simply eager to see more data at this point.

This is a bit out of scope of the ticket, but it is important. If we were able 
to set a default depending on the available memory, there would not be a single 
default. Trying to scale GC bandwidth up on bigger metal with CMS is arguably 
more painful than trying to make G1 useable with lower memory. However, we 
don't have to make that bargain as either-or. We can have the best of both, if 
we simply align the GC settings to the type of hardware that they work well for.

I'll create another ticket for that.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938237#comment-14938237
 ] 

Jonathan Shook commented on CASSANDRA-10403:


I created CASSANDRA-10425 to discuss the per-size defaults.


> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938336#comment-14938336
 ] 

Jonathan Shook edited comment on CASSANDRA-10403 at 9/30/15 7:28 PM:
-

[~JoshuaMcKenzie]
I understand and appreciate the need to control scoping effort for 3.0 planning.

bq. Shouldn't the read/write workload distribution also play into that?

Yes, but there is a mostly orthogonal effect to the nuances of the workload mix 
which has to do with the vertical scalability of GC when the system is more 
fully utilized. This is visible along the sizing spectrum. Run the same 
workload and try to scale the heap proportionally over the memory (1/4 or 
whatever) and you will likely see CMS suffer no matter what. This is slightly 
conjectural, but easily verifiable with some effort.

bq.  the idea of having a default that's optimal for everyone is unrealistic

I think we are converging on a common perspective on this.

[~slebresne]
bq. 3.2 will come only 2 months after 3.0

My preference would be to have the CASSANDRA-10425 out of the gate, but this 
still would require some testing effort for safety. The reason being that 3.0 
represents a reframing of performance expectations, and after that, any changes 
to default, even for larger memory systems constitute a bigger chance of 
surprise. Do we have a chance to learn about sizing from surveys, etc before 
the runway ends for 3.0?

If we could get something like CASSANDRA-10425 in place, it would cover both 
bases.



was (Author: jshook):
[~JoshuaMcKenzie]
I understand and appreciate the need to control scoping effort for 3.0 planning.

bq. Shouldn't the read/write workload distribution also play into that?

Yes, but there is a mostly orthogonal effect to the nuances of the workload mix 
which has to do with the vertical scalability of GC when the system. This is 
visible along the sizing spectrum. Run the same workload and try to scale the 
heap proportionally over the memory (1/4 or whatever) and you will likely see 
CMS suffer no matter what. This is slightly conjectural, but easily verifiable 
with some effort.

bq.  the idea of having a default that's optimal for everyone is unrealistic

I think we are converging on a common perspective on this.

[~slebresne]
bq. 3.2 will come only 2 months after 3.0

My preference would be to have the CASSANDRA-10425 out of the gate, but this 
still would require some testing effort for safety. The reason being that 3.0 
represents a reframing of performance expectations, and after that, any changes 
to default, even for larger memory systems constitute a bigger chance of 
surprise. Do we have a chance to learn about sizing from surveys, etc before 
the runway ends for 3.0?

If we could get something like CASSANDRA-10425 in place, it would cover both 
bases.


> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938336#comment-14938336
 ] 

Jonathan Shook commented on CASSANDRA-10403:


[~JoshuaMcKenzie]
I understand and appreciate the need to control scoping effort for 3.0 planning.

bq. Shouldn't the read/write workload distribution also play into that?

Yes, but there is a mostly orthogonal effect to the nuances of the workload mix 
which has to do with the vertical scalability of GC when the system is more 
fully utilized. This is 
visible along the sizing spectrum. Run the same workload and try to scale the 
heap proportionally over the memory (1/4 or whatever) and you will likely see 
CMS suffer no matter what. This is slightly conjectural, but easily verifiable 
with some effort.

bq.  the idea of having a default that's optimal for everyone is unrealistic

I think we are converging on a common perspective on this.

[~slebresne]
bq. 3.2 will come only 2 months after 3.0

My preference would be to have the CASSANDRA-10425 out of the gate, but this 
still would require some testing effort for safety. The reason being that 3.0 
represents a reframing of performance expectations, and after that, any changes 
to default, even for larger memory systems constitute a bigger chance of 
surprise. Do we have a chance to learn about sizing from surveys, etc before 
the runway ends for 3.0?

If we could get something like CASSANDRA-10425 in place, it would cover both 
bases.


> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10425) Autoselect GC settings depending on system memory

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938236#comment-14938236
 ] 

Jonathan Shook commented on CASSANDRA-10425:


Consider adding some weightings for different levels of buffer-cache 
sensitivity in workload.

> Autoselect GC settings depending on system memory
> -
>
> Key: CASSANDRA-10425
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10425
> Project: Cassandra
>  Issue Type: Bug
>  Components: Config, Core
>Reporter: Jonathan Shook
>
> 1) Make GC modular within cassandra-env
> 2) For systems with 32GB or less of ram, use the classic CMS with the 
> established default settings.
> 3) For systems with 48GB or more of ram, use 1/2 or up to 32GB of heap with 
> G1, whichever is lower.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10425) Autoselect GC settings depending on system memory

2015-09-30 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-10425:
--

 Summary: Autoselect GC settings depending on system memory
 Key: CASSANDRA-10425
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10425
 Project: Cassandra
  Issue Type: Bug
  Components: Config, Core
Reporter: Jonathan Shook


1) Make GC modular within cassandra-env
2) For systems with 32GB or less of ram, use the classic CMS with the 
established default settings.
3) For systems with 48GB or more of ram, use 1/2 or up to 32GB of heap with G1, 
whichever is lower.
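
The selection rule above, sketched in Java for clarity (the real change would 
live in cassandra-env); the flag strings are abbreviated placeholders:
{code:java}
public class GcAutoSelect
{
    static String chooseGc(long systemMemGb)
    {
        if (systemMemGb <= 32)
            return "CMS with the established default settings";
        if (systemMemGb >= 48)
        {
            long heapGb = Math.min(systemMemGb / 2, 32); // 1/2 of RAM, capped at 32GB
            return "G1 with -Xmx" + heapGb + "G";
        }
        // 32-48GB is not covered by the description; one option is to keep the CMS defaults.
        return "CMS with the established default settings";
    }

    public static void main(String[] args)
    {
        System.out.println(chooseGc(16));  // CMS
        System.out.println(chooseGc(64));  // G1 with -Xmx32G
        System.out.println(chooseGc(128)); // G1 with -Xmx32G
    }
}
{code}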




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938731#comment-14938731
 ] 

Jonathan Shook edited comment on CASSANDRA-10403 at 9/30/15 7:45 PM:
-

To simplify, implementing CASSANDRA-10425 is effectively the same as reverting 
for the systems that we have commonly tested for, while allowing a likely 
better starting point for those where we have field experience with G1.


was (Author: jshook):
To simplify, implementing CASSANDRA-10425 is effectively the same as reverting 
for the system that we have tested for, while allowing a likely better starting 
point for those that we have field experience with G1.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938731#comment-14938731
 ] 

Jonathan Shook commented on CASSANDRA-10403:


To simplify, implementing CASSANDRA-10425 is effectively the same as reverting 
for the systems that we have tested for, while allowing a likely better starting 
point for those where we have field experience with G1.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937449#comment-14937449
 ] 

Jonathan Shook commented on CASSANDRA-10403:


So, just to be clear, are we disregarding G1 for systems with larger memory, 
with the assumption that 8GB is all you'll ever need for "all but the most 
write-heavy workloads", even for systems that have larger memory?


> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937456#comment-14937456
 ] 

Jonathan Shook commented on CASSANDRA-10403:


[~pauloricardomg] I understand, with your updated comment.
For systems that can't support a larger heap, CMS is fine, as long as you don't 
mind saturating survivor and triggering the cascade of GC-induced side-effects. 
Still, this is a performance trade-off with resiliency.

I want to be clear that I think it would be a loss for us to just disregard G1 
for larger memory systems as the general case. There seems to be some tension 
between the actual field experience and prognostication as to how it should 
work. I would like for data to lead the way on this, as it should.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937039#comment-14937039
 ] 

Jonathan Shook commented on CASSANDRA-10403:


This statement carries certain assumptions about the whole system, which may 
not be fair across the board. For example, buffer cache is a critical 
consideration, but to a varying degree depending on how cache-friendly the 
workload is. Further, the storage subsystem determines a very large part of how 
much of a cache-miss penalty there is. So, prioritizing the cache at the 
expense of the heap is not a sure win. Often it is not the right balance.

With systems that have high concurrency, it is possible to scale up the 
performance on the node as long as you can provide reasonable tunings to 
effectively take advantage of available resources without critically 
bottle-necking on one. For example, with systems that have higher effective IO 
concurrency and IO bandwidth across many devices, you actually need higher GC 
throughput in order to match the overall IO capacity of the system, from 
storage subsystem all the way to the network stack.

This rationale has been evidenced in the field when we have made tuning 
improvements with G1 in certain systems as an opportunistic test. My 
explanation above is probably a gross oversimplification, but it reflects 
experience addressing GC throughput (and pauses, and phi, and hints, and load 
shifting ... etc) issues.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937041#comment-14937041
 ] 

Jonathan Shook commented on CASSANDRA-10403:


To be clear, in some cases, we found G1 to be a better production GC, and those 
tests simply allowed us to verify this before leaving it in place.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-29 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936132#comment-14936132
 ] 

Jonathan Shook commented on CASSANDRA-10403:


To be fair, m1.xlarge has less than 16GB of RAM, which is still on the small side 
for G1 effectiveness, although at some point between 14G and 24G you should 
start seeing G1 provide more stability than CMS for GC saturating loads. 
(Assuming you don't set the GC pause target down too low)
G1 should start to be the obvious choice when you run with more than about 24GB 
and even more obviously with 32GB of heap. This might seem large, but if you 
look at what businesses tend to deploy in data centers for bare metal, they 
aren't just 32GB systems anymore. You'll often see 64, 128, or more GB of DRAM. 
There are some other ec2 profiles which get up to this range, but they are 
disproportionately more expensive.

So, tests that go up to 32G of heap on a system with 64GB of main memory are 
really where the proof points are. Saturating loads are good too.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-29 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936186#comment-14936186
 ] 

Jonathan Shook edited comment on CASSANDRA-10403 at 9/30/15 12:58 AM:
--

Note about memory sizes. Everything I wrote above assumes that we are talking 
about smaller heaps. Things clearly change when we go up in heap size beyond 
what CMS can handle well. (for those reading from the middle)


was (Author: jshook):
Note about memory sizes. Everything I wrote above assumes that we are talking 
about smaller heaps. Things clearly change when we go up in heap size beyond 
what CMS can handle well.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-29 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936186#comment-14936186
 ] 

Jonathan Shook commented on CASSANDRA-10403:


Note about memory sizes. Everything I wrote above assumes that we are talking 
about smaller heaps. Things clearly change when we go up in heap size beyond 
what CMS can handle well.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10419) Make JBOD compaction and flushing more robust

2015-09-29 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-10419:
--

 Summary: Make JBOD compaction and flushing more robust
 Key: CASSANDRA-10419
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10419
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Shook
 Attachments: timeseries-study-overview-jbods.png

With JBOD and several smaller disks, like SSDs at 1.2 TB or lower, it is 
possible to run out of space prematurely. With a sufficient ingestion rate, the 
disk selection logic seems to over-select certain JBOD targets. This causes a 
premature C* shutdown while there is still a significant amount of space left. 
With DTCS, for example, it should be possible to utilize over 90% of the 
available space with certain settings. However, in the scenario I tested, only 
about 50% was utilized before a filesystem-full error (see below). This is 
likely a scheduling challenge between high rates of ingest and smaller data 
directories. It would be good to use an anticipatory model, if possible, to 
more carefully select compaction targets according to fill rates.

The attached image shows a test with 12 1.2TB JBOD data directories. At the 
end, the utilizations are:
59GiB, 83GiB, 83GiB, 97GiB, 330GiB, 589GiB, 604GiB, 630GiB, 697GiB, 1.055TiB, 
1.083TB, 1.092TiB
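
For reference, the imbalance is easy to spot with a quick per-directory check 
(paths are illustrative, one per data_file_directories entry in cassandra.yaml):
{noformat}
$ for d in /data/disk01 /data/disk02 /data/disk12; do du -sh "$d"; done
{noformat}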



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10419) Make JBOD compaction and flushing more robust

2015-09-29 Thread Jonathan Shook (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-10419:
---
Description: 
With JBOD and several smaller disks, like SSDs at 1.2 TB or lower, it is 
possible to run out of space prematurely. With a sufficient ingestion rate, 
disk selection logic seems to overselect on certain JBOD targets. This causes a 
premature C* shutdown when there is a significant amount of space left. With 
DTCS, for example, it should be possible to utilize over 90% of the available 
space with certain settings. However in the scenario I tested, only about 50% 
was utilized, before a filesystem full error. (see below). It is likely that 
this is a scheduling challenge between high rates of ingest and smaller data 
directories. It would be good to use an anticipatory model if possible to more 
carefully select compaction targets according to fill rates. As well, if the 
largest sstable that can be supported is constrained by the largest JBOD 
extent, we should make that visible to the compaction logic where possible.

The attached image shows a test with 12 1.2TB JBOD data directories. At the 
end, the utilizations are:
59GiB, 83GiB, 83GiB, 97GiB, 330GiB, 589GiB, 604GiB, 630GiB, 697GiB, 1.055TiB, 
1.083TB, 1092TiB,  

  was:
With JBOD and several smaller disks, like SSDs at 1.2 TB or lower, it is 
possible to run out of space prematurely. With a sufficient ingestion rate, 
disk selection logic seems to overselect on certain JBOD targets. This causes a 
premature C* shutdown when there is a significant amount of space left. With 
DTCS, for example, it should be possible to utilize over 90% of the available 
space with certain settings. However in the scenario I tested, only about 50% 
was utilized, before a filesystem full error. (see below). It is likely that 
this is a scheduling challenge between high rates of ingest and smaller data 
directories. It would be good to use an anticipatory model if possible to more 
carefully select compaction targets according to fill rates.

The attached image shows a test with 12 1.2TB JBOD data directories. At the 
end, the utilizations are:
59GiB, 83GiB, 83GiB, 97GiB, 330GiB, 589GiB, 604GiB, 630GiB, 697GiB, 1.055TiB, 
1.083TB, 1092TiB,  


> Make JBOD compaction and flushing more robust
> -
>
> Key: CASSANDRA-10419
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10419
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jonathan Shook
> Attachments: timeseries-study-overview-jbods.png
>
>
> With JBOD and several smaller disks, like SSDs at 1.2 TB or lower, it is 
> possible to run out of space prematurely. With a sufficient ingestion rate, 
> disk selection logic seems to overselect on certain JBOD targets. This causes 
> a premature C* shutdown when there is a significant amount of space left. 
> With DTCS, for example, it should be possible to utilize over 90% of the 
> available space with certain settings. However in the scenario I tested, only 
> about 50% was utilized, before a filesystem full error. (see below). It is 
> likely that this is a scheduling challenge between high rates of ingest and 
> smaller data directories. It would be good to use an anticipatory model if 
> possible to more carefully select compaction targets according to fill rates. 
> As well, if the largest sstable that can be supported is constrained by the 
> largest JBOD extent, we should make that visible to the compaction logic 
> where possible.
> The attached image shows a test with 12 1.2TB JBOD data directories. At the 
> end, the utilizations are:
> 59GiB, 83GiB, 83GiB, 97GiB, 330GiB, 589GiB, 604GiB, 630GiB, 697GiB, 1.055TiB, 
> 1.083TB, 1092TiB,  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-29 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936156#comment-14936156
 ] 

Jonathan Shook commented on CASSANDRA-10403:


I do think it is valid; however, I expect the findings to be slightly different. 
The promise of G1 on smaller systems is more robust performance across a range 
of workloads without manual tuning. That said, it probably won't perform as 
well in terms of ops/s, etc. The question to me is really whether we are trying 
to save people from the pain of not going fast enough, or from the pain of CMS 
once they start having cascading IO and heap pressure through the system. I am 
very curious to see whether our tests prove this out as we would expect.

As an operator and a developer, I'd take an easily tuned and stable setting 
over one that goes fast until it doesn't go, any day. However, some will have 
already adjusted their cluster sizing around one expectation, so we'd want to 
make sure to avoid surprises. With 3.0 having other changes as well to offset, 
it might be a wash.

Raw performance is only part of the picture. I would like to see your results, 
for sure.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
>Assignee: Paulo Motta
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10419) Make JBOD compaction and flushing more robust

2015-09-29 Thread Jonathan Shook (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Shook updated CASSANDRA-10419:
---
Description: 
With JBOD and several smaller disks, like SSDs at 1.2 TB or lower, it is 
possible to run out of space prematurely. With a sufficient ingestion rate, 
disk selection logic seems to overselect on certain JBOD targets. This causes a 
premature C* shutdown when there is a significant amount of space left. With 
DTCS, for example, it should be possible to utilize over 90% of the available 
space with certain settings. However in the scenario I tested, only about 50% 
was utilized, before a filesystem full error. (see below). It is likely that 
this is a scheduling challenge between high rates of ingest and smaller data 
directories. It would be good to use an anticipatory model if possible to more 
carefully select compaction targets according to fill rates. As well, if the 
largest sstable that can be supported is constrained by the largest JBOD 
extent, we should make that visible to the compaction logic where possible.

The attached image shows a test with 12 1.2TB JBOD data directories. At the 
end, the utilizations are:
59GiB, 83GiB, 83GiB, 97GiB, 330GiB, 589GiB, 604GiB, 630GiB, 697GiB, 1.055TiB, 
1.083TB, 1.092TiB,  

  was:
With JBOD and several smaller disks, like SSDs at 1.2 TB or lower, it is 
possible to run out of space prematurely. With a sufficient ingestion rate, 
disk selection logic seems to overselect on certain JBOD targets. This causes a 
premature C* shutdown when there is a significant amount of space left. With 
DTCS, for example, it should be possible to utilize over 90% of the available 
space with certain settings. However in the scenario I tested, only about 50% 
was utilized, before a filesystem full error. (see below). It is likely that 
this is a scheduling challenge between high rates of ingest and smaller data 
directories. It would be good to use an anticipatory model if possible to more 
carefully select compaction targets according to fill rates. As well, if the 
largest sstable that can be supported is constrained by the largest JBOD 
extent, we should make that visible to the compaction logic where possible.

The attached image shows a test with 12 1.2TB JBOD data directories. At the 
end, the utilizations are:
59GiB, 83GiB, 83GiB, 97GiB, 330GiB, 589GiB, 604GiB, 630GiB, 697GiB, 1.055TiB, 
1.083TB, 1092TiB,  


> Make JBOD compaction and flushing more robust
> -
>
> Key: CASSANDRA-10419
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10419
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jonathan Shook
> Attachments: timeseries-study-overview-jbods.png
>
>
> With JBOD and several smaller disks, like SSDs at 1.2 TB or lower, it is 
> possible to run out of space prematurely. With a sufficient ingestion rate, 
> disk selection logic seems to overselect on certain JBOD targets. This causes 
> a premature C* shutdown when there is a significant amount of space left. 
> With DTCS, for example, it should be possible to utilize over 90% of the 
> available space with certain settings. However in the scenario I tested, only 
> about 50% was utilized, before a filesystem full error. (see below). It is 
> likely that this is a scheduling challenge between high rates of ingest and 
> smaller data directories. It would be good to use an anticipatory model if 
> possible to more carefully select compaction targets according to fill rates. 
> As well, if the largest sstable that can be supported is constrained by the 
> largest JBOD extent, we should make that visible to the compaction logic 
> where possible.
> The attached image shows a test with 12 1.2TB JBOD data directories. At the 
> end, the utilizations are:
> 59GiB, 83GiB, 83GiB, 97GiB, 330GiB, 589GiB, 604GiB, 630GiB, 697GiB, 1.055TiB, 
> 1.083TB, 1.092TiB,  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-28 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933621#comment-14933621
 ] 

Jonathan Shook commented on CASSANDRA-10403:


I would be entirely in favor of having a separate settings file that can simply 
be sourced in. Having several related GC options sprinkled through the -env 
file is bothersome. This should apply as well to the CMS settings. Perhaps it 
should even be a soft setting, as long as the possible values are validated to 
guard against any injection.
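
A minimal sketch of the sourcing side, assuming a hypothetical gc-settings.sh 
next to cassandra-env.sh and the usual $CASSANDRA_CONF variable:
{noformat}
# cassandra-env.sh fragment -- file name is hypothetical
GC_SETTINGS="$CASSANDRA_CONF/gc-settings.sh"
if [ -r "$GC_SETTINGS" ]; then
    . "$GC_SETTINGS"    # all GC-related JVM_OPTS collected in one place
fi
{noformat}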

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-28 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933505#comment-14933505
 ] 

Jonathan Shook commented on CASSANDRA-10403:


Can we get some G1 tests with a 24+G heap to see if it's worth making this 
machine-specific? The notion of "commodity" changes with time. The settings 
need to adapt if possible.



> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10280) Make DTCS work well with old data

2015-09-21 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901028#comment-14901028
 ] 

Jonathan Shook commented on CASSANDRA-10280:


I've read the patch and the comments. Deprecating max_sstable_age_days in favor 
of the max window size is a good simplification. It also does what I originally 
had hoped max_sstable_age_days would do. So +1 on all of that.
Just to make sure, can we identify whether or not this might affect tombstone 
compaction scheduling? As in, could it cause tombstone compactions that would 
otherwise happen to be skipped?

> Make DTCS work well with old data
> -
>
> Key: CASSANDRA-10280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10280
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> Operational tasks become incredibly expensive if you keep around a long 
> timespan of data with DTCS - with default settings and 1 year of data, the 
> oldest window covers about 180 days. Bootstrapping a node with vnodes with 
> this data layout will force cassandra to compact very many sstables in this 
> window.
> We should probably put a cap on how big the biggest windows can get. We could 
> probably default this to something sane based on max_sstable_age (ie, say we 
> can reasonably handle 1000 sstables per node, then we can calculate how big 
> the windows should be to allow that)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7486) Migrate to G1GC by default

2015-09-19 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877316#comment-14877316
 ] 

Jonathan Shook commented on CASSANDRA-7486:
---

This seems pretty open-and-shut where I would expect a bit more of a nuanced 
test. We've honestly seen G1 be the operative improvement in some cases in the 
field. I'd much prefer to see "needs more analysis" than to see it resolved as 
fixed. CMS will *not* scale with hardware as we go forward. This is not in 
debate.


> Migrate to G1GC by default
> --
>
> Key: CASSANDRA-7486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Config
>Reporter: Jonathan Ellis
>Assignee: Albert P Tobey
> Fix For: 3.0 alpha 1
>
>
> See 
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
>  and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
> Suspect this will help G1 even more than CMS.  (NB this is off by default but 
> needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7486) Migrate to G1GC by default

2015-09-19 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877411#comment-14877411
 ] 

Jonathan Shook edited comment on CASSANDRA-7486 at 9/20/15 5:15 AM:


I do believe that there is a gap between the maximum effective CMS heap sizes 
and the minimum effective G1 sizes. I'd estimate it to be about the 14GB - 24GB 
range. Neither does admirably when taxed for GC throughput in that range. Put 
in another way, I've never and would never advocate that someone use G1 with 
less than 24G of heap. In practice, I use it only on systems with 64GB of 
memory, where it is no big deal to give G1 32GB to work with. We have simply 
seen G1 go slower when it doesn't have adequate scratch space. In essence, it 
really likes to have more memory.

We have also seen anecdotal evidence that G1 seems to settle in, performance 
wise, after a warm-up time. It could be that it needs to collect metrics long 
enough under steady state before it learns how to handle GC and heap allocation 
better. This hasn't been proven out definitively, but is strongly evidenced in 
some longer-run workload studies.

I do agree that when you don't really need more than 12GB of heap, CMS will be 
difficult to beat with the appropriate tunings. I'm not really sure what to do 
about the mid-band where neither CMS nor G1 are very happy. We may have to be 
prescriptive in the sense that if you want to use G1, then you should give it 
enough to work with effectively.

Perhaps we need to make the startup scripts source a different GC config file 
depending on the detected memory in the system. I normally configure G1 as a 
sourced (included) file to the -env.sh script, so this would be fairly 
straightforward.

[~ato...@datastax.com], any comments on this?
 


was (Author: jshook):
I do believe that there is a gap between the maximum effective CMS heap sizes 
and the minimum effective G1 sizes. I'd estimate it to be about the 14GB - 24GB 
range. Neither does admirably when taxed for GC throughput in that range. Put 
in another way, I've never and would never advocate that someone use G1 with 
less than 24G of heap. In practice, I use it only on systems with 64GB of 
memory, where it is no big deal to give G1 32GB to work with. We have simply 
seen G1 go slower when it doesn't have adequate scratch space. In essence, it 
really likes to have more memory.

We have also seen anecdotal evidence that G1 seems to settle in, performance 
wise, after a warm-up time. It could be that it needs to collect metrics long 
enough under steady state before it learns how to handle GC and heap allocation 
better. This hasn't been provided definitively, but is strongly evidenced in 
some longer-run workload studies.

I do agree that when you don't really need more than 12GB of heap, CMS will be 
difficult to beat with the appropriate tunings. I'm not really sure what to do 
about the mid-band where neither CMS nor G1 are very happy. We may have to be 
prescriptive in the sense that if you want to use G1, then you should give it 
enough to work with effectively.

Perhaps we need to make the startup scripts source a different GC config file 
depending on the detected memory in the system. I normally configure G1 as a 
sourced (included) file to the -env.sh script, so this would be fairly 
straightforward.

[~ato...@datastax.com], any comments on this?
 

> Migrate to G1GC by default
> --
>
> Key: CASSANDRA-7486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Config
>Reporter: Jonathan Ellis
>Assignee: Benedict
> Fix For: 3.0 alpha 1
>
>
> See 
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
>  and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
> Suspect this will help G1 even more than CMS.  (NB this is off by default but 
> needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7486) Migrate to G1GC by default

2015-09-19 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877432#comment-14877432
 ] 

Jonathan Shook commented on CASSANDRA-7486:
---

I'd argue that there already is an increase in pain as you try to use more of 
the metal on a node. We've just become acclimated to it. Instead of scaling the 
compute side over the metal, we do silly things like run multiple instances per 
box. It's not really silly if it gets results, but it is an example of where we 
do something tactically, get so used to it as a necessary complexity, and then 
just keep taking for granted that this is how we do it. I personally don't want 
to keep going down this path. So, I am inclined to carry on with the testing 
and characterization, in time. We should compare notes and methods and see what 
can be done to reduce the overall effort.


> Migrate to G1GC by default
> --
>
> Key: CASSANDRA-7486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Config
>Reporter: Jonathan Ellis
>Assignee: Benedict
> Fix For: 3.0 alpha 1
>
>
> See 
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
>  and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
> Suspect this will help G1 even more than CMS.  (NB this is off by default but 
> needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7486) Migrate to G1GC by default

2015-09-19 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877411#comment-14877411
 ] 

Jonathan Shook commented on CASSANDRA-7486:
---

I do believe that there is a gap between the maximum effective CMS heap sizes 
and the minimum effective G1 sizes. I'd estimate it to be about the 14GB - 24GB 
range. Neither does admirably when taxed for GC throughput in that range. Put 
in another way, I've never and would never advocate that someone use G1 with 
less than 24G of heap. In practice, I use it only on systems with 64GB of 
memory, where it is no big deal to give G1 32GB to work with. We have simply 
seen G1 go slower when it doesn't have adequate scratch space. In essence, it 
really likes to have more memory.

We have also seen anecdotal evidence that G1 seems to settle in, performance 
wise, after a warm-up time. It could be that it needs to collect metrics long 
enough under steady state before it learns how to handle GC and heap allocation 
better. This hasn't been proven out definitively, but is strongly evidenced in 
some longer-run workload studies.

I do agree that when you don't really need more than 12GB of heap, CMS will be 
difficult to beat with the appropriate tunings. I'm not really sure what to do 
about the mid-band where neither CMS nor G1 are very happy. We may have to be 
prescriptive in the sense that if you want to use G1, then you should give it 
enough to work with effectively.

Perhaps we need to make the startup scripts source a different GC config file 
depending on the detected memory in the system. I normally configure G1 as a 
sourced (included) file to the -env.sh script, so this would be fairly 
straightforward.

[~ato...@datastax.com], any comments on this?
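
A rough sketch of what memory-based selection could look like (file names and 
the threshold are hypothetical, only to illustrate the idea):
{noformat}
# hypothetical cassandra-env.sh fragment
system_memory_in_mb=$(free -m | awk '/^Mem:/ {print $2}')
if [ "$system_memory_in_mb" -ge 49152 ]; then
    . "$CASSANDRA_CONF/gc-g1.sh"     # 48GB+ of RAM: large heap, G1
else
    . "$CASSANDRA_CONF/gc-cms.sh"    # smaller boxes stay on tuned CMS
fi
{noformat}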
 

> Migrate to G1GC by default
> --
>
> Key: CASSANDRA-7486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Config
>Reporter: Jonathan Ellis
>Assignee: Albert P Tobey
> Fix For: 3.0 alpha 1
>
>
> See 
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
>  and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
> Suspect this will help G1 even more than CMS.  (NB this is off by default but 
> needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7486) Migrate to G1GC by default

2015-09-19 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877316#comment-14877316
 ] 

Jonathan Shook edited comment on CASSANDRA-7486 at 9/19/15 8:44 PM:


This seems pretty open-and-shut where I would expect a bit more of a nuanced 
test. We've honestly seen G1 be the operative improvement in some cases in the 
field. I'd much prefer to see "needs more analysis" than to see it resolved as 
fixed. CMS will *not* scale with hardware as we go forward. This is not in 
debate.

An, nevermind. I see that is what the status is now.



was (Author: jshook):
This seems pretty open-and-shut where I would expect a bit more of a nuanced 
test. We've honestly seen G1 be the operative improvement in some cases in the 
field. I'd much prefer to see "needs more analysis" than to see it resolved as 
fixed. CMS will *not* scale with hardware as we go forward. This is not in 
debate.


> Migrate to G1GC by default
> --
>
> Key: CASSANDRA-7486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Config
>Reporter: Jonathan Ellis
>Assignee: Albert P Tobey
> Fix For: 3.0 alpha 1
>
>
> See 
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
>  and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
> Suspect this will help G1 even more than CMS.  (NB this is off by default but 
> needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7486) Migrate to G1GC by default

2015-09-19 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877316#comment-14877316
 ] 

Jonathan Shook edited comment on CASSANDRA-7486 at 9/20/15 3:33 AM:


This seems pretty open-and-shut where I would expect a bit more of a nuanced 
test. We've honestly seen G1 be the operative improvement in some cases in the 
field. I'd much prefer to see "needs more analysis" than to see it resolved as 
fixed. CMS will *not* scale with hardware as we go forward. This is not in 
debate.

Ah, nevermind. I see that is what the status is now.



was (Author: jshook):
This seems pretty open-and-shut where I would expect a bit more of a nuanced 
test. We've honestly seen G1 be the operative improvement in some cases in the 
field. I'd much prefer to see "needs more analysis" than to see it resolved as 
fixed. CMS will *not* scale with hardware as we go forward. This is not in 
debate.

An, nevermind. I see that is what the status is now.


> Migrate to G1GC by default
> --
>
> Key: CASSANDRA-7486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Config
>Reporter: Jonathan Ellis
>Assignee: Albert P Tobey
> Fix For: 3.0 alpha 1
>
>
> See 
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
>  and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
> Suspect this will help G1 even more than CMS.  (NB this is off by default but 
> needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10297) Low-effort configuration of metrics reporters via JMX/nodetool

2015-09-09 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-10297:
--

 Summary: Low-effort configuration of metrics reporters via 
JMX/nodetool
 Key: CASSANDRA-10297
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10297
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jonathan Shook
 Fix For: 3.x


Provide the ability to configure metrics reporters via JMX, with default 
support for common reporters out of the box, including graphite.

Configuration commands should allow for full programmatic configuration of 
reporters, including managing active reporters and their filtering settings.

The prefix value should be configurable with support for several common tokens 
which will be interpolated when the prefix value is set: hostname, rpc_ipaddr, 
cluster_name, etc.

Optionally, the configuration should be backed by a configuration file which is 
automatically loaded at startup if it exists, but with no errors if it doesn't. 

JMX options added here should also be supported with nodetool.

The purpose of this improvement is to allow for bulk (re)configuration of 
metrics collection in larger deployments.

A possible approach that would be easier to implement would be to provide the 
yaml reporter configuration via a JMX method parameter, with an optional 
boolean which would signal the method to persist the file in a pre-defined 
'reporter config' location.
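
To make the intent concrete, a purely hypothetical sketch of the nodetool side 
(these subcommands do not exist; they only illustrate the proposed workflow):
{noformat}
# hypothetical commands -- illustration of the proposal only
./nodetool setreporterconfig --file graphite-reporter.yaml --persist
./nodetool getreporterconfig
{noformat}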




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x

2015-09-03 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729510#comment-14729510
 ] 

Jonathan Shook commented on CASSANDRA-10249:


+1 on configurable

> Reduce over-read for standard disk io by 16x
> 
>
> Key: CASSANDRA-10249
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10249
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Albert P Tobey
> Fix For: 2.1.x
>
> Attachments: patched-2.1.9-dstat-lvn10.png, 
> stock-2.1.9-dstat-lvn10.png, yourkit-screenshot.png
>
>
> On read workloads, Cassandra 2.1 reads drastically more data than it emits 
> over the network. This causes problems throughout the system by wasting disk 
> IO and causing unnecessary GC.
> I have reproduced the issue on clusters and locally with a single instance. 
> The only requirement to reproduce the issue is enough data to blow through 
> the page cache. The default schema and data size with cassandra-stress is 
> sufficient for exposing the issue.
> With stock 2.1.9 I regularly observed anywhere from 300:1  to 500 
> disk:network ratio. That is to say, for 1MB/s of network IO, Cassandra was 
> doing 300-500MB/s of disk reads, saturating the drive.
> After applying this patch for standard IO mode 
> https://gist.github.com/tobert/10c307cf3709a585a7cf the ratio fell to around 
> 100:1 on my local test rig. Latency improved considerably and GC became a lot 
> less frequent.
> I tested with 512 byte reads as well, but got the same performance, which 
> makes sense since all HDD and SSD made in the last few years have a 4K block 
> size (many of them lie and say 512).
> I'm re-running the numbers now and will post them tomorrow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x

2015-09-02 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727703#comment-14727703
 ] 

Jonathan Shook commented on CASSANDRA-10249:


I'm not so sure that this is a niche. Compression is not a default win, and I'd 
prefer that it be "unset" and require users to pick "compressed" or 
"uncompressed" in the DDL. But we don't do that. So, compressed is a default. 
Still, uncompressed is not quite a niche.

I'm less sure about the buffered IO angle. If these are reasonable options for 
some scenarios, then I don't feel quite right calling them niche. One person's 
niche is another's standard.

For those that need these settings to get the most out of their current 
hardware, the large minimum read size is, in fact, a deoptimization from normal.
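
For context, the uncompressed case is a single DDL choice away; a sketch using 
the 2.1-era syntax, with a made-up table name:
{noformat}
$ cqlsh -e "ALTER TABLE ks.big_table WITH compression = { 'sstable_compression' : '' };"
{noformat}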

> Reduce over-read for standard disk io by 16x
> 
>
> Key: CASSANDRA-10249
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10249
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Albert P Tobey
> Fix For: 2.1.x
>
> Attachments: patched-2.1.9-dstat-lvn10.png, 
> stock-2.1.9-dstat-lvn10.png, yourkit-screenshot.png
>
>
> On read workloads, Cassandra 2.1 reads drastically more data than it emits 
> over the network. This causes problems throughout the system by wasting disk 
> IO and causing unnecessary GC.
> I have reproduced the issue on clusters and locally with a single instance. 
> The only requirement to reproduce the issue is enough data to blow through 
> the page cache. The default schema and data size with cassandra-stress is 
> sufficient for exposing the issue.
> With stock 2.1.9 I regularly observed anywhere from 300:1  to 500 
> disk:network ratio. That is to say, for 1MB/s of network IO, Cassandra was 
> doing 300-500MB/s of disk reads, saturating the drive.
> After applying this patch for standard IO mode 
> https://gist.github.com/tobert/10c307cf3709a585a7cf the ratio fell to around 
> 100:1 on my local test rig. Latency improved considerably and GC became a lot 
> less frequent.
> I tested with 512 byte reads as well, but got the same performance, which 
> makes sense since all HDD and SSD made in the last few years have a 4K block 
> size (many of them lie and say 512).
> I'm re-running the numbers now and will post them tomorrow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10013) Default commitlog_total_space_in_mb to 4G

2015-08-25 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712420#comment-14712420
 ] 

Jonathan Shook commented on CASSANDRA-10013:


+1
Commit log space is not in short supply. I think it would be ok to make it 
larger even, but don't have any recent results to support that idea. At least 
setting it to 4G is an improvement.
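
For reference, that is just the one line in cassandra.yaml:
{noformat}
commitlog_total_space_in_mb: 4096
{noformat}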

 Default commitlog_total_space_in_mb to 4G
 -

 Key: CASSANDRA-10013
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10013
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Reporter: Brandon Williams
 Fix For: 2.1.x


 First, it bothers me that we default to 1G but have 4G commented out in the 
 config.
 More importantly though is more than once I've seen this lead to dropped 
 mutations, because you have ~100 tables (which isn't that hard to do with 
 OpsCenter and CFS and an application that uses a moderately high but still 
 reasonable amount of tables itself) and when the limit is reached CLA flushes 
 the oldest tables to try to free up CL space, but this in turn causes a flush 
 stampede that in some cases never ends and backs up the flush queue which 
 then causes the drops.  This leaves you thinking you have a load shedding 
 situation (which I guess you kind of do) but it would go away if you had just 
 uncommented that config line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9264) Cassandra should not persist files without checksums

2015-08-21 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707820#comment-14707820
 ] 

Jonathan Shook commented on CASSANDRA-9264:
---

[~aweisberg]
Am I correct in assuming that you agree with the need for the checksums, but 
simply want a method that is simple to reason about as well as more in line 
with the related data? Is there an opportunity here to consider this topic for 
inclusion in the table format tickets?


 Cassandra should not persist files without checksums
 

 Key: CASSANDRA-9264
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9264
 Project: Cassandra
  Issue Type: Wish
Reporter: Ariel Weisberg
 Fix For: 3.x


 Even if checksums aren't validated on the read side every time it is helpful 
 to have them persisted with checksums so that if a corrupted file is 
 encountered you can at least validate that the issue is corruption and not an 
 application level error that generated a corrupt file.
 We should standardize on conventions for how to checksum a file and which 
 checksums to use so we can ensure we get the best performance possible.
 For a small checksum I think we should use CRC32 because the hardware support 
 appears quite good.
 For cases where a 4-byte checksum is not enough I think we can look at either 
 xxhash64 or MurmurHash3.
 The problem with xxhash64 is that output is only 8-bytes. The problem with 
 MurmurHash3 is that the Java implementation is slow. If we can live with 
 8-bytes and make it easy to switch hash implementations I think xxhash64 is a 
 good choice because we already ship a good implementation with LZ4.
 I would also like to see hashes always prefixed by a type so that we can swap 
 hashes without running into pain trying to figure out what hash 
 implementation is present. I would also like to avoid making assumptions 
 about the number of bytes in a hash field where possible keeping in mind 
 compatibility and space issues.
 Hashing after compression is also desirable over hashing before compression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9264) Cassandra should not persist files without checksums

2015-08-21 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706942#comment-14706942
 ] 

Jonathan Shook commented on CASSANDRA-9264:
---

This came up in discussion with a customer today. There is effectively a 
difference in read response handling between data from compressed sstables vs 
non-compressed sstables. This is due to the fact that the block checksums on 
compressed sstables can disqualify corrupted data. Non-compressed sstables have 
no equivalent checksum mechanism, so are susceptible to passing hardware-level 
corruption up without detection. Sectors that have been corrupted may cause an 
sstable to be unreadable, but it may also manifest as an undetected change in 
data.





 Cassandra should not persist files without checksums
 

 Key: CASSANDRA-9264
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9264
 Project: Cassandra
  Issue Type: Wish
Reporter: Ariel Weisberg
 Fix For: 3.x


 Even if checksums aren't validated on the read side every time it is helpful 
 to have them persisted with checksums so that if a corrupted file is 
 encountered you can at least validate that the issue is corruption and not an 
 application level error that generated a corrupt file.
 We should standardize on conventions for how to checksum a file and which 
 checksums to use so we can ensure we get the best performance possible.
 For a small checksum I think we should use CRC32 because the hardware support 
 appears quite good.
 For cases where a 4-byte checksum is not enough I think we can look at either 
 xxhash64 or MurmurHash3.
 The problem with xxhash64 is that output is only 8-bytes. The problem with 
 MurmurHash3 is that the Java implementation is slow. If we can live with 
 8-bytes and make it easy to switch hash implementations I think xxhash64 is a 
 good choice because we already ship a good implementation with LZ4.
 I would also like to see hashes always prefixed by a type so that we can swap 
 hashes without running into pain trying to figure out what hash 
 implementation is present. I would also like to avoid making assumptions 
 about the number of bytes in a hash field where possible keeping in mind 
 compatibility and space issues.
 Hashing after compression is also desirable over hashing before compression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

2015-07-17 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632018#comment-14632018
 ] 

Jonathan Shook commented on CASSANDRA-6477:
---

The comment about adding a hop was with respect to what users would currently 
be doing to maintain multiple views of data. They don't expect that there is a 
proxy for their writes, no matter whether they are using async or not, 
batches or not.

 Materialized Views (was: Global Indexes)
 

 Key: CASSANDRA-6477
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Jonathan Ellis
Assignee: Carl Yeksigian
  Labels: cql
 Fix For: 3.0 beta 1

 Attachments: test-view-data.sh, users.yaml


 Local indexes are suitable for low-cardinality data, where spreading the 
 index across the cluster is a Good Thing.  However, for high-cardinality 
 data, local indexes require querying most nodes in the cluster even if only a 
 handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

2015-07-17 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631983#comment-14631983
 ] 

Jonathan Shook commented on CASSANDRA-6477:
---

If we look at this from the perspective of a typical developer who simply wants 
query tables to be easier to manage, then the basic requirements are pretty 
simple: Emulate current practice. That isn't to say that we shouldn't dig 
deeper in terms of what could make sense in different contexts, but the 
basic usage pattern that it is meant to simplify is pretty basic:

* Logged batches are not commonly used to wrap a primary table with its query 
tables during writes. The failure modes of these are usually well understood, 
meaning that it is clear what the implications are for a failed write in nearly 
every case.
* The same CL is generally used for all related tables.
* Savvy users will do this with async with the same CL for all of these 
operations.

So effectively, I would expect the very basic form of this feature to look much 
like it would in practice already, except that it requires much less effort on 
the end user to maintain. I would like for us to consider that where the 
implementation varies from this, there may be lots of potential for 
surprise. I really think we need to be following the principle of least 
surprise here as a start. It is almost certain that MV will be adopted quickly 
in places that have a need for it because they are essentially doing this 
manually at the present. If you require them to micro-manage the settings in 
order to even get close to the current result (performance, availability 
assumptions, ...) then we should change the defaults.

It doesn't really matter to me that we force the coordinator node to be a 
replica. This is orthogonal to the base problem, and has controls in topology 
aware clients already. As well, it does add potentially another hop, which I do 
have concerns about with respect to the above.
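
To make the "current practice" above concrete, a minimal sketch of the manual 
pattern (keyspace, table, and column names are made up):
{noformat}
# the same data written to the base table and its query table, at the same consistency level;
# savvy clients issue these as async writes from the driver rather than a logged batch
$ cqlsh -e "INSERT INTO ks.users_by_id (user_id, email) VALUES (1, 'ann@example.com');"
$ cqlsh -e "INSERT INTO ks.users_by_email (email, user_id) VALUES ('ann@example.com', 1);"
{noformat}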


 Materialized Views (was: Global Indexes)
 

 Key: CASSANDRA-6477
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Jonathan Ellis
Assignee: Carl Yeksigian
  Labels: cql
 Fix For: 3.0 beta 1

 Attachments: test-view-data.sh, users.yaml


 Local indexes are suitable for low-cardinality data, where spreading the 
 index across the cluster is a Good Thing.  However, for high-cardinality 
 data, local indexes require querying most nodes in the cluster even if only a 
 handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

2015-07-17 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631983#comment-14631983
 ] 

Jonathan Shook edited comment on CASSANDRA-6477 at 7/17/15 9:55 PM:


If we look at this from the perspective of a typical developer who simply wants 
query tables to be easier to manage, then the basic requirements are pretty 
simple: Emulate current practice. That isn't to say that we shouldn't dig 
deeper in terms of what would could make sense in different contexts, but the 
basic usage pattern that it is meant to simplify is pretty basic:

* Logged batches are not commonly used to wrap a primary table with it's query 
tables during writes. The failure modes of these are usually well understood, 
meaning that it is clear what the implications are for a failed write in nearly 
every case.
* The same CL is generally used for all related tables.
* Savvy users will do this with async with the same CL for all of these 
operations.

So effectively, I would expect the very basic form of this feature to look much 
like it would in practice already, except that it requires much less effort on 
the end user to maintain. I would like for us to consider that where the 
implementation varies from this, that there may be lots of potential for 
surprise. I really think we need to be following the principle of least 
surprise here as a start. It is almost certain that MV will be adopted quickly 
in places that have a need for it because the are essentially doing this 
manually at the present. If you require them to micro-manage the settings in 
order to even get close to the current result (performance, availability 
assumptions, ...) then we should change the defaults.

It doesn't really seem necessary that we force the coordinator node to be a 
replica. This is orthogonal to the base problem, and has controls in topology 
aware clients already. As well, it does add potentially another hop, which I do 
have concerns about with respect to the above.



was (Author: jshook):
If we look at this from the perspective of a typical developer who simply wants 
query tables to be easier to manage, then the basic requirements are pretty 
simple: Emulate current practice. That isn't to say that we shouldn't dig 
deeper in terms of what would could make sense in different contexts, but the 
basic usage pattern that it is meant to simplify is pretty basic:

* Logged batches are not commonly used to wrap a primary table with it's query 
tables during writes. The failure modes of these are usually well understood, 
meaning that it is clear what the implications are for a failed write in nearly 
every case.
* The same CL is generally used for all related tables.
* Savvy users will do this with async with the same CL for all of these 
operations.

So effectively, I would expect the very basic form of this feature to look much 
like it would in practice already, except that it requires much less effort on 
the end user to maintain. I would like for us to consider that where the 
implementation varies from this, that there may be lots of potential for 
surprise. I really think we need to be following the principle of least 
surprise here as a start. It is almost certain that MV will be adopted quickly 
in places that have a need for it because the are essentially doing this 
manually at the present. If you require them to micro-manage the settings in 
order to even get close to the current result (performance, availability 
assumptions, ...) then we should change the defaults.

It doesn't really matter to me that we force the coordinator node to be a 
replica. This is orthogonal to the base problem, and has controls in topology 
aware clients already. As well, it does add potentially another hop, which I do 
have concerns about with respect to the above.


 Materialized Views (was: Global Indexes)
 

 Key: CASSANDRA-6477
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Jonathan Ellis
Assignee: Carl Yeksigian
  Labels: cql
 Fix For: 3.0 beta 1

 Attachments: test-view-data.sh, users.yaml


 Local indexes are suitable for low-cardinality data, where spreading the 
 index across the cluster is a Good Thing.  However, for high-cardinality 
 data, local indexes require querying most nodes in the cluster even if only a 
 handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

2015-07-17 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631983#comment-14631983
 ] 

Jonathan Shook edited comment on CASSANDRA-6477 at 7/17/15 9:58 PM:


If we look at this from the perspective of a typical developer who simply wants 
query tables to be easier to manage, then the basic requirements are pretty 
simple: Emulate current practice. That isn't to say that we shouldn't dig 
deeper in terms of what would could make sense in different contexts, but the 
basic usage pattern that it is meant to simplify is pretty basic:

* Logged batches are not commonly used to wrap a primary table with it's query 
tables during writes. The failure modes of these are usually well understood, 
meaning that it is clear what the implications are for a failed write in nearly 
every case.
* The same CL is generally used for all related tables.
* Savvy users will do this with async with the same CL for all of these 
operations.

So effectively, I would expect the very basic form of this feature to look much 
like it would in practice already, except that it requires much less effort on 
the end user to maintain. I would like for us to consider that where the 
implementation varies from this, that there may be lots of potential for 
surprise. I really think we need to be following the principle of least 
surprise here as a start. It is almost certain that MV will be adopted quickly 
in places that have a need for it because they are essentially doing this 
manually at the present. If you require them to micro-manage the settings in 
order to even get close to the current result (performance, availability 
assumptions, ...) then we should change the defaults.

It doesn't really seem necessary that we force the coordinator node to be a 
replica. This is orthogonal to the base problem, and has controls in topology 
aware clients already. As well, it does add potentially another hop, which I do 
have concerns about with respect to the above.



was (Author: jshook):
If we look at this from the perspective of a typical developer who simply wants 
query tables to be easier to manage, then the basic requirements are pretty 
simple: Emulate current practice. That isn't to say that we shouldn't dig 
deeper in terms of what would could make sense in different contexts, but the 
basic usage pattern that it is meant to simplify is pretty basic:

* Logged batches are not commonly used to wrap a primary table with it's query 
tables during writes. The failure modes of these are usually well understood, 
meaning that it is clear what the implications are for a failed write in nearly 
every case.
* The same CL is generally used for all related tables.
* Savvy users will do this with async with the same CL for all of these 
operations.

So effectively, I would expect the very basic form of this feature to look much 
like it would in practice already, except that it requires much less effort on 
the end user to maintain. I would like for us to consider that where the 
implementation varies from this, that there may be lots of potential for 
surprise. I really think we need to be following the principle of least 
surprise here as a start. It is almost certain that MV will be adopted quickly 
in places that have a need for it because the are essentially doing this 
manually at the present. If you require them to micro-manage the settings in 
order to even get close to the current result (performance, availability 
assumptions, ...) then we should change the defaults.

It doesn't really seem necessary that we force the coordinator node to be a 
replica. This is orthogonal to the base problem, and has controls in topology 
aware clients already. As well, it does add potentially another hop, which I do 
have concerns about with respect to the above.


 Materialized Views (was: Global Indexes)
 

 Key: CASSANDRA-6477
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Jonathan Ellis
Assignee: Carl Yeksigian
  Labels: cql
 Fix For: 3.0 beta 1

 Attachments: test-view-data.sh, users.yaml


 Local indexes are suitable for low-cardinality data, where spreading the 
 index across the cluster is a Good Thing.  However, for high-cardinality 
 data, local indexes require querying most nodes in the cluster even if only a 
 handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

2015-07-17 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632015#comment-14632015
 ] 

Jonathan Shook commented on CASSANDRA-6477:
---

This goes directly to my point. It would be ideal if we simply allow users to 
simplify what they are already doing with the least amount of special 
handling we can add to the mix. In terms of solving the problem in a way that 
users understand, we must strive to compose a solution from the already 
established primitives that we teach users about all the time. Any failure 
modes should be explained in those terms as well. Other approaches are likely to 
create more special cases, which I think we can all agree are not good for 
anybody.


 Materialized Views (was: Global Indexes)
 

 Key: CASSANDRA-6477
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Jonathan Ellis
Assignee: Carl Yeksigian
  Labels: cql
 Fix For: 3.0 beta 1

 Attachments: test-view-data.sh, users.yaml


 Local indexes are suitable for low-cardinality data, where spreading the 
 index across the cluster is a Good Thing.  However, for high-cardinality 
 data, local indexes require querying most nodes in the cluster even if only a 
 handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9130) reduce default dtcs max_sstable_age

2015-07-06 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615546#comment-14615546
 ] 

Jonathan Shook commented on CASSANDRA-9130:
---

I'm not particularly concerned about the corner cases for lots of sstables, but 
it does need to be documented better. We do not yet have tools to manage 
re-compacting DTCS past max_sstable_age_days. Even if we did, it would not be 
an automatic win in every case. The operational trade-offs that come with 
different max_sstable_age_days settings are simply too stark to ignore. I still 
believe 
that 365 is way too high. Studying the total bytes compacted over different 
DTCS settings and ingest rates can show the IO load. 365 is way beyond the 
point at which you start paying for more compaction than you need in most 
systems.

I do agree, though, about the boundary condition. We should have a safety in 
place to avoid max_sstable_age_days < table TTL until we can verify that a 
TTL-specific compaction pass will occur as needed. This might be a concern for 
per-write TTLs as well.
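
As an illustration only, a guard of the kind discussed above might look like 
the sketch below; the method and exception handling are hypothetical, not 
existing Cassandra code:

{code:java}
// Hypothetical sketch of a safety check: reject a DTCS configuration whose
// sstable age cutoff is shorter than the table's default TTL, since expired
// data could otherwise linger in "frozen" sstables until a whole-sstable drop.
import java.util.concurrent.TimeUnit;

public class DtcsTtlGuard {
    static void validate(long maxSSTableAgeDays, int defaultTimeToLiveSeconds) {
        if (defaultTimeToLiveSeconds <= 0)
            return; // no table-level TTL to compare against
        long ttlDays = TimeUnit.SECONDS.toDays(defaultTimeToLiveSeconds);
        if (maxSSTableAgeDays < ttlDays)
            throw new IllegalArgumentException(String.format(
                "max_sstable_age_days (%d) is below the table TTL (%d days); "
                + "expired data may not be purged promptly", maxSSTableAgeDays, ttlDays));
    }

    public static void main(String[] args) {
        validate(365, (int) TimeUnit.DAYS.toSeconds(30)); // passes
        try {
            validate(14, (int) TimeUnit.DAYS.toSeconds(30)); // would be rejected
        } catch (IllegalArgumentException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
{code}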

[~jjirsa]
Is there a way that you would like to see the interplay between TTLs and 
max_sstable_age_days handled? Is there a solution which you would consider safe?




 reduce default dtcs max_sstable_age
 ---

 Key: CASSANDRA-9130
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9130
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Marcus Eriksson
Priority: Minor
 Fix For: 3.x, 2.1.x, 2.0.x


 Now that CASSANDRA-9056 is fixed it should be safe to reduce the default age 
 and increase performance correspondingly.  [~jshook] suggests that two weeks 
 may be appropriate, or we could make it dynamic based on gcgs (since that's 
 the window past which we should expect repair to not introduce fragmentation 
 anymore).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9378) Instrument the logger with error count metrics by level

2015-05-13 Thread Jonathan Shook (JIRA)
Jonathan Shook created CASSANDRA-9378:
-

 Summary: Instrument the logger with error count metrics by level
 Key: CASSANDRA-9378
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9378
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Shook
Priority: Minor


The ability to do sanity checks against logged error and warning counts could 
be helpful for several reasons. One of the most obvious would be as a way to 
verify that no errors were logged during a (semi-) automated upgrade or restart 
process.

Fortunately, this is easy to enable as described here: 
https://dropwizard.github.io/metrics/3.1.0/manual/logback/
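
The wiring is roughly what that documentation shows; the sketch below assumes 
the metrics-core and metrics-logback jars are on the classpath and is not taken 
from the Cassandra codebase:

{code:java}
// Attach an InstrumentedAppender to the root logback logger so per-level
// meters (error, warn, ...) are maintained, then expose them over JMX.
// Based on the linked Dropwizard Metrics documentation; not Cassandra code.
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.LoggerContext;
import com.codahale.metrics.JmxReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.logback.InstrumentedAppender;
import org.slf4j.LoggerFactory;

public class LogMetrics {
    public static void instrument(MetricRegistry registry) {
        LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
        Logger root = context.getLogger(Logger.ROOT_LOGGER_NAME);

        InstrumentedAppender appender = new InstrumentedAppender(registry);
        appender.setContext(root.getLoggerContext());
        appender.start();
        root.addAppender(appender);

        // Expose the per-level meters via JMX, as suggested below.
        JmxReporter.forRegistry(registry).build().start();
    }

    public static void main(String[] args) {
        instrument(new MetricRegistry());
        LoggerFactory.getLogger(LogMetrics.class).warn("sample warning");
    }
}
{code}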

It was pointed out by [~jjordan] that this ability should exist in the current 
version if the user is willing to drop in the right jars and modify the 
appender config.

It would also be helpful as a programmatic feature with a toggle to enable or 
disable, possibly with a cassandra.yaml config parameter. There may be some 
users who would prefer to disable it to avoid calling another appender. If 
testing shows the overhead for this to be sufficiently low, we could just leave 
it on by default.

These should be exposed via JMX when they are enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8303) Create a capability limitation framework

2015-05-11 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538244#comment-14538244
 ] 

Jonathan Shook commented on CASSANDRA-8303:
---

I am concerned that we are creating more complexity through pedantry here. Let 
me explain...
I know this is late in coming, but I'm going to explain my position anyway.

The idea that a user would be authorized for an operation does not depend on it 
being on specific data. There is no hard and fast rule that says you must not 
use authorization to control access to types of actions. From the general 
perspective, that is what authorization is about. It is simply a mechanism to 
answer the question "Is the current user allowed to do this?" This is not 
strictly limited to accessing specific data, but may also be used to limit 
access to specific types of actions which are not data-specific. To assert that 
it is so limited is to ignore a lot of established practice across a great 
number of 
systems. Authorization is a general concept which needs to be applied in a 
conceptually useful and idiomatic way for each system.

There is a variety of approaches in the wild for how to structure permissions. 
Some of them assume no access by default, with exceptions in the form of 
grants. Some do the opposite. My favorite is the 3-state logic used by the 
postfix MTA (thank you, Venema!), which allows for a very pluggable system. In 
this system, a chain of evaluators is used, and each may grant, deny, or 
indicate "I don't know, ask the next one." All you have to do to establish a 
default is put a static grant or deny at the end of the evaluator chain. This 
is a bit of a non sequitur, but the point is to illustrate the variety and 
flexibility of authorization systems out there. A competing concern with the 
flexibility of these systems is always how easy they are to understand and use. 
That feeds directly into my next point.
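
To make the 3-state idea concrete, here is an illustrative sketch of such an 
evaluator chain; the names are hypothetical and this is not a proposed 
Cassandra API:

{code:java}
// Each evaluator may GRANT, DENY, or ABSTAIN ("I don't know, ask the next
// one"); the default is just a static GRANT or DENY at the end of the chain.
import java.util.Arrays;
import java.util.List;

public class EvaluatorChain {
    enum Decision { GRANT, DENY, ABSTAIN }

    interface Evaluator {
        Decision evaluate(String user, String action);
    }

    static boolean isAllowed(List<Evaluator> chain, String user, String action) {
        for (Evaluator e : chain) {
            Decision d = e.evaluate(user, action);
            if (d != Decision.ABSTAIN)
                return d == Decision.GRANT;
        }
        return false; // unreachable if the chain ends with a static default
    }

    public static void main(String[] args) {
        List<Evaluator> chain = Arrays.asList(
            (u, a) -> "superuser".equals(u) ? Decision.GRANT : Decision.ABSTAIN,
            (u, a) -> "TRUNCATE".equals(a) ? Decision.DENY : Decision.ABSTAIN,
            (u, a) -> Decision.DENY); // static default at the end

        System.out.println(isAllowed(chain, "alice", "SELECT"));       // false
        System.out.println(isAllowed(chain, "superuser", "TRUNCATE")); // true
    }
}
{code}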

I don't understand why we would want to create two semantically distinct 
interfaces for users when we are really talking about a basic authentication 
problem. Group or individual, data-oriented or command-oriented. Once you've 
made the user pay the price of entry to use the authentication system, you're 
going to tell them that they have to learn a different system to do capability 
limiting because we decided to name it and treat it differently? I think this 
is a case of accidental complexity in the name of separation of concerns, when 
in fact they are not really separate concerns.
 
You can't completely separate the mechanisms of authorization, group 
membership, and capabilities. They may have cleanly defined APIs, but if you 
look at the implementation details of CASSANDRA-7653, saying that they are 
logically separate would be a half-truth at best. Indeed, the authentication 
data has been pulled up to be visible and owned by the group management logic. 
You simply can't have group authentication without authentication. How they are 
mapped together is another matter. I would assert that authenticating 
individuals and mapping them via groups would probably be cleaner, but the 
mechanisms would still need to be inextricably linked at some layer.

From a security perspective, limiting commands is every bit as much a part of 
managing system availability, in a security and continuity sense, as other 
forms of authentication+authorization. Think about DOS attacks and what it 
means to prevent them via command restrictions. It's just that some of the 
commands are data-specific, and some are not. You simply cannot have a proper 
and separate subsystem for limiting commands without mostly reinventing the 
wheel around authentication and authorization.

So, regardless of how it's implemented, I don't think we should be trying to 
designate authorization and limiting allowed commands as different concerns 
from the user's perspective. If you subscribe to that mindset, any notion that 
they would be implemented as rigidly isolated subsystems would be a reason for 
concern. I'm all for keeping the implementation clean and composable. I just 
don't want to see us shoot ourselves in the foot because we are forcing 
separation when there is conceptual, logical, and mechanical affinity.

I am uncomfortable with the suggestion under "What does this buy us?", #5, that 
this would simply be vetoed if it weren't done the way suggested above. Was 
that meant to be a qualification of how this ticket can move forward?


 Create a capability limitation framework
 

 Key: CASSANDRA-8303
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8303
 Project: Cassandra
  Issue Type: Improvement
Reporter: Anupam Arora
Assignee: Sam Tunnicliffe
 Fix For: 3.x


 In addition to our current Auth framework that acts as a white list, and 
 regulates access to data, 

[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook edited comment on CASSANDRA-9318 at 5/10/15 12:32 AM:
-

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations. (A sketch of such a bound follows after this list.)
2b) Once it reaches that limit, it starts throwing OE to the client. It does 
not have to shed load yet. (Perhaps this exception or something like it can be 
thrown _before_ load shedding occurs.) This is a very reasonable expectation 
for users who are savvy enough to do active load management at the client 
level. It may have to start writing hints, but if you are writing hints merely 
because of load, this might not be the best justification for having the hints 
system kick in. To me this is inherently a convenient remedy for the wrong 
problem, even if it works well. Yes, hints are there as a general mechanism, 
but it does not solve the problem of needing to know when the system is being 
pushed beyond capacity and how to handle it proactively. You could also say 
that hints actively hurt capacity when you need them most sometimes. They are 
expensive to process given the current implementation, and will always be load 
shifting even at theoretical best. Still we need them for node availability 
concerns, although we should be careful not to use them as a crutch for general 
capacity issues.
2c) Once it reaches that limit, it starts backlogging (without a helpful 
signature of such in the responses, maybe BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and have pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c. In the current 
system, this decision is already made for them. They have no choice.
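
As a concrete illustration of 2a from the client side, the sketch below bounds 
in-flight requests with a semaphore so that submission simply blocks at 
saturation; the names are hypothetical and this is not a proposal for how the 
server side would implement it:

{code:java}
// Bound the number of in-flight requests; block new submissions at the bound
// instead of shedding load or throwing. Illustrative only.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

public class BoundedSubmitter {
    private final Semaphore inFlight;

    public BoundedSubmitter(int maxInFlight) {
        this.inFlight = new Semaphore(maxInFlight);
    }

    /** Blocks the caller while maxInFlight requests are outstanding. */
    public CompletableFuture<Void> submit(Runnable request) throws InterruptedException {
        inFlight.acquire(); // back-pressure: the caller waits here when saturated
        return CompletableFuture.runAsync(request)
                                .whenComplete((r, t) -> inFlight.release());
    }

    public static void main(String[] args) throws Exception {
        BoundedSubmitter submitter = new BoundedSubmitter(128);
        for (int i = 0; i < 1000; i++)
            submitter.submit(() -> { /* issue a request to the cluster here */ });
    }
}
{code}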

In a more optimistic world, users would get near optimal performance for a well 
tuned workload with back-pressure active throughout the system, or something 
very much like it. We could call it a different kind of scheduler, different 
queue management methods, or whatever. 
As long as the user could prioritize stability at some bounded load over 
possible instability at an over-saturating load, I think they would in most 
cases. Like I said, they really don't have this choice right now. I know this 
is not trivial. We can't remove the need to make sane judgments about sizing 
and configuration. We might be able to, however, make the system ramp more 
predictably up to saturation, and behave more reasonably at that level.

Order of precedence, how to designate a mode of operation, or any other 
concerns aren't really addressed here. I just provided the examples above as 
types of behaviors which are nuanced yet perfectly valid for different types of 
system designs. The real point here is that there is not a single overall 
QoS/capacity/back-pressure behavior which is going to be acceptable to all 
users. Still, we need to ensure stability under saturating load where possible. 
I would like to think that with CASSANDRA-8099 that we can start discussing 
some of the client-facing back-pressure ideas more earnestly. I do believe that 
these ideas are all compatible ideas on a spectrum of behavior. They are not 
mutually exclusive from a design/implementation perspective. It's possible that 
they could be specified per operation, even, with some traffic yielding to 
others due to client policies. For example, a lower priority client could yield 
when 
it knows the 

[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook edited comment on CASSANDRA-9318 at 5/10/15 12:26 AM:
-

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing OE to the client. It does 
not have to shed load yet. (Perhaps this exception or something like it can be 
thrown _before_ load shedding occurs.) This is a very reasonable expectation 
for users who are savvy enough to do active load management at the client 
level. It may have to start writing hints, but if you are writing hints merely 
because of load, this might not be the best justification for having the hints 
system kick in. To me this is inherently a convenient remedy for the wrong 
problem, even if it works well. Yes, hints are there as a general mechanism, 
but it does not solve the problem of needing to know when the system is being 
pushed beyond capacity and how to handle it proactively. You could also say 
that hints actively hurt capacity when you need them most sometimes. They are 
expensive to process given the current implementation, and will always be load 
shifting even at theoretical best. Still we need them for node availability 
concerns, although we should be careful not to use them as a crutch for general 
capacity issues.
2c) Once it reaches that limit, it starts backlogging (without a helpful 
signature of such in the responses, maybe BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and have pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c. In the current 
system, this decision is already made for them. They have no choice.

In a more optimistic world, users would get near optimal performance for a well 
tuned workload with back-pressure active throughout the system, or something 
very much like it. We could call it a different kind of scheduler, different 
queue management methods, or whatever. 
As long as the user could prioritize stability at some bounded load over 
possible instability at an over-saturating load, I think they would in most 
cases. Like I said, they really don't have this choice right now. I know this 
is not trivial. We can't remove the need to make sane judgments about sizing 
and configuration. We might be able to, however, make the system ramp more 
predictably up to saturation, and behave more reasonably at that level.

Order of precedence, how to designate a mode of operation, or any other 
concerns aren't really addressed here. I just provided the examples above as 
types of behaviors which are nuanced yet perfectly valid for different types of 
system designs. The real point here is that there is not a single overall 
QoS/capacity/back-pressure behavior which is going to be acceptable to all 
users. Still, we need to ensure stability under saturating load where possible. 
I would like to think that with CASSANDRA-8099 that we can start discussing 
some of the client-facing back-pressure ideas more earnestly. I do believe that 
these ideas are all compatible ideas on a spectrum of behavior. They are not 
mutually exclusive from a design/implementation perspective. It's possible that 
they could be specified per operation, even, with some traffic yielding to 
others due to client policies. For example, a lower priority client could yield 
when 
it knows the 

[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook commented on CASSANDRA-9318:
---

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing OE to the client. It does 
not have to shed load yet. This is a very reasonable expectation for users who 
are savvy enough to do active load management at the client level. It may have 
to start writing hints, but if you are writing hints because of load, this 
might not be the best justification for having the hints system kick in. To me 
this is inherently a convenient remedy for the wrong problem, even if it works 
well. Yes, hints are there as a general mechanism, but it does not relieve us 
of the problem of needing to know when the system is at capacity and how to 
handle it proactively. You could also say that hints actively hurt capacity 
when you need them most sometimes. They are expensive to process given the 
current implementation, and will always be load shifting even at theoretical 
best. Still we need them for node availability concerns, although we should be 
careful not to use them as a crutch for general capacity issues.
2c) Once it reaches that limit, it starts backlogging (without a helpful 
signature of such in the responses, maybe BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and have pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c.

Order of precedence, designated mode of operation, or any other concerns aren't 
really addressed here. I just provided them as examples of types of behaviors 
which are nuanced yet perfectly valid for different types of system designs. 
The real point here is that there is not a single overall design which is going 
to be acceptable to all users. Still, we need to ensure stability under 
saturating load where possible. I would like to think that with CASSANDRA-8099 
that we can start discussing some of the client-facing back-pressure ideas more 
earnestly.

We can come up with methods to improve the reliable and responsive capacity of 
the system even with some internal load management. If the first cut ends up 
being sub-optimal, then we can measure it against non-bounded workload tests 
and strive to close the gap. If it is implemented in a way that can support 
multiple usage scenarios, as described above, then such a limitation might be 
"unlimited", "bounded at level ___", or "bounded by inline resource 
management", but in any case it would be controllable by the user, admin, or 
client. If we could ultimately give the categories of users above the ability 
to enable the various modes, then the 2a) scenario would be perfectly desirable 
for many users already even if the back-pressure logic only gave you 70% of the 
effective system capacity. Once testing shows that performance with active 
back-pressure to the client is close enough to the unbounded workloads, it 
could be enabled.

Summary: We still need reasonable back-pressure support throughout the system 
and eventually to the client. Features like this that can be a stepping stone 
towards such are still needed. The most perfect load shedding and hinting 
systems will still not be a sufficient replacement for back-pressure and 
capacity management.

 Bound the number of in-flight requests at the coordinator
 

[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook edited comment on CASSANDRA-9318 at 5/9/15 7:42 PM:
---

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing OE to the client. It does 
not have to shed load yet. This is a very reasonable expectation for users who 
are savvy enough to do active load management at the client level. It may have 
to start writing hints, but if you are writing hints because of load, this 
might not be the best justification for having the hints system kick in. To me 
this is inherently a convenient remedy for the wrong problem, even if it works 
well. Yes, hints are there as a general mechanism, but it does not relieve us 
of the problem of needing to know when the system is at capacity and how to 
handle it proactively. You could also say that hints actively hurt capacity 
when you need them most sometimes. They are expensive to process given the 
current implementation, and will always be load shifting even at theoretical 
best. Still we need them for node availability concerns, although we should be 
careful not to use them as a crutch for general capacity issues.
2c) Once it reaches that limit, it starts backlogging (without a helpful 
signature of such in the responses, maybe BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and have pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c.

Order of precedence, designated mode of operation, or any other concerns aren't 
really addressed here. I just provided them as examples of types of behaviors 
which are nuanced yet perfectly valid for different types of system designs. 
The real point here is that there is not a single overall design which is going 
to be acceptable to all users. Still, we need to ensure stability under 
saturating load where possible. I would like to think that with CASSANDRA-8099 
that we can start discussing some of the client-facing back-pressure ideas more 
earnestly.

We can come up with methods to improve the reliable and responsive capacity of 
the system even with some internal load management. If the first cut ends up 
being sub-optimal, then we can measure it against non-bounded workload tests 
and strive to close the gap. If it is implemented in a way that can support 
multiple usage scenarios, as described above, then such a limitation might be 
"unlimited", "bounded at level ___", or "bounded by inline resource 
management", but in any case it would be controllable by the user, admin, or 
client. If we could ultimately give the categories of users above the ability 
to enable the various modes, then the 2a) scenario would be perfectly desirable 
for many users already even if the back-pressure logic only gave you 70% of the 
effective system capacity. Once testing shows that performance with active 
back-pressure to the client is close enough to the unbounded workloads, it 
could be enabled by default.

Summary: We still need reasonable back-pressure support throughout the system 
and eventually to the client. Features like this that can be a stepping stone 
towards such are still needed. The most perfect load shedding and hinting 
systems will still not be a sufficient replacement for back-pressure and 
capacity management.


was (Author: jshook):
I would venture that a 

[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook edited comment on CASSANDRA-9318 at 5/9/15 7:46 PM:
---

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing OE to the client. It does 
not have to shed load yet. This is a very reasonable expectation for users who 
are savvy enough to do active load management at the client level. It may have 
to start writing hints, but if you are writing hints because of load, this 
might not be the best justification for having the hints system kick in. To me 
this is inherently a convenient remedy for the wrong problem, even if it works 
well. Yes, hints are there as a general mechanism, but it does not relieve us 
of the problem of needing to know when the system is at capacity and how to 
handle it proactively. You could also say that hints actively hurt capacity 
when you need them most sometimes. They are expensive to process given the 
current implementation, and will always be load shifting even at theoretical 
best. Still we need them for node availability concerns, although we should be 
careful not to use them as a crutch for general capacity issues.
2c) Once it reaches that limit, it starts backlogging (without a helpful 
signature of such in the responses, maybe BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and have pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c.

Order of precedence, designated mode of operation, or any other concerns aren't 
really addressed here. I just provided the examples above as types of behaviors 
which are nuanced yet perfectly valid for different types of system designs. 
The real point here is that there is not a single overall 
QoS/capacity/back-pressure behavior which is going to be acceptable to all 
users. Still, we need to ensure stability under saturating load where possible. 
I would like to think that with CASSANDRA-8099 that we can start discussing 
some of the client-facing back-pressure ideas more earnestly.

We can come up with methods to improve the reliable and responsive capacity of 
the system even with some internal load management. If the first cut ends up 
being sub-optimal, then we can measure it against non-bounded workload tests 
and strive to close the gap. If it is implemented in a way that can support 
multiple usage scenarios, as described above, then such a limitation might be 
"unlimited", "bounded at level ___", or "bounded by inline resource 
management", but in any case it would be controllable by the user, admin, or 
client. If we could ultimately give the categories of users above the ability 
to enable the various modes, then the 2a) scenario would be perfectly desirable 
for many users already even if the back-pressure logic only gave you 70% of the 
effective system capacity. Once testing shows that performance with active 
back-pressure to the client is close enough to the unbounded workloads, it 
could be enabled by default.

Summary: We still need reasonable back-pressure support throughout the system 
and eventually to the client. Features like this that can be a stepping stone 
towards such are still needed. The most perfect load shedding and hinting 
systems will still not be a sufficient replacement for back-pressure and 
capacity management.


was (Author: 

[jira] [Commented] (CASSANDRA-9234) Disable single-sstable tombstone compactions for DTCS

2015-04-27 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515852#comment-14515852
 ] 

Jonathan Shook commented on CASSANDRA-9234:
---

+1

 Disable single-sstable tombstone compactions for DTCS
 -

 Key: CASSANDRA-9234
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9234
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
 Fix For: 2.0.15

 Attachments: 0001-9234.patch


 We should probably disable tombstone compactions by default for DTCS for 
 these reasons:
 # users should not do deletes with DTCS
 # the only way we should get rid of data is by TTL - and then we don't want 
 to trigger a single sstable compaction whenever an sstable is 20%+ expired, 
 we want to drop the whole thing when it is fully expired



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8929) Workload sampling

2015-04-15 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496850#comment-14496850
 ] 

Jonathan Shook commented on CASSANDRA-8929:
---

There was an interesting discussion on this today. Notably, user effort to take 
a workload sample and create a stress tool or profile from it could be very 
low. It should be possible to take a sample of workload from a development 
system as the basis for a fully-configured stress test.

It would also be less theoretical than any of the other approaches we are 
currently discussing, so it would have a relatively limited scope of 
implementation.

 Workload sampling
 -

 Key: CASSANDRA-8929
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis

 Workload *recording* looks to be unworkable (CASSANDRA-6572).  We could build 
 something almost as useful by sampling the requests sent to a node and 
 building a synthetic workload with the same characteristics using the same 
 (or anonymized) schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8826) Distributed aggregates

2015-04-08 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486692#comment-14486692
 ] 

Jonathan Shook edited comment on CASSANDRA-8826 at 4/9/15 4:56 AM:
---

Consider that many systems are implementing aggregate processing at the client 
node. A more optimal system would allow those aggregates to be processed close 
to storage rather than bulk shipping operands across the wire to the client 
before any computation can even be started. Even using the coordinator for this 
is relatively wasteful. After considering multiple options for how to handle 
aggregates in a Cassandra-idiomatic way, I arrived at pretty much the same 
place as [~benedict]. The point is not to try to emulate other systems, but to 
highly optimize a very common and traffic-sensitive usage pattern.
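
To illustrate why pushing the work toward storage helps, here is a minimal 
sketch of merging per-replica partial aggregates instead of shipping raw 
operands; the shapes are hypothetical, not Cassandra internals:

{code:java}
// Each replica computes a small partial result for its slice of a partition;
// the coordinator merges partials rather than shipping every operand row.
import java.util.Arrays;
import java.util.List;

public class PartialAvg {
    static final class Partial {
        final long count;
        final double sum;
        Partial(long count, double sum) { this.count = count; this.sum = sum; }
        Partial merge(Partial other) { return new Partial(count + other.count, sum + other.sum); }
        double average() { return count == 0 ? Double.NaN : sum / count; }
    }

    public static void main(String[] args) {
        // Partials as they might arrive from replicas for one partition.
        List<Partial> fromReplicas = Arrays.asList(
                new Partial(1000, 52_000.0),
                new Partial(980, 49_500.0));

        Partial merged = fromReplicas.stream().reduce(new Partial(0, 0), Partial::merge);
        System.out.println(merged.average()); // merged at (or near) the coordinator
    }
}
{code}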

The partial data scenarios (CL1) are interesting, but you can easily describe 
what a reasonable behavior would be if data were missing from a replica. In the 
most basic case, you simply reflect the standard CL interpretation that the 
results from these nodes is not consistent at CL=Q. While this is not helpful 
to clients as such, it is a consistent interpretation of the semantics. The 
same types of things you might do as a user to deal with it do not change. If 
the data of interest is consistent, then aggregations of that data will be 
consistent, and vice-versa.

That almost certainly invites more questions about the likely scenario of 
partial data for near-time reads at CL1. That, to me, is the most interesting 
and challenging part of this idea. If you simply do active read repair logic as 
an intermediate step (when needed), you still maintain the same CL semantics 
that users would expect.

Am I missing something that makes this more complicated than I am thinking? My 
impression is that the concern for complexity is more fairly placed on the more 
advanced things that you might build on top of distributed single partition 
aggregates, not the basic idea of it.



was (Author: jshook):
Consider that many systems are implementing aggregate processing at the client 
node. A more optimal system would allow those aggregates to be processed close 
to storage rather than bulk shipping operands across the wire to the client 
before any computation can even be started. Even using the coordinator for this 
is relatively wasteful. After considering multiple options for how to handle 
aggregates in a Cassandra-idiomatic way, I arrived at pretty much the same 
place as [~benedict]. The point is not to try to emulate other systems, but to 
highly optimize a very common and traffic-sensitive usage pattern.

The partial data scenarios (CL1) are interesting, but you can easily describe 
what a reasonable behavior would be if data were missing from a replica. In the 
most basic case, you simply reflect the standard CL interpretation that the 
results from these nodes are not consistent at CL=Q. While this is not helpful 
to clients as such, it is a consistent interpretation of the semantics. The 
same types of things you might do as a user to deal with it do not change. If 
the data of interest is consistent, then aggregations of that data will be 
consistent, and vice-versa.

That almost certainly invites more questions about the likely scenario of 
partial data for near-time reads at CL1. That, to me, is the most interesting 
and challenging part of this idea. If you simply do active read repair logic as 
an 
intermediate step, you still maintain the same CL semantics that users would 
expect.

Am I missing something that makes this more complicated than I am thinking? My 
impression is that the concern for complexity is more fairly placed on the more 
advanced things that you might build on top of distributed single partition 
aggregates, not the basic idea of it.


 Distributed aggregates
 --

 Key: CASSANDRA-8826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robert Stupp
Priority: Minor

 Aggregations have been implemented in CASSANDRA-4914.
 All calculation is performed on the coordinator. This means that all data is 
 pulled by the coordinator and processed there.
 This ticket is about distributing aggregates to make them more efficient. 
 Some related tickets (esp. CASSANDRA-8099) are currently in progress - we 
 should wait for them to land before talking about implementation.
 Another playground (not covered by this ticket) that might be related is 
 _distributed filtering_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8826) Distributed aggregates

2015-04-08 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486692#comment-14486692
 ] 

Jonathan Shook commented on CASSANDRA-8826:
---

Consider that many systems are implementing aggregate processing at the client 
node. A more optimal system would allow those aggregates to be processed close 
to storage rather than bulk shipping operands across the wire to the client 
before any computation can even be started. Even using the coordinator for this 
is relatively wasteful. After considering multiple options for how to handle 
aggregates in a Cassandra-idiomatic way, I arrived at pretty much the same 
place as [~benedict]. The point is not to try to emulate other systems, but to 
highly optimize a very common and traffic-sensitive usage pattern.

The partial data scenarios (CL1) are interesting, but you can easily describe 
what a reasonable behavior would be if data were missing from a replica. In the 
most basic case, you simply reflect the standard CL interpretation that the 
results from these nodes are not consistent at CL=Q. While this is not helpful 
to clients as such, it is a consistent interpretation of the semantics. The 
same types of things you might do as a user to deal with it do not change. If 
the data of interest is consistent, then aggregations of that data will be 
consistent, and vice-versa.

That almost certainly invites more questions about the likely scenario of 
partial data for near-time reads at CL1. That, to me, is the most interesting 
and challenging part of this idea. If you simply do active read repair logic as 
an 
intermediate step, you still maintain the same CL semantics that users would 
expect.

Am I missing something that makes this more complicated than I am thinking? My 
impression is that the concern for complexity is more fairly placed on the more 
advanced things that you might build on top of distributed single partition 
aggregates, not the basic idea of it.


 Distributed aggregates
 --

 Key: CASSANDRA-8826
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robert Stupp
Priority: Minor

 Aggregations have been implemented in CASSANDRA-4914.
 All calculation is performed on the coordinator. This means that all data is 
 pulled by the coordinator and processed there.
 This ticket is about distributing aggregates to make them more efficient. 
 Some related tickets (esp. CASSANDRA-8099) are currently in progress - we 
 should wait for them to land before talking about implementation.
 Another playground (not covered by this ticket) that might be related is 
 _distributed filtering_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8359) Make DTCS consider removing SSTables much more frequently

2015-03-31 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389501#comment-14389501
 ] 

Jonathan Shook commented on CASSANDRA-8359:
---

Linking to CASSANDRA-9056; possibly a duplicate.

 Make DTCS consider removing SSTables much more frequently
 -

 Key: CASSANDRA-8359
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8359
 Project: Cassandra
  Issue Type: Improvement
Reporter: Björn Hegerfors
Assignee: Björn Hegerfors
Priority: Minor
 Attachments: cassandra-2.0-CASSANDRA-8359.txt


 When I run DTCS on a table where every value has a TTL (always the same TTL), 
 SSTables are completely expired, but still stay on disk for much longer than 
 they need to. I've applied CASSANDRA-8243, but it doesn't make an apparent 
 difference (probably because the subject SSTables are purged via compaction 
 anyway, if not by directly dropping them).
 Disk size graphs show clearly that tombstones are only removed when the 
 oldest SSTable participates in compaction. In the long run, size on disk 
 continually grows bigger. This should not have to happen. It should easily be 
 able to stay constant, thanks to DTCS separating the expired data from the 
 rest.
 I think checks for whether SSTables can be dropped should happen much more 
 frequently. This is something that probably only needs to be tweaked for 
 DTCS, but perhaps there's a more general place to put this. Anyway, my 
 thinking is that DTCS should, on every call to getNextBackgroundTask, check 
 which SSTables can be dropped. It would be something like a call to 
 CompactionController.getFullyExpiredSSTables with all non-compactingSSTables 
 sent in as compacting and all other SSTables sent in as overlapping. The 
 returned SSTables, if any, are then added to whichever set of SSTables that 
 DTCS decides to compact. Then before the compaction happens, Cassandra is 
 going to make another call to CompactionController.getFullyExpiredSSTables, 
 where it will see that it can just drop them.
 This approach has a bit of redundancy in that it needs to call 
 CompactionController.getFullyExpiredSSTables twice. To avoid that, the code 
 path for deciding SSTables to drop would have to be changed.
 (Side tracking a little here: I'm also thinking that tombstone compactions 
 could be considered more often in DTCS. Maybe even some kind of multi-SSTable 
 tombstone compaction involving the oldest couple of SSTables...)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8986) Major cassandra-stress refactor

2015-03-18 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368394#comment-14368394
 ] 

Jonathan Shook commented on CASSANDRA-8986:
---

It is good to see the discussion move in this direction.

[~benedict], All,
Nearly all of what you describe in the list of behaviors is on my list for 
another project as well. Although it's still a fairly new project, there have 
been some early successes with demos and training tools. Here is a link that 
explains the project and motives: 
https://github.com/jshook/metagener/blob/master/metagener-core/docs/README.md
I'd be happy to talk in more detail about it. It seems like we have lots of the 
same ideas about what is needed at the foundational level.

It's possible to achieve a drastic simplification of the user-facing part, but 
only if we are willing to revamp the notion of how we define test loads.

RE: distributing test loads: I have been thinking about how to distribute 
stress across multiple clients as well. The gist of it is that we can't get 
there without having a way to automatically partition the client workload 
across some spectrum. As follow-on work, I think it can be done. First we need 
a conceptually obvious and clean way to define whole test loads such that they 
can be partitioned compatibly with the behaviors described above.
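
As a trivial sketch of that partitioning idea (hypothetical, not part of 
cassandra-stress): if the workload is a deterministic function of a cycle 
number, each client can own a disjoint, repeatable slice of the cycle space:

{code:java}
// Client k of n owns exactly the cycles where cycle % n == k, so n clients
// generate disjoint, repeatable slices of the same defined test load.
public class WorkloadSlice {
    public static void main(String[] args) {
        int clients = 4;
        int me = 1;                 // this client's index, 0-based
        long totalCycles = 20;

        for (long cycle = 0; cycle < totalCycles; cycle++) {
            if (cycle % clients != me)
                continue;           // owned by another client
            // Derive the operation deterministically from the cycle number.
            long partitionKey = cycle * 2654435761L; // any stable mapping works
            System.out.println("cycle " + cycle + " -> key " + partitionKey);
        }
    }
}
{code}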

Given the other work I've been doing, I'd be glad to help; let's keep the 
conversation going.


 Major cassandra-stress refactor
 ---

 Key: CASSANDRA-8986
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8986
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Benedict
Assignee: Benedict

 We need a tool for both stressing _and_ validating more complex workloads 
 than stress currently supports. Stress needs a raft of changes, and I think 
 it would be easier to deliver many of these as a single major endeavour which 
 I think is justifiable given its audience. The rough behaviours I want stress 
 to support are:
 * Ability to know exactly how many rows it will produce, for any clustering 
 prefix, without generating those prefixes
 * Ability to generate an amount of data proportional to the amount it will 
 produce to the server (or consume from the server), rather than proportional 
 to the variation in clustering columns
 * Ability to reliably produce near identical behaviour each run
 * Ability to understand complex overlays of operation types (LWT, Delete, 
 Expiry, although perhaps not all implemented immediately, the framework for 
 supporting them easily)
 * Ability to (with minimal internal state) understand the complete cluster 
 state through overlays of multiple procedural generations
 * Ability to understand the in-flight state of in-progress operations (i.e. 
 if we're applying a delete, understand that the delete may have been applied, 
 and may not have been, for potentially multiple conflicting in flight 
 operations)
 I think the necessary changes to support this would give us the _functional_ 
 base to support all the functionality I can currently envisage stress 
 needing. Before embarking on this (which I may attempt very soon), it would 
 be helpful to get input from others as to features missing from stress that I 
 haven't covered here that we will certainly want in the future, so that they 
 can be factored in to the overall design and hopefully avoid another refactor 
 one year from now, as its complexity is scaling each time, and each time it 
 is a higher sunk cost. [~jbellis] [~iamaleksey] [~slebresne] [~tjake] 
 [~enigmacurry] [~aweisberg] [~blambov] [~jshook] ... and @everyone else :) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8929) Workload sampling

2015-03-06 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350956#comment-14350956
 ] 

Jonathan Shook edited comment on CASSANDRA-8929 at 3/7/15 12:50 AM:


Ideas on how I would like to see this work: (This is where I contradict myself 
in terms of simplicity by asking for more.)
Intercept at the coordinator, only record samples at the coordinator. Make 
sampling a sticky setting. Make it a table option, but also soft-settable via 
JMX.

Sampling controls:
* sample_probability: Just like trace probability
* sample_interval_seconds: Number of seconds for each sampling interval (I 
can't imagine why we'd need something finer grained, but maybe?)
* sample_max_per_interval: explained below

sample_max_per_interval: Number of samples per sampling interval, after which 
samples are suppressed. In this case, when the interval completes, the number 
of suppressed samples should also be written to the sample log, and reset. It's 
ok for this to be inconsistent with respect to restarts, etc. The main purpose 
is to avoid significant oversampling load, while still being able to see 
meaningful data during unexpected bursts.
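
A minimal sketch of these controls (hypothetical names, not a proposed 
Cassandra API), including the suppressed-count record at interval rollover:

{code:java}
// Sample with a fixed probability, cap samples per interval, and when the
// interval rolls over, record how many samples were suppressed.
import java.util.concurrent.ThreadLocalRandom;

public class StatementSampler {
    private final double probability;
    private final long intervalMillis;
    private final int maxPerInterval;

    private long intervalStart = System.currentTimeMillis();
    private int taken = 0;
    private long suppressed = 0;

    StatementSampler(double probability, long intervalSeconds, int maxPerInterval) {
        this.probability = probability;
        this.intervalMillis = intervalSeconds * 1000L;
        this.maxPerInterval = maxPerInterval;
    }

    synchronized boolean shouldSample() {
        long now = System.currentTimeMillis();
        if (now - intervalStart >= intervalMillis) {
            if (suppressed > 0)
                System.out.println("suppressed_samples: " + suppressed); // to the sample log
            intervalStart = now;
            taken = 0;
            suppressed = 0;
        }
        if (ThreadLocalRandom.current().nextDouble() >= probability)
            return false;
        if (taken >= maxPerInterval) {
            suppressed++;
            return false;
        }
        taken++;
        return true;
    }

    public static void main(String[] args) {
        StatementSampler sampler = new StatementSampler(0.01, 60, 1000);
        if (sampler.shouldSample())
            System.out.println("record this statement");
    }
}
{code}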

Data controls, for anonymizing field values when needed, allow selecting a 
level of obfuscation (a sketch follows the list below):
sample_data_obfuscate:
* actualfields - No changes, record samples with full field values
* hashedfields - Use md5 or something better to hide original sample values, 
but allow for statistical analysis
* fieldsizes - Discard value, but record string lengths and collection counts
* nofields - Do not retain the field values
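
A small sketch of what each level might do to a single sampled field value 
(illustrative only):

{code:java}
// One sampled field value rendered at each obfuscation level.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class FieldObfuscation {
    enum Level { ACTUALFIELDS, HASHEDFIELDS, FIELDSIZES, NOFIELDS }

    static String obfuscate(String value, Level level) throws Exception {
        switch (level) {
            case ACTUALFIELDS:
                return value;
            case HASHEDFIELDS:
                byte[] digest = MessageDigest.getInstance("MD5")
                        .digest(value.getBytes(StandardCharsets.UTF_8));
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) hex.append(String.format("%02x", b));
                return hex.toString();
            case FIELDSIZES:
                return "len=" + value.length();
            default:
                return ""; // NOFIELDS: the value is not retained
        }
    }

    public static void main(String[] args) throws Exception {
        for (Level l : Level.values())
            System.out.println(l + ": " + obfuscate("alice@example.com", l));
    }
}
{code}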

Data coverage: What to record.
* the statement itself
* whether it was prepared or not
* consistency level
* the client address
* any changes to sampling policy or settings - This could be a separate type of 
record in the sample log, as long as the formatting is stable for each value it 
encodes
* any counts for suppressed samples (written lazily at unthrottling time)



was (Author: jshook):
Ideas on how I would like to see this work: (This is where I contradict myself 
in terms of simplicity by asking for more.)
Intercept at the coordinator, only record samples at the coordinator. Make 
sampling a sticky setting. Make it a table option, but also soft-settable via 
JMX.

Sampling controls:
* sample_probability: Just like trace probability
* sample_interval_seconds: Number of seconds for each sampling interval (I 
can't imagine why we'd need something finer grained, but maybe?)
* sample_max_per_interval: explained below

sample_max_per_interval: Number of samples per sampling interval, after which 
samples are suppressed. In this case, when the interval completes, the number 
of suppressed samples should also be written to the sample log, and reset. It's 
ok for this to be inconsistent with respect to restarts, etc. The main purpose 
is to avoid significant oversampling load, while still being able to see 
meaningful data during unexpected bursts.

Data controls, for anonymizing field values when needed, allow selecting a 
level of obfuscation:
sample_data_obfuscate:
* actual - No changes, record samples with full field values
* hashed - Use md5 or something better to hide original sample values, but 
allow for statistical analysis
* sizes - Discard value, but record string lengths and collection counts

Data coverage: What to record.
* the statement itself
* whether it was prepared or not
* consistency level
* the client address
* any changes to sampling policy or settings - This could be a separate type of 
record in the sample log, as long as the formatting is stable for each value it 
encodes
* any counts for suppressed samples (written lazily at unthrottling time)


 Workload sampling
 -

 Key: CASSANDRA-8929
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis

 Workload *recording* looks to be unworkable (CASSANDRA-6572).  We could build 
 something almost as useful by sampling the requests sent to a node and 
 building a synthetic workload with the same characteristics using the same 
 (or anonymized) schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8929) Workload sampling

2015-03-06 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350897#comment-14350897
 ] 

Jonathan Shook commented on CASSANDRA-8929:
---

The ability to build testing tools around particular workloads is something we 
have been needing for a long time. I don't understand why the implementation 
would be complex. It is arguably much simpler than something like tracing, 
possibly even just a subset of tracing. All that has to be done is to support 
probabilistic sampling of statements, either at the coordinator or at the 
replica level. It's not complicated.

Capturing the data in sample form is just the first step. The ability to look 
at a set of captured data and build a reasonably accurate test profile is 
something that we can't yet do automatically. However, it is something that can 
be made possible by having the samples. Still, I'd consider analysis of samples 
as a separate scope, and not the thrust of this request.

Consuming sstables offline as a way to generate stress profiles is really 
avoiding the whole idea of sampling. You might be able to use CDC for that 
eventually (CASSANDRA-8844). Capturing meaningful samples at a reasonable cost 
and level of operational simplicity means that we have to treat this as an 
operational feature worth pursuing. There are other reasons to want 
sampling besides just feeding stress. There are other testing tools which might 
make use of the data to help with full-stack testing. I can easily see someone 
wanting to use samples in an operational monitoring sense as well.


 Workload sampling
 -

 Key: CASSANDRA-8929
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis

 Workload *recording* looks to be unworkable (CASSANDRA-6572).  We could build 
 something almost as useful by sampling the requests sent to a node and 
 building a synthetic workload with the same characteristics using the same 
 (or anonymized) schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8929) Workload sampling

2015-03-06 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350956#comment-14350956
 ] 

Jonathan Shook commented on CASSANDRA-8929:
---

Ideas on how I would like to see this work: (This is where I contradict myself 
in terms of simplicity by asking for more.)
Intercept at the coordinator, only record samples at the coordinator. Make 
sampling a sticky setting. Make it a table option, but also soft-settable via 
JMX.

Sampling controls:
* sample_probability: Just like trace probability
* sample_interval_seconds: Number of seconds for each sampling interval (I 
can't imagine why we'd need something finer grained, but maybe?)
* sample_max_per_interval: explained below

sample_max_per_interval: Number of samples per sampling interval, after which 
samples are suppressed. In this case, when the interval completes, the number 
of suppressed samples should also be written to the sample log, and reset. It's 
ok for this to be inconsistent with respect to restarts, etc. The main purpose 
is to avoid significant oversampling load, while still being able to see 
meaningful data during unexpected bursts.
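
To make the knobs concrete, here is a purely hypothetical sketch of the sticky 
settings as a table option and as a JMX/nodetool override (names and syntax are 
illustrative only; none of this exists today):
{noformat}
-- hypothetical table option, illustrative syntax only
ALTER TABLE ks.search WITH sampling = {
    'sample_probability': '0.01',
    'sample_interval_seconds': '60',
    'sample_max_per_interval': '1000' };

# hypothetical soft override, analogous to settraceprobability
./nodetool setsampleprobability 0.01
{noformat}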

Data controls: for anonymizing field values when needed, the ability to select 
a level of obfuscation:
* sample_data_obfuscate
* actual - No changes, record samples with full field values
* hashed - Use md5 or something better to hide original sample values, but 
allow for statistical analysis
* sizes - Discard value, but record string lengths and collection counts
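
As an illustration of the three levels, a single field value might be recorded 
as follows (made-up notation, not a spec):
{noformat}
actual : searchkey='red shoes'
hashed : searchkey=md5('red shoes')   # the digest is stored, not the literal
sizes  : searchkey=string(9)          # only the length survives
{noformat}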

Data coverage: What to record.
* the statement itself
* whether it was prepared or not
* consistency level
* the client address
* any changes to sampling policy or settings - This could be a separate type of 
record in the sample log, as long as the formatting is stable for each value it 
encodes
* any counts for suppressed samples (written lazily at unthrottling time)
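
Putting the coverage list together, entries in the sample log could look 
something like the sketch below; the field names and layout are only an 
assumption about what each record type would need to carry:
{noformat}
ts: 2015-03-06T21:48:00Z
statement: SELECT * FROM search WHERE loginid = ? LIMIT 100
prepared: true
consistency_level: LOCAL_QUORUM
client: 10.20.30.40

ts: 2015-03-06T21:49:00Z
policy_change: sample_probability 0.01 -> 0.05

ts: 2015-03-06T21:50:00Z
suppressed_samples: 4211   # written lazily at unthrottling time
{noformat}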


 Workload sampling
 -

 Key: CASSANDRA-8929
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis

 Workload *recording* looks to be unworkable (CASSANDRA-6572).  We could build 
 something almost as useful by sampling the requests sent to a node and 
 building a synthetic workload with the same characteristics using the same 
 (or anonymized) schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8929) Workload sampling

2015-03-06 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350901#comment-14350901
 ] 

Jonathan Shook commented on CASSANDRA-8929:
---

Responding to [~jbellis], as we posted in parallel.

Short of having sampling support on the server side, I do not see us getting 
useful samples. In all the environments that we operate in, the most reliable 
tools we have are those that are built into Cassandra directly. This feature 
would allow us to stop reinventing the wheel with users every time we need to 
understand what their workload is with respect to POCs and forward planning. 
I've personally started leaning more and more on settraceprobability for this, 
but it comes with its own caveats. To have something that is more tailored 
around sampling *just* the statements would save lots of time and energy.

This is the type of feature for which there is no substitute when you need it. If 
we could go into a new environment and make reasonable suggestions for how to 
configure sampling up front, we would be able to simply refer back to the data 
for historic context, changes in workload patterns, changes in data rates, etc.

The short answer is, No, I don't know of an easier way, given all the 
trade-offs.




 Workload sampling
 -

 Key: CASSANDRA-8929
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis

 Workload *recording* looks to be unworkable (CASSANDRA-6572).  We could build 
 something almost as useful by sampling the requests sent to a node and 
 building a synthetic workload with the same characteristics using the same 
 (or anonymized) schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8929) Workload sampling

2015-03-06 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350956#comment-14350956
 ] 

Jonathan Shook edited comment on CASSANDRA-8929 at 3/6/15 9:48 PM:
---

Ideas on how I would like to see this work: (This is where I contradict myself 
in terms of simplicity by asking for more.)
Intercept at the coordinator, only record samples at the coordinator. Make 
sampling a sticky setting. Make it a table option, but also soft-settable via 
JMX.

Sampling controls:
* sample_probability: Just like trace probability
* sample_interval_seconds: Number of seconds for each sampling interval (I 
can't imagine why we'd need something finer grained, but maybe?)
* sample_max_per_interval: explained below

sample_max_per_interval: Number of samples per sampling interval, after which 
samples are suppressed. In this case, when the interval completes, the number 
of suppressed samples should also be written to the sample log, and reset. It's 
ok for this to be inconsistent with respect to restarts, etc. The main purpose 
is to avoid significant oversampling load, while still being able to see 
meaningful data during unexpected bursts.

Data controls: for anonymizing field values when needed, the ability to select 
a level of obfuscation:
sample_data_obfuscate:
* actual - No changes, record samples with full field values
* hashed - Use md5 or something better to hide original sample values, but 
allow for statistical analysis
* sizes - Discard value, but record string lengths and collection counts

Data coverage: What to record.
* the statement itself
* whether it was prepared or not
* consistency level
* the client address
* any changes to sampling policy or settings - This could be a separate type of 
record in the sample log, as long as the formatting is stable for each value it 
encodes
* any counts for suppressed samples (written lazily at unthrottling time)



was (Author: jshook):
Ideas on how I would like to see this work: (This is where I contradict myself 
in terms of simplicity by asking for more.)
Intercept at the coordinator, only record samples at the coordinator. Make 
sampling a sticky setting. Make it a table option, but also soft-settable via 
JMX.

Sampling controls:
* sample_probability: Just like trace probability
* sample_interval_seconds: Number of seconds for each sampling interval (I 
can't imagine why we'd need something finer grained, but maybe?)
* sample_max_per_interval: explained below

sample_max_per_interval: Number of samples per sampling interval, after which 
samples are suppressed. In this case, when the interval completes, the number 
of suppressed samples should also be written to the sample log, and reset. It's 
ok for this to be inconsistent with respect to restarts, etc. The main purpose 
is to avoid significant oversampling load, while still being able to see 
meaningful data during unexpected bursts.

Data controls: for anonymizing field values when needed, the ability to select 
a level of obfuscation:
* sample_data_obfuscate
* actual - No changes, record samples with full field values
* hashed - Use md5 or something better to hide original sample values, but 
allow for statistical analysis
* sizes - Discard value, but record string lengths and collection counts

Data coverage: What to record.
* the statement itself
* whether it was prepared or not
* consistency level
* the client address
* any changes to sampling policy or settings - This could be a separate type of 
record in the sample log, as long as the formatting is stable for each value it 
encodes
* any counts for suppressed samples (written lazily at unthrottling time)


 Workload sampling
 -

 Key: CASSANDRA-8929
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis

 Workload *recording* looks to be unworkable (CASSANDRA-6572).  We could build 
 something almost as useful by sampling the requests sent to a node and 
 building a synthetic workload with the same characteristics using the same 
 (or anonymized) schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8869) Normalize prepared query text

2015-02-26 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339310#comment-14339310
 ] 

Jonathan Shook commented on CASSANDRA-8869:
---

We looked at this in more detail. It initially looked like a trivial change, 
but after digging in a bit, there are some potentially thorny issues with how it 
might behave in practice.

For example, if you are using 75% of the allotted space for prepared statement 
caching, then it would be possible to seriously impact a production load due to 
a rolling upgrade churning the cache.

Fixing this issue might not be worth the risk, or even the trouble of logging a 
warning.
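
To make the trade-off concrete: the two statements below are semantically 
identical, but because the prepared-statement ID is derived from the raw query 
text they currently occupy two separate cache entries. Normalizing case and 
whitespace would merge them, but it would also change the IDs of statements 
that are already cached, which is exactly the rolling-upgrade churn concern 
above (table and column names are placeholders):
{code}
SELECT * FROM ks.tbl WHERE id = ?;
select  *  from ks.tbl where id = ?;
{code}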

 Normalize prepared query text
 -

 Key: CASSANDRA-8869
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8869
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Michael Penick
Priority: Trivial
  Labels: lhf

 It's possible for equivalent queries with different case and/or whitespace to 
 resolve to different prepared statement hashes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8406) Add option to set max_sstable_age in seconds in DTCS

2015-02-04 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306403#comment-14306403
 ] 

Jonathan Shook commented on CASSANDRA-8406:
---

+1 on 0001-8406.patch


 Add option to set max_sstable_age in seconds in DTCS
 

 Key: CASSANDRA-8406
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8406
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
 Fix For: 2.0.13

 Attachments: 0001-8406.patch, 0001-patch.patch


 Using days as the unit for max_sstable_age in DTCS might be too much, add 
 option to set it in seconds
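
For reference, if the option ends up accepting fractional days rather than a 
separate seconds setting (an assumption, not a statement about the final patch), 
finer granularity could be expressed like this:
{code}
ALTER TABLE ks.tbl WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'max_sstable_age_days': '0.5' };   -- 12 hours, assuming fractional values are accepted
{code}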



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

2015-01-15 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279768#comment-14279768
 ] 

Jonathan Shook commented on CASSANDRA-8621:
---

For the scenario that prompted this ticket, it appeared that the streaming 
process was completely stalled. One side of the stream (the sender side) had an 
exception that appeared to be a connection reset. The receiving side appeared 
to think that the connection was still active, at least in terms of the 
netstats reported by nodetool. We were unable to verify whether this was 
specifically the case in terms of connected sockets because there were multiple 
streams for those peers, and there is no simple way to correlate a specific 
stream to a TCP session.

[~yukim]
If there is a diagnostic method that we can use to provide more information 
about specific stalled streams, please let us know so that we can approach the 
user to get more data.


 For streaming operations, when a socket is closed/reset, we should 
 retry/reinitiate that stream
 ---

 Key: CASSANDRA-8621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jeremy Hanna
Assignee: Yuki Morishita

 Currently we have a setting (streaming_socket_timeout_in_ms) that will 
 timeout and retry the stream operation in the case where tcp is idle for a 
 period of time.  However in the case where the socket is closed or reset, we 
 do not retry the operation.  This can happen for a number of reasons, 
 including when a firewall sends a reset message on a socket during a 
 streaming operation, such as nodetool rebuild necessarily across DCs or 
 repairs.
 Doing a retry would make the streaming operations more resilient.  It would 
 be good to log the retry clearly as well (with the stream session ID and node 
 address).
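
A hypothetical example of what such a retry log line could look like (format 
and identifiers invented purely for illustration):
{noformat}
WARN  Stream session 736e1d40 with /10.0.0.12 was reset by peer; reinitiating stream (retry 2 of 3)
{noformat}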



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

2015-01-15 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279774#comment-14279774
 ] 

Jonathan Shook commented on CASSANDRA-8621:
---

As well, there were no TCP-level errors showing for the receiving side, so it 
is unclear whether exceptions were being omitted or whether something really 
strange was occurring with the network.

 For streaming operations, when a socket is closed/reset, we should 
 retry/reinitiate that stream
 ---

 Key: CASSANDRA-8621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jeremy Hanna
Assignee: Yuki Morishita

 Currently we have a setting (streaming_socket_timeout_in_ms) that will 
 timeout and retry the stream operation in the case where tcp is idle for a 
 period of time.  However in the case where the socket is closed or reset, we 
 do not retry the operation.  This can happen for a number of reasons, 
 including when a firewall sends a reset message on a socket during a 
 streaming operation, such as nodetool rebuild necessarily across DCs or 
 repairs.
 Doing a retry would make the streaming operations more resilient.  It would 
 be good to log the retry clearly as well (with the stream session ID and node 
 address).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting

2015-01-07 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268134#comment-14268134
 ] 

Jonathan Shook commented on CASSANDRA-8371:
---

[~Bj0rn], [~michaelsembwever]
Is there any new data on this? Any changes to settings or observations since 
the last major update?


 DateTieredCompactionStrategy is always compacting 
 --

 Key: CASSANDRA-8371
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: mck
Assignee: Björn Hegerfors
  Labels: compaction, performance
 Attachments: java_gc_counts_rate-month.png, 
 read-latency-recommenders-adview.png, read-latency.png, 
 sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png


 Running 2.0.11 and having switched a table to 
 [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that 
 disk IO and gc count increase, along with the number of reads happening in 
 the compaction hump of cfhistograms.
 Data, and generally performance, looks good, but compactions are always 
 happening, and pending compactions are building up.
 The schema for this is 
 {code}CREATE TABLE search (
   loginid text,
   searchid timeuuid,
   description text,
   searchkey text,
   searchurl text,
   PRIMARY KEY ((loginid), searchid)
 );{code}
 We're sitting on about 82G (per replica) across 6 nodes in 4 DCs.
 CQL executed against this keyspace, and traffic patterns, can be seen in 
 slides 7+8 of https://prezi.com/b9-aj6p2esft/
 Attached are sstables-per-read and read-latency graphs from cfhistograms, and 
 screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44), 
 to DTCS (week ~46).
 These screenshots are also found in the prezi on slides 9-11.
 [~pmcfadin], [~Bj0rn], 
 Can this be a consequence of occasional deleted rows, as is described under 
 (3) in the description of CASSANDRA-6602 ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8303) Provide strict mode for CQL Queries

2015-01-06 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266550#comment-14266550
 ] 

Jonathan Shook commented on CASSANDRA-8303:
---

It might be nice if the auth system were always in play (when that auth provider 
is set), with the system defaults applied to a virtual role with a name like 
defaults. This cleans up any layering questions by casting the yaml defaults 
into the authz conceptual model. If a user isn't assigned to another defined 
role, they should be automatically assigned to the defaults role.

Otherwise, explaining the result of layering them, even with precedence, might 
become overly cumbersome. With a defaults role, you can use both.
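
A made-up sketch of the idea, just to show the shape (no such syntax or 
permissions exist):
{noformat}
# virtual role populated from the yaml strict-mode defaults
role defaults:
    deny: [ALLOW_FILTERING, MULTI_PARTITION_SELECT, SECONDARY_INDEX_QUERY]

# explicitly defined role; its members no longer pick up 'defaults'
role analytics:
    allow: [ALLOW_FILTERING, SECONDARY_INDEX_QUERY]
{noformat}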



 Provide strict mode for CQL Queries
 -

 Key: CASSANDRA-8303
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8303
 Project: Cassandra
  Issue Type: Improvement
Reporter: Anupam Arora
 Fix For: 3.0


 Please provide a strict mode option in cassandra that will kick out any CQL 
 queries that are expensive, e.g. any query with ALLOW FILTERING, 
 multi-partition queries, secondary index queries, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8303) Provide strict mode for CQL Queries

2014-12-30 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261885#comment-14261885
 ] 

Jonathan Shook commented on CASSANDRA-8303:
---

A permission that might be helpful to add to the list: UNPREPARED_STATEMENTS. 
I can easily see unprepared statements being disallowed in some environments, 
for prod app accounts.
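
For example, with hypothetical grammar (no such permission exists today), a 
production application role could simply never be granted it:
{code}
GRANT UNPREPARED_STATEMENTS ON ALL KEYSPACES TO dev_role;
-- prod_app is never granted UNPREPARED_STATEMENTS, so only prepared statements are accepted
{code}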

 Provide strict mode for CQL Queries
 -

 Key: CASSANDRA-8303
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8303
 Project: Cassandra
  Issue Type: Improvement
Reporter: Anupam Arora
 Fix For: 3.0


 Please provide a strict mode option in cassandra that will kick out any CQL 
 queries that are expensive, e.g. any query with ALLOW FILTERING, 
 multi-partition queries, secondary index queries, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

