[jira] [Created] (CASSANDRA-14307) Refactor commitlog

2018-03-09 Thread Dikang Gu (JIRA)
Dikang Gu created CASSANDRA-14307:
-

 Summary: Refactor commitlog
 Key: CASSANDRA-14307
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14307
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Dikang Gu









[jira] [Updated] (CASSANDRA-14117) Refactor read path

2018-03-09 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu updated CASSANDRA-14117:
--
Reviewer: Jason Brown
  Status: Patch Available  (was: Open)

A first pass is up for review [here|https://github.com/DikangGu/cassandra/commits/CASSANDRA-14117-v1].

I refactored SinglePartitionReadCommand and PartitionRangeReadCommand. 
[StorageHandler|https://github.com/DikangGu/cassandra/blob/CASSANDRA-14117-v1/src/java/org/apache/cassandra/db/StorageHandler.java] 
is the query handler interface for a storage implementation.
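
For anyone skimming without opening the branch, a rough sketch of the kind of interface such a query handler could expose is below; the method names and signatures are illustrative assumptions, not the actual definition in the linked branch:

{code:java}
import org.apache.cassandra.db.PartitionRangeReadCommand;
import org.apache.cassandra.db.ReadExecutionController;
import org.apache.cassandra.db.SinglePartitionReadCommand;
import org.apache.cassandra.db.partitions.UnfilteredPartitionIterator;
import org.apache.cassandra.db.rows.UnfilteredRowIterator;

// Hypothetical sketch only; see the linked StorageHandler.java for the real interface.
public interface StorageHandler
{
    // Serve a single-partition read from the underlying storage engine.
    UnfilteredRowIterator queryStorage(SinglePartitionReadCommand command, ReadExecutionController controller);

    // Serve a partition-range (token range) read from the underlying storage engine.
    UnfilteredPartitionIterator queryStorage(PartitionRangeReadCommand command, ReadExecutionController controller);
}
{code}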



> Refactor read path
> --
>
> Key: CASSANDRA-14117
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14117
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
>
> As part of the pluggable storage engine effort, we'd like to modularize the 
> read-path-related code, making it independent of existing storage engine 
> implementation details.
> For now, refer to 
> https://docs.google.com/document/d/1suZlvhzgB6NIyBNpM9nxoHxz_Ri7qAm-UEO8v8AIFsc
>  for the high-level designs.






[jira] [Assigned] (CASSANDRA-14117) Refactor read path

2018-03-09 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu reassigned CASSANDRA-14117:
-

Assignee: Dikang Gu

> Refactor read path
> --
>
> Key: CASSANDRA-14117
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14117
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
>
> As part of the pluggable storage engine effort, we'd like to modularize the 
> read-path-related code, making it independent of existing storage engine 
> implementation details.
> For now, refer to 
> https://docs.google.com/document/d/1suZlvhzgB6NIyBNpM9nxoHxz_Ri7qAm-UEO8v8AIFsc
>  for the high-level designs.






[jira] [Assigned] (CASSANDRA-14304) DELETE after INSERT IF NOT EXISTS does not work

2018-03-09 Thread Vinay Chella (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Chella reassigned CASSANDRA-14304:


Assignee: Vinay Chella

> DELETE after INSERT IF NOT EXISTS does not work
> ---
>
> Key: CASSANDRA-14304
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14304
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Julien
>Assignee: Vinay Chella
>Priority: Major
> Attachments: debug.log, system.log
>
>
> Deleting a row immediately after an INSERT IF NOT EXISTS does not work.
> It can be reproduced with this CQL script:
> {code:java}
> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 1 };
> CREATE TABLE ks.ta ( id text PRIMARY KEY, col text );
> INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS;
> DELETE FROM ks.ta WHERE id = 'myId';
> SELECT * FROM ks.ta WHERE id='myId';
> {code}
> {code:java}
> [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
> WARNING: pyreadline dependency missing.  Install to enable tab completion.
> cqlsh> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 1 };
> cqlsh> CREATE TABLE ks.ta ( id text PRIMARY KEY, col text );
> cqlsh> INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS;
>  [applied]
> ---
>   True
> cqlsh> DELETE FROM ks.ta WHERE id = 'myId';
> cqlsh> SELECT * FROM ks.ta WHERE id='myId';
>  id   | col
> --+---
>  myId | myCol
> {code}
>  * Only happens if the client is on a different host (works as expected on 
> the same host)
>  * Works as expected without IF NOT EXISTS
>  * A ~500 ms delay between INSERT and DELETE fixes the issue.
> Logs attached.






[jira] [Commented] (CASSANDRA-14304) DELETE after INSERT IF NOT EXISTS does not work

2018-03-09 Thread Vinay Chella (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393731#comment-16393731
 ] 

Vinay Chella commented on CASSANDRA-14304:
--

Hi Julien,

I could not reproduce this. My cqlsh client was running on a different machine, 
and I executed the statements sequentially without any delay, but that did not 
reproduce it either. More details would be helpful if this reproduces 
consistently for you.

{code:java}
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
'replication_factor' : 1 };
cqlsh> CREATE TABLE ks.ta ( id text PRIMARY KEY, col text );
cqlsh> INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS;

 [applied]
---
  True

cqlsh> DELETE FROM ks.ta WHERE id = 'myId';
cqlsh> SELECT * FROM ks.ta WHERE id='myId';

 id | col
+-

(0 rows)
cqlsh> INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS;

 [applied]
---
  True

cqlsh> DELETE FROM ks.ta WHERE id = 'myId';
cqlsh> SELECT * FROM ks.ta WHERE id='myId';

 id | col
+-

(0 rows)
cqlsh>
cqlsh> INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS;

 [applied]
---
  True

cqlsh> DELETE FROM ks.ta WHERE id = 'myId';
cqlsh> SELECT * FROM ks.ta WHERE id='myId';

 id | col
+-

(0 rows)
cqlsh>
cqlsh>
{code}
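
If it helps, here is a minimal sketch of running the same statements programmatically from a remote client, assuming the DataStax Java driver 3.x (an assumption on my side; the report used cqlsh, and the contact point below is a placeholder):

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class LwtDeleteRepro
{
    public static void main(String[] args)
    {
        // Run this from a host other than the Cassandra node, per the report.
        try (Cluster cluster = Cluster.builder().addContactPoint("cassandra-host").build();
             Session session = cluster.connect())
        {
            session.execute("INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS");
            session.execute("DELETE FROM ks.ta WHERE id = 'myId'");
            ResultSet rs = session.execute("SELECT * FROM ks.ta WHERE id = 'myId'");
            // Expected: no rows. The report sees the inserted row survive the delete.
            System.out.println(rs.all());
        }
    }
}
{code}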

> DELETE after INSERT IF NOT EXISTS does not work
> ---
>
> Key: CASSANDRA-14304
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14304
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Julien
>Priority: Major
> Attachments: debug.log, system.log
>
>
> Deleting a row immediately after an INSERT IF NOT EXISTS does not work.
> It can be reproduced with this CQL script:
> {code:java}
> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 1 };
> CREATE TABLE ks.ta ( id text PRIMARY KEY, col text );
> INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS;
> DELETE FROM ks.ta WHERE id = 'myId';
> SELECT * FROM ks.ta WHERE id='myId';
> {code}
> {code:java}
> [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
> WARNING: pyreadline dependency missing.  Install to enable tab completion.
> cqlsh> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
> 'replication_factor' : 1 };
> cqlsh> CREATE TABLE ks.ta ( id text PRIMARY KEY, col text );
> cqlsh> INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS;
>  [applied]
> ---
>   True
> cqlsh> DELETE FROM ks.ta WHERE id = 'myId';
> cqlsh> SELECT * FROM ks.ta WHERE id='myId';
>  id   | col
> --+---
>  myId | myCol
> {code}
>  * Only happens if the client is on a different host (works as expected on 
> the same host)
>  * Works as expected without IF NOT EXISTS
>  * A ~500 ms delay between INSERT and DELETE fixes the issue.
> Logs attached.






[jira] [Commented] (CASSANDRA-9452) Remove configuration of storage-conf from tools

2018-03-09 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393723#comment-16393723
 ] 

Michael Shuler commented on CASSANDRA-9452:
---

I don't see these functions used anywhere else in-tree.
{noformat}
(trunk)mshuler@hana:~/git/cassandra$ for f in `ls -1 test/resources/functions/| 
cut -d . -f 1`; do git grep $f; done
test/resources/functions/configure_cassandra.sh:function configure_cassandra() {
test/resources/functions/install_cassandra.sh:function install_cassandra() {
test/resources/functions/nodetool_cassandra.sh:function nodetool_cassandra() {
test/resources/functions/start_cassandra.sh:function start_cassandra() {
test/resources/functions/stop_cassandra.sh:function stop_cassandra() {
test/resources/functions/wipe_cassandra.sh:function wipe_cassandra() {
{noformat}

> Remove configuration of storage-conf from tools
> ---
>
> Key: CASSANDRA-9452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9452
> Project: Cassandra
>  Issue Type: Task
>  Components: Configuration, Testing, Tools
>Reporter: Mike Adamson
>Assignee: Vinay Chella
>Priority: Minor
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: CASSANDRA-9452-trunk.txt
>
>
> The following files still making reference to storage-config and/or 
> storage-conf.xml
> * ./build.xml
> * ./bin/nodetool
> * ./bin/sstablekeys
> * ./test/resources/functions/configure_cassandra.sh
> * ./test/resources/functions/install_cassandra.sh
> * ./tools/bin/json2sstable
> * ./tools/bin/sstable2json
> * ./tools/bin/sstablelevelreset






[jira] [Commented] (CASSANDRA-7840) Refactor static state & functions into static singletons

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393716#comment-16393716
 ] 

ASF GitHub Bot commented on CASSANDRA-7840:
---

Github user michaelsembwever commented on the pull request:


https://github.com/apache/cassandra/commit/ba09523de7cad3b3732c0e7e60b072f84e809e21#commitcomment-28024074
  
In src/java/org/apache/cassandra/dht/BootstrapEvent.java:
In src/java/org/apache/cassandra/dht/BootstrapEvent.java on line 62:
> My bet would be that people would add Events member instances as statics 
again, to save some garbage and because they are stateless anyways.

Yes, and this would definitely make sense for classes that were 
instantiated frequently. 
That is, the following isn't wrong…
```
public class BootStrapper extends ProgressEventNotifierSupport
{
…
private static final BootstrapEvents bootstrapEvents = new BootstrapEvents();
…
bootstrapEvents.useSpecifiedTokens(…);
```

But separating the static methods in Event classes out into non-static 
methods on Events classes still gives us improved testability, which is my 
understanding of the reasoning behind CASSANDRA-7840.
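
To illustrate the testability point, a hypothetical, much-simplified sketch (the constructor wiring here is my own assumption, not the actual patch):

```java
import java.util.Collection;
import org.apache.cassandra.dht.Token;

// Non-static collaborator: production code emits events through an instance...
class BootstrapEvents
{
    void useSpecifiedTokens(Collection<Token> tokens)
    {
        // emit a diagnostic event
    }
}

// ...so a unit test can hand in its own stub/subclass and verify the calls
// without touching global static state.
class BootStrapper
{
    private final BootstrapEvents events;

    BootStrapper(BootstrapEvents events)
    {
        this.events = events;
    }

    void bootstrapWith(Collection<Token> tokens)
    {
        events.useSpecifiedTokens(tokens);
    }
}
```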


> Refactor static state & functions into static singletons
> 
>
> Key: CASSANDRA-7840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7840
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: 
> 0001-splitting-StorageService-executors-into-a-separate-c.patch, 
> 0002-making-DatabaseDescriptor-a-singleton.patch, 
> 0003-refactoring-StorageService-static-methods.patch, 
> 0004-making-StorageProxy-a-singleton.patch, 
> 0005-making-MigrationManager-a-singleton.patch, 
> 0006-making-SystemKeyspace-a-singleton.patch, 
> 0007-making-Auth-a-singleton.patch, 
> 0008-removing-static-methods-and-initialization-from-Comp.patch, 
> 0009-making-SinkManager-a-singleton.patch, 
> 0010-making-DefsTables-a-singleton.patch, 
> 0011-making-StageManager-a-singleton.patch, 
> 0012-making-MessagingService-a-singleton.patch, 
> 0013-making-QueryProcessor-a-singleton.patch, 
> 0014-refactoring-static-methods-on-Tracing.patch, 
> 0015-removing-static-state-from-BatchlogManager.patch, 
> 0016-removing-static-method-from-CommitLog.patch, 
> 0017-OutboundTcpConnection-removing-singleton-access-from.patch, 
> 0018-FBUtilities-removing-getLocalAddress-getBroadcastAdd.patch, 
> 0019-PendingRangeCalculatorService-removing-singleton-acc.patch, 
> 0020-ActiveRepairService-removing-static-members-and-meth.patch, 
> 0021-RowDataResolver-removing-static-singleton-access-fro.patch, 
> 0022-AbstractReadExecutor-removing-static-method.patch, 
> 0023-StorageServiceAccessor-removing-static-singleton-acc.patch, 
> 0024-FileUtils-removing-static-singleton-accesses-from-st.patch, 
> 0025-ResourceWatcher-removing-static-singleton-access-fro.patch, 
> 0026-TokenMetadata-removing-static-singleton-access-from-.patch, 
> 0027-OutboundTcpConnectionPool-removing-static-singleton-.patch, 
> 0028-Cassandra-PasswordAuthenticator-making-static-method.patch, 
> 0029-CompactionMetrics-making-static-method-instance-meth.patch, 
> 0030-ClientState-splitting-configured-QueryHandler-instan.patch, 
> 0031-SSTableReader-splitting-static-factory-methods-into-.patch, 
> 0032-Keyspace-splitting-static-factory-methods-and-state-.patch, 
> 0033-ColumnFamilyStore-splitting-static-factory-methods-a.patch, 
> 0034-TriggerDefinition-removing-static-singleton-access-f.patch, 
> 0035-CFMetaData-splitting-off-static-factory-methods-onto.patch, 
> 0036-KSMetaData-splitting-off-static-factory-methods-onto.patch, 
> 0037-SystemKeyspace-moving-system-keyspace-definitions-on.patch, 
> 0038-UTMetaData-refactoring-static-singleton-accesses-for.patch, 
> 0039-CounterId-removing-static-singleton-accesses-from-st.patch, 
> 0040-AtomicBtreeColumns-replacing-SystemKeyspace-CFMetaDa.patch
>
>
> 1st step of CASSANDRA-7837.
> Things like DatabaseDescriptor.getPartitioner() should become 
> DatabaseDescriptor.instance.getPartitioner(). In cases where there is a mix 
> of instance state and static functionality (Keyspace & ColumnFamilyStore 
> classes), the static portion should be split off into singleton factory 
> classes.
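
A minimal, hypothetical sketch of the pattern the description asks for (the real class carries far more configuration state than this):

{code:java}
import org.apache.cassandra.dht.IPartitioner;

// Before: IPartitioner p = DatabaseDescriptor.getPartitioner();
// After:  IPartitioner p = DatabaseDescriptor.instance.getPartitioner();
public class DatabaseDescriptor
{
    public static final DatabaseDescriptor instance = new DatabaseDescriptor();

    private volatile IPartitioner partitioner;

    private DatabaseDescriptor() {}

    public IPartitioner getPartitioner()
    {
        return partitioner;
    }

    public void setPartitioner(IPartitioner partitioner)
    {
        this.partitioner = partitioner;
    }
}
{code}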






[jira] [Commented] (CASSANDRA-9452) Remove configuration of storage-conf from tools

2018-03-09 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393715#comment-16393715
 ] 

Michael Shuler commented on CASSANDRA-9452:
---

`git log test/resources/functions/install_cassandra.sh` shows an old 0.8 svn 
merge commit and one for the log4j -> logback switch, so it wouldn't be the 
Debian packaging. The script sets up some env vars, adds an /etc/rc.local line, 
etc., so it looks to me like a leftover helper from old manual installs.

> Remove configuration of storage-conf from tools
> ---
>
> Key: CASSANDRA-9452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9452
> Project: Cassandra
>  Issue Type: Task
>  Components: Configuration, Testing, Tools
>Reporter: Mike Adamson
>Assignee: Vinay Chella
>Priority: Minor
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: CASSANDRA-9452-trunk.txt
>
>
> The following files still making reference to storage-config and/or 
> storage-conf.xml
> * ./build.xml
> * ./bin/nodetool
> * ./bin/sstablekeys
> * ./test/resources/functions/configure_cassandra.sh
> * ./test/resources/functions/install_cassandra.sh
> * ./tools/bin/json2sstable
> * ./tools/bin/sstable2json
> * ./tools/bin/sstablelevelreset






[jira] [Commented] (CASSANDRA-9452) Remove configuration of storage-conf from tools

2018-03-09 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393696#comment-16393696
 ] 

Jeff Jirsa commented on CASSANDRA-9452:
---

I don't, but maybe [~urandom], [~tjake], or [~mshuler] might? Old Debian 
packaging?

> Remove configuration of storage-conf from tools
> ---
>
> Key: CASSANDRA-9452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9452
> Project: Cassandra
>  Issue Type: Task
>  Components: Configuration, Testing, Tools
>Reporter: Mike Adamson
>Assignee: Vinay Chella
>Priority: Minor
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: CASSANDRA-9452-trunk.txt
>
>
> The following files still making reference to storage-config and/or 
> storage-conf.xml
> * ./build.xml
> * ./bin/nodetool
> * ./bin/sstablekeys
> * ./test/resources/functions/configure_cassandra.sh
> * ./test/resources/functions/install_cassandra.sh
> * ./tools/bin/json2sstable
> * ./tools/bin/sstable2json
> * ./tools/bin/sstablelevelreset






[jira] [Commented] (CASSANDRA-9452) Remove configuration of storage-conf from tools

2018-03-09 Thread Vinay Chella (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393694#comment-16393694
 ] 

Vinay Chella commented on CASSANDRA-9452:
-

I'm not sure whether {{test/resources/functions/install_cassandra.sh}} is used 
anywhere; I could not find any references to it. I will open a separate JIRA to 
clean up the {{test/resources/functions}} folder.

[~jjirsa] Do you have any context on 
{{test/resources/functions/install_cassandra.sh}}?

> Remove configuration of storage-conf from tools
> ---
>
> Key: CASSANDRA-9452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9452
> Project: Cassandra
>  Issue Type: Task
>  Components: Configuration, Testing, Tools
>Reporter: Mike Adamson
>Assignee: Vinay Chella
>Priority: Minor
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: CASSANDRA-9452-trunk.txt
>
>
> The following files still making reference to storage-config and/or 
> storage-conf.xml
> * ./build.xml
> * ./bin/nodetool
> * ./bin/sstablekeys
> * ./test/resources/functions/configure_cassandra.sh
> * ./test/resources/functions/install_cassandra.sh
> * ./tools/bin/json2sstable
> * ./tools/bin/sstable2json
> * ./tools/bin/sstablelevelreset






[jira] [Created] (CASSANDRA-14306) Single config variable to specify logs path

2018-03-09 Thread Angelo Polo (JIRA)
Angelo Polo created CASSANDRA-14306:
---

 Summary: Single config variable to specify logs path
 Key: CASSANDRA-14306
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14306
 Project: Cassandra
  Issue Type: Improvement
  Components: Configuration
Reporter: Angelo Polo
 Attachments: unified_logs_dir.patch

Motivation: all configuration should take place in bin/cassandra.in.sh (for 
non-Windows) and the various conf/ files; in particular, bin/cassandra should 
not need to be modified upon installation. In many installs $CASSANDRA_HOME is 
not a writable location, the yaml setting 'data_file_directories' points to a 
non-default location, and so on. It would be good to have a single variable in 
an explicit conf file that specifies where logs should be written.

For non-Windows installs there are currently two places where the log 
directory is set: conf/cassandra-env.sh and bin/cassandra, both defaulting to 
$CASSANDRA_HOME/logs. These can be unified into a single variable, 
CASSANDRA_LOGS, set in conf/cassandra-env.sh, with the intention that a user 
running a custom installation would modify it once there (if it is not already 
set in the environment). bin/cassandra would then check whether CASSANDRA_LOGS 
is set, in case conf/cassandra-env.sh doesn't get sourced on startup, and 
provide a default value if not. If a user prefers different paths for the 
logback logs and the GC logs, they can still go into bin/cassandra to set the 
second path, just as they would today. See "unified_logs_dir.patch" for a 
proposed patch.

No change seems necessary for the Windows scripts. The two uses of 
$CASSANDRA_HOME/logs are in the same script, conf/cassandra-env.ps1, within 
scrolling distance of each other (lines 278-301). I suppose they haven't been 
combined because the two usages need different path separators.






[jira] [Commented] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393484#comment-16393484
 ] 

Jeff Jirsa commented on CASSANDRA-14303:


Agreed, this is clever.  

 

> NetworkTopologyStrategy could have a "default replication" option
> -
>
> Key: CASSANDRA-14303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14303
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Joseph Lynch
>Priority: Minor
>
> Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
> has to manually specify the datacenters they want their data replicated to 
> with parameters, e.g.:
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': 3, 'dc2': 3}{noformat}
> This is a poor user interface because it requires the creator of the keyspace 
> (typically a developer) to know the layout of the Cassandra cluster (which 
> may or may not be controlled by them). Also, at least in my experience, folks 
> typo the datacenters _all_ the time. To work around this I see a number of 
> users creating automation around this where the automation describes the 
> Cassandra cluster and automatically expands out to all the dcs that Cassandra 
> knows about. Why can't Cassandra just do this for us, re-using the previously 
> forbidden {{replication_factor}} option (for backwards compatibility):
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> This would automatically replicate this Keyspace to all datacenters that are 
> present in the cluster. If you need to _override_ the default you could 
> supply a datacenter name, e.g.:
> {noformat}
> > CREATE KEYSPACE test WITH replication = {'class': 
> > 'NetworkTopologyStrategy', 'replication_factor': 3, 'dc1': 2}
> > DESCRIBE KEYSPACE test
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': '2', 'dc2': 3} AND durable_writes = true;
> {noformat}
> On the implementation side I think this may be reasonably straightforward to 
> do an auto-expansion at the time of keyspace creation (or alter), where the 
> above would automatically expand to list out the datacenters. We could allow 
> this to be recomputed whenever an AlterKeyspaceStatement runs so that to add 
> datacenters you would just run:
> {noformat}
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> and this would check that if the dc's in the current schema are different you 
> add in the new ones (_for safety reasons we'd never remove non explicitly 
> supplied zero dcs when auto-generating dcs_). Removing a datacenter becomes 
> an alter that includes an override for the dc you want to remove (or of 
> course you can always not use the auto-expansion and just use the old way):
> {noformat}
> // Tell it explicitly not to replicate to dc2
> > ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> > 'replication_factor': 3, 'dc2': 0}
> > DESCRIBE KEYSPACE test
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': '3'} AND durable_writes = true;{noformat}
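
A rough sketch of how the auto-expansion described above could work at CREATE/ALTER time (the class and method names are my assumption, not the eventual patch):

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public final class ReplicationOptionExpansion
{
    // Expand a 'replication_factor' default into one entry per known datacenter,
    // letting explicit per-DC settings win and treating an explicit 0 as
    // "do not replicate there".
    public static Map<String, String> expand(Map<String, String> options, Set<String> knownDatacenters)
    {
        Map<String, String> expanded = new HashMap<>(options);
        String defaultRf = expanded.remove("replication_factor");
        if (defaultRf == null)
            return expanded; // no default given: behave exactly as today

        for (String dc : knownDatacenters)
            expanded.putIfAbsent(dc, defaultRf);

        expanded.values().removeIf("0"::equals);
        return expanded;
    }
}
{code}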






[jira] [Commented] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393481#comment-16393481
 ] 

Jeremiah Jordan commented on CASSANDRA-14303:
-

Got it, I missed that nuance. A keyword used at create/alter time that 
auto-expands to "all dc names" sounds like an interesting idea.

> NetworkTopologyStrategy could have a "default replication" option
> -
>
> Key: CASSANDRA-14303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14303
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Joseph Lynch
>Priority: Minor
>
> Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
> has to manually specify the datacenters they want their data replicated to 
> with parameters, e.g.:
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': 3, 'dc2': 3}{noformat}
> This is a poor user interface because it requires the creator of the keyspace 
> (typically a developer) to know the layout of the Cassandra cluster (which 
> may or may not be controlled by them). Also, at least in my experience, folks 
> typo the datacenters _all_ the time. To work around this I see a number of 
> users creating automation around this where the automation describes the 
> Cassandra cluster and automatically expands out to all the dcs that Cassandra 
> knows about. Why can't Cassandra just do this for us, re-using the previously 
> forbidden {{replication_factor}} option (for backwards compatibility):
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> This would automatically replicate this Keyspace to all datacenters that are 
> present in the cluster. If you need to _override_ the default you could 
> supply a datacenter name, e.g.:
> {noformat}
> > CREATE KEYSPACE test WITH replication = {'class': 
> > 'NetworkTopologyStrategy', 'replication_factor': 3, 'dc1': 2}
> > DESCRIBE KEYSPACE test
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': '2', 'dc2': 3} AND durable_writes = true;
> {noformat}
> On the implementation side I think this may be reasonably straightforward to 
> do an auto-expansion at the time of keyspace creation (or alter), where the 
> above would automatically expand to list out the datacenters. We could allow 
> this to be recomputed whenever an AlterKeyspaceStatement runs so that to add 
> datacenters you would just run:
> {noformat}
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> and this would check that if the dc's in the current schema are different you 
> add in the new ones (_for safety reasons we'd never remove non explicitly 
> supplied zero dcs when auto-generating dcs_). Removing a datacenter becomes 
> an alter that includes an override for the dc you want to remove (or of 
> course you can always not use the auto-expansion and just use the old way):
> {noformat}
> // Tell it explicitly not to replicate to dc2
> > ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> > 'replication_factor': 3, 'dc2': 0}
> > DESCRIBE KEYSPACE test
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': '3'} AND durable_writes = true;{noformat}






[jira] [Commented] (CASSANDRA-13016) log messages should include human readable sizes

2018-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393480#comment-16393480
 ] 

ASF GitHub Bot commented on CASSANDRA-13016:


GitHub user sumantsahney opened a pull request:

https://github.com/apache/cassandra/pull/203

Added ByteConverter function and added unit tests for it

**Log messages should include human readable sizes**

https://issues.apache.org/jira/browse/CASSANDRA-13016

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sumantsahney/cassandra 13016-3.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/203.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #203


commit e48dbb8d345d480349d53b48f167f7f463d2d574
Author: Sumant Sahney 
Date:   2018-03-09T19:59:01Z

Added ByteConverter function and added unit tests for it




> log messages should include human readable sizes
> 
>
> Key: CASSANDRA-13016
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13016
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Jon Haddad
>Assignee: Sumant Sahney
>Priority: Major
>  Labels: lhf
>
> Displaying a raw byte count by itself is difficult to read when going through 
> log messages. We should add a human-readable version in parens (10MB) after 
> the byte count.
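
A minimal sketch of the kind of helper this asks for (the class and method names here are assumptions, not the ByteConverter in the attached pull request; it uses binary units):

{code:java}
public final class HumanReadableBytes
{
    private static final String[] UNITS = { "B", "KiB", "MiB", "GiB", "TiB", "PiB" };

    private HumanReadableBytes() {}

    // e.g. 10485760 -> "10.0MiB"
    public static String format(long bytes)
    {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1)
        {
            value /= 1024;
            unit++;
        }
        return String.format("%.1f%s", value, UNITS[unit]);
    }
}
{code}

A log line would then read e.g. "flushed 10485760 bytes (10.0MiB)".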






[jira] [Updated] (CASSANDRA-13697) CDC and VIEW writeType missing from spec for write_timeout / write_failure

2018-03-09 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13697:
---
Status: Ready to Commit  (was: Patch Available)

> CDC and VIEW writeType missing from spec for write_timeout / write_failure
> --
>
> Key: CASSANDRA-13697
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13697
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Andy Tolbert
>Assignee: Vinay Chella
>Priority: Minor
>  Labels: lhf
>
> In Cassandra 3.0 a new {{WriteType}}, {{VIEW}}, was added; it appears to be 
> used when raising a {{WriteTimeoutException}} because the local view lock for a 
> key cannot be acquired within the timeout.
> In Cassandra 3.8 the {{CDC}} {{WriteType}} was added for when 
> {{cdc_total_space_in_mb}} is exceeded while writing to data tracked by cdc.
> The [v4 
> spec|https://github.com/apache/cassandra/blob/cassandra-3.11.0/doc/native_protocol_v4.spec#L1051-L1066]
>  currently doesn't cover these two write types. While the protocol allows a 
> free-form string for the write type, it would be nice to document which types 
> are available, since some drivers (java, cpp, python) attempt to deserialize 
> the write type into an enum and may not handle unknown values well.
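
For driver authors, a defensive handling sketch (the enum lists the write types named in the v4 spec plus the two this ticket adds; the class and method names are mine):

{code:java}
// Hypothetical client-side sketch: tolerate write types the client does not know yet.
enum KnownWriteType { SIMPLE, BATCH, UNLOGGED_BATCH, COUNTER, BATCH_LOG, CAS, VIEW, CDC, UNKNOWN }

final class WriteTypes
{
    static KnownWriteType parse(String wireValue)
    {
        try
        {
            return KnownWriteType.valueOf(wireValue);
        }
        catch (IllegalArgumentException e)
        {
            return KnownWriteType.UNKNOWN; // fall back instead of failing on unrecognised types
        }
    }
}
{code}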






[jira] [Updated] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14303:
-
Description: 
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 2}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '2', 'dc2': 3} AND durable_writes = true;
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones (_for safety reasons we'd never remove non explicitly 
supplied zero dcs when auto-generating dcs_). Removing a datacenter becomes an 
alter that includes an override for the dc you want to remove (or of course you 
can always not use the auto-expansion and just use the old way):
{noformat}
// Tell it explicitly not to replicate to dc2
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc2': 0}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '3'} AND durable_writes = true;{noformat}

  was:
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 2}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '2', 'dc2': 3} AND durable_writes = true;
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones (for 

[jira] [Updated] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14303:
-
Description: 
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 2}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '2', 'dc2': 3} AND durable_writes = true;
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones (for safety reasons we'd probably never remove non-zero rf 
dcs when auto-generating dcs). Removing a datacenter becomes an alter that 
includes an override for the dc you want to remove (or of course you can always 
not use the auto-expansion and just use the old way):
{noformat}
// Tell it explicitly not to replicate to dc2
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc2': 0}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '3'} AND durable_writes = true;{noformat}

  was:
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 2}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '2', 'dc2': 3} AND durable_writes = true;
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones (for safety 

[jira] [Comment Edited] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393438#comment-16393438
 ] 

Joseph Lynch edited comment on CASSANDRA-14303 at 3/9/18 7:52 PM:
--

{quote}An issue with having a default replication would be that you *must* set 
autobootstrap:false when adding a new DC, otherwise the first nodes added in 
the DC would get all the data. Given proper DC creation, it is not required to 
do this right now.
{quote}
Yes, that edge case as well as others (gossip inconsistency mostly) is why I 
propose only evaluating the DCs at the time of a CREATE or ALTER statement 
execution. The operator would still have to go run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
manually after adding a new datacenter to trigger the re-generation of the dcs. 
It's also worth noting that as proposed if you then described the keyspace you 
would get all the dcs that it is actually replicated to:
{noformat}
cqlsh> DESCRIBE KEYSPACE test

CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'us-west-1': '3', 'us-east-1': 3} AND durable_writes = true;{noformat}

{quote} Also having to set the replication to "0" would back fire the goal of 
the ticket which seems to be that you don't have to manage the RF when adding 
or removing DC's. {quote}

I think the way I'd implement it is that overrides would apply only during 
CREATE/ALTER, so even if you typo'd, when you DESCRIBE you'd see that your 
target datacenter is still there. My goal isn't to make this perfect, just to 
make it much much better than right now (where every single keyspace creation 
requires datacenter information and 9/10 you just want the same RF in all 
datacenters always).

Also users can always use the non auto-expansion if they don't want this. The 
proposed API is fully backwards compatible afaict (unless someone was relying 
on replication_factor throwing an error).


was (Author: jolynch):
{quote}An issue with having a default replication would be that you *must* set 
autobootstrap:false when adding a new DC, otherwise the first nodes added in 
the DC would get all the data. Given proper DC creation, it is not required to 
do this right now.
{quote}
Yes, that edge case as well as others (gossip inconsistency mostly) is why I 
propose only evaluating the DCs at the time of a CREATE or ALTER statement 
execution. The operator would still have to go run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
manually after adding a new datacenter to trigger the re-generation of the dcs. 
It's also worth noting that as proposed if you then described the keyspace you 
would get all the dcs that it is actually replicated to:
{noformat}
cqlsh> DESCRIBE KEYSPACE test

CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'us-west-1': '3', 'us-east-1': 3} AND durable_writes = true;{noformat}
.bq Also having to set the replication to "0" would back fire the goal of the 
ticket which seems to be that you don't have to manage the RF when adding or 
removing DC's.

I think the way I'd implement it is that overrides would apply only during 
CREATE/ALTER, so even if you typo'd, when you DESCRIBE you'd see that your 
target datacenter is still there. My goal isn't to make this perfect, just to 
make it much much better than right now (where every single keyspace creation 
requires datacenter information and 9/10 you just want the same RF in all 
datacenters always).

Also users can always use the non auto-expansion if they don't want this. The 
proposed API is fully backwards compatible afaict (unless someone was relying 
on replication_factor throwing an error).

> NetworkTopologyStrategy could have a "default replication" option
> -
>
> Key: CASSANDRA-14303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14303
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Joseph Lynch
>Priority: Minor
>
> Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
> has to manually specify the datacenters they want their data replicated to 
> with parameters, e.g.:
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': 3, 'dc2': 3}{noformat}
> This is a poor user interface because it requires the creator of the keyspace 
> (typically a developer) to know the layout of the Cassandra cluster (which 
> may or may not be controlled by them). Also, at least in my experience, folks 
> typo the datacenters _all_ the time. To work around this I see a number of 
> users creating automation around this 

[jira] [Updated] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14303:
-
Description: 
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 2}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '2', 'dc2': 3} AND durable_writes = true;
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones (for safety reasons we'd probably never remove non-zero rf 
dcs when auto-generating dcs). Removing a datacenter becomes an alter that 
includes an override for the dc you want to remove (or of course you can always 
not use the auto-expansion and just use the old way):
{noformat}
// Tell it explicitly not to replicate to dc1
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc2': 0}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '3'} AND durable_writes = true;{noformat}

  was:
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 2}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '2', 'dc2': 3} AND durable_writes = true;
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones (for safety 

[jira] [Comment Edited] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393438#comment-16393438
 ] 

Joseph Lynch edited comment on CASSANDRA-14303 at 3/9/18 7:51 PM:
--

{quote}An issue with having a default replication would be that you *must* set 
autobootstrap:false when adding a new DC, otherwise the first nodes added in 
the DC would get all the data. Given proper DC creation, it is not required to 
do this right now.
{quote}
Yes, that edge case as well as others (gossip inconsistency mostly) is why I 
propose only evaluating the DCs at the time of a CREATE or ALTER statement 
execution. The operator would still have to go run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
manually after adding a new datacenter to trigger the re-generation of the dcs. 
It's also worth noting that as proposed if you then described the keyspace you 
would get all the dcs that it is actually replicated to:
{noformat}
cqlsh> DESCRIBE KEYSPACE test

CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'us-west-1': '3', 'us-east-1': 3} AND durable_writes = true;{noformat}
.bq Also having to set the replication to "0" would back fire the goal of the 
ticket which seems to be that you don't have to manage the RF when adding or 
removing DC's.

I think the way I'd implement it is that overrides would apply only during 
CREATE/ALTER, so even if you typo'd, when you DESCRIBE you'd see that your 
target datacenter is still there. My goal isn't to make this perfect, just to 
make it much much better than right now (where every single keyspace creation 
requires datacenter information and 9/10 you just want the same RF in all 
datacenters always).

Also users can always use the non auto-expansion if they don't want this. The 
proposed API is fully backwards compatible afaict (unless someone was relying 
on replication_factor throwing an error).


was (Author: jolynch):
bq. An issue with having a default replication would be that you *must* set 
autobootstrap:false when adding a new DC, otherwise the first nodes added in 
the DC would get all the data. Given proper DC creation, it is not required to 
do this right now.

Yes, that edge case as well as others (gossip inconsistency mostly) is why I 
propose only evaluating the DCs at the time of a CREATE or ALTER statement 
execution. The operator would still have to go run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
manually after adding a new datacenter to trigger the re-generation of the dcs. 
It's also worth noting that as proposed if you then described the keyspace you 
would get all the dcs that it is actually replicated to:
{noformat}
cqlsh> DESCRIBE KEYSPACE test

CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'us-west-1': '3', 'us-east-1': 3} AND durable_writes = true;{noformat}

> NetworkTopologyStrategy could have a "default replication" option
> -
>
> Key: CASSANDRA-14303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14303
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Joseph Lynch
>Priority: Minor
>
> Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
> has to manually specify the datacenters they want their data replicated to 
> with parameters, e.g.:
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': 3, 'dc2': 3}{noformat}
> This is a poor user interface because it requires the creator of the keyspace 
> (typically a developer) to know the layout of the Cassandra cluster (which 
> may or may not be controlled by them). Also, at least in my experience, folks 
> typo the datacenters _all_ the time. To work around this I see a number of 
> users creating automation around this where the automation describes the 
> Cassandra cluster and automatically expands out to all the dcs that Cassandra 
> knows about. Why can't Cassandra just do this for us, re-using the previously 
> forbidden {{replication_factor}} option (for backwards compatibility):
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> This would automatically replicate this Keyspace to all datacenters that are 
> present in the cluster. If you need to _override_ the default you could 
> supply a datacenter name, e.g.:
> {noformat}
> > CREATE KEYSPACE test WITH replication = {'class': 
> > 'NetworkTopologyStrategy', 'replication_factor': 3, 'dc1': 2}
> > DESCRIBE KEYSPACE test
> CREATE KEYSPACE test WITH replication = {'class': 

[jira] [Comment Edited] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393438#comment-16393438
 ] 

Joseph Lynch edited comment on CASSANDRA-14303 at 3/9/18 7:46 PM:
--

bq. An issue with having a default replication would be that you *must* set 
autobootstrap:false when adding a new DC, otherwise the first nodes added in 
the DC would get all the data. Given proper DC creation, it is not required to 
do this right now.

Yes, that edge case as well as others (gossip inconsistency mostly) is why I 
propose only evaluating the DCs at the time of a CREATE or ALTER statement 
execution. The operator would still have to go run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
manually after adding a new datacenter to trigger the re-generation of the dcs. 
It's also worth noting that as proposed if you then described the keyspace you 
would get all the dcs that it is actually replicated to:
{noformat}
cqlsh> DESCRIBE KEYSPACE test

CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'us-west-1': '3', 'us-east-1': 3} AND durable_writes = true;{noformat}


was (Author: jolynch):
Yes, that edge case as well as others (gossip inconsistency mostly) is why I 
propose only evaluating the DCs at the time of a CREATE or ALTER statement 
execution. The operator would still have to go run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
manually after adding a new datacenter to trigger the re-generation of the dcs. 
It's also worth noting that as proposed if you then described the keyspace you 
would get all the dcs that it is actually replicated to:
{noformat}
cqlsh> DESCRIBE KEYSPACE test

CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'us-west-1': '3', 'us-east-1': 3} AND durable_writes = true;{noformat}

> NetworkTopologyStrategy could have a "default replication" option
> -
>
> Key: CASSANDRA-14303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14303
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Joseph Lynch
>Priority: Minor
>
> Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
> has to manually specify the datacenters they want their data replicated to 
> with parameters, e.g.:
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': 3, 'dc2': 3}{noformat}
> This is a poor user interface because it requires the creator of the keyspace 
> (typically a developer) to know the layout of the Cassandra cluster (which 
> may or may not be controlled by them). Also, at least in my experience, folks 
> typo the datacenters _all_ the time. To work around this I see a number of 
> users creating automation around this where the automation describes the 
> Cassandra cluster and automatically expands out to all the dcs that Cassandra 
> knows about. Why can't Cassandra just do this for us, re-using the previously 
> forbidden {{replication_factor}} option (for backwards compatibility):
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> This would automatically replicate this Keyspace to all datacenters that are 
> present in the cluster. If you need to _override_ the default you could 
> supply a datacenter name, e.g.:
> {noformat}
> > CREATE KEYSPACE test WITH replication = {'class': 
> > 'NetworkTopologyStrategy', 'replication_factor': 3, 'dc1': 2}
> > DESCRIBE KEYSPACE test
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': '2', 'dc2': 3} AND durable_writes = true;
> {noformat}
> On the implementation side I think this may be reasonably straightforward to 
> do an auto-expansion at the time of keyspace creation (or alter), where the 
> above would automatically expand to list out the datacenters. We could allow 
> this to be recomputed whenever an AlterKeyspaceStatement runs so that to add 
> datacenters you would just run:
> {noformat}
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> and this would check whether the dcs in the current schema differ and add in 
> the new ones (for safety reasons we'd probably never remove non-zero rf dcs 
> when auto-generating dcs). Removing a datacenter becomes an alter that 
> includes an override for the dc you want to remove (or of course you can 
> always not use the auto-expansion and just use the old way):
> {noformat}
> // Tell it explicitly not to replicate to dc1
> > ALTER KEYSPACE test WITH 

[jira] [Updated] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14303:
-
Description: 
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 2}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '2', 'dc2': 3} AND durable_writes = true;
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check whether the dcs in the current schema differ and add in the 
new ones (for safety reasons we'd probably never remove non-zero rf dcs when 
auto-generating dcs). Removing a datacenter becomes an alter that 
includes an override for the dc you want to remove (or of course you can always 
not use the auto-expansion and just use the old way):
{noformat}
// Tell it explicitly not to replicate to dc1
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc3': 0}

> DESCRIBE KEYSPACE test
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '3', 'dc2': 3} AND durable_writes = true;{noformat}

  was:
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones (for safety reasons we'd probably never remove none zero rf 
dcs when auto-generating dcs). Removing a datacenter becomes a two step 
process, e.g. if we wanted to 

[jira] [Updated] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14303:
-
Description: 
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check whether the dcs in the current schema differ and add in the 
new ones (for safety reasons we'd probably never remove non-zero rf dcs when 
auto-generating dcs). Removing a datacenter becomes a two-step process, e.g. if 
we wanted to remove {{dc1}} we would do:
{noformat}
// First tell it not to replicate to dc1
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}{noformat}
I think the only issue with this would be that I think {{EACH_QUORUM}} doesn't 
handle DCs with 0 replicas very well, but I think that is tractable.

  was:
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones (for safety reasons we'd probably never remove none zero rf 
dcs when auto-generating dcs). Removing a datacenter becomes a two step 
process, e.g. if we wanted to remove {{dc1}} we would do:
{noformat}
// First tell it not to replicate to dc1
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
// Remove all nodes from dc1
ALTER KEYSPACE test WITH 

[jira] [Commented] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393438#comment-16393438
 ] 

Joseph Lynch commented on CASSANDRA-14303:
--

Yes, that edge case as well as others (gossip inconsistency mostly) is why I 
propose only evaluating the DCs at the time of a CREATE or ALTER statement 
execution. The operator would still have to go run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
manually after adding a new datacenter to trigger the re-generation of the dcs. 
It's also worth noting that as proposed if you then described the keyspace you 
would get all the dcs that it is actually replicated to:
{noformat}
cqlsh> DESCRIBE KEYSPACE test

CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'us-west-1': '3', 'us-east-1': 3} AND durable_writes = true;{noformat}

> NetworkTopologyStrategy could have a "default replication" option
> -
>
> Key: CASSANDRA-14303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14303
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Joseph Lynch
>Priority: Minor
>
> Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
> has to manually specify the datacenters they want their data replicated to 
> with parameters, e.g.:
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': 3, 'dc2': 3}{noformat}
> This is a poor user interface because it requires the creator of the keyspace 
> (typically a developer) to know the layout of the Cassandra cluster (which 
> may or may not be controlled by them). Also, at least in my experience, folks 
> typo the datacenters _all_ the time. To work around this I see a number of 
> users creating automation around this where the automation describes the 
> Cassandra cluster and automatically expands out to all the dcs that Cassandra 
> knows about. Why can't Cassandra just do this for us, re-using the previously 
> forbidden {{replication_factor}} option (for backwards compatibility):
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> This would automatically replicate this Keyspace to all datacenters that are 
> present in the cluster. If you need to _override_ the default you could 
> supply a datacenter name, e.g.:
> {noformat}
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 0}
> {noformat}
> On the implementation side I think this may be reasonably straightforward to 
> do an auto-expansion at the time of keyspace creation (or alter), where the 
> above would automatically expand to list out the datacenters. We could allow 
> this to be recomputed whenever an AlterKeyspaceStatement runs so that to add 
> datacenters you would just run:
> {noformat}
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> and this would check whether the dcs in the current schema differ and add in 
> the new ones (for safety reasons we'd probably never remove non-zero rf dcs 
> when auto-generating dcs). Removing a datacenter becomes a two-step process, 
> e.g. if we wanted to remove {{dc1}} we would do:
> {noformat}
> // First tell it not to replicate to dc1
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 0}
> // Remove all nodes from dc1
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> I think the only issue with this would be that I think {{EACH_QUORUM}} 
> doesn't handle DCs with 0 replicas very well, but I think that is tractable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393423#comment-16393423
 ] 

Jeremiah Jordan commented on CASSANDRA-14303:
-

An issue with having a default replication would be that you *must* set 
autobootstrap:false when adding a new DC, otherwise the first nodes added in 
the DC would get all the data. Given proper DC creation, it is not required to 
do this right now.

You could get around this by setting the replication to "0" before adding any 
nodes from the new DC, but then we would need to remove the guardrails around 
setting replication for DCs which don't exist. Also having to set the 
replication to "0" would backfire on the goal of the ticket, which seems to be 
that you don't have to manage the RF when adding or removing DCs.
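
(For readers following along: the "guardrails" in question are essentially a 
check of the requested DC names against the datacenters the cluster knows about. 
A simplified sketch of that kind of validation, with hypothetical names rather 
than the actual Cassandra code, looks like the following.)
{code:java}
import java.util.Map;
import java.util.Set;

final class ReplicationGuardrailSketch
{
    // Sketch of a "DC must exist" check that would have to be relaxed (or turned
    // into a warning) if operators are expected to pre-set replication for a DC
    // that has no nodes yet.
    static void validate(Map<String, String> replicationOptions, Set<String> knownDatacenters)
    {
        for (String key : replicationOptions.keySet())
        {
            if ("class".equals(key) || "replication_factor".equals(key))
                continue;
            if (!knownDatacenters.contains(key))
                throw new IllegalArgumentException(
                    "Unrecognized datacenter '" + key + "', known datacenters: " + knownDatacenters);
        }
    }
}
{code}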

> NetworkTopologyStrategy could have a "default replication" option
> -
>
> Key: CASSANDRA-14303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14303
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Joseph Lynch
>Priority: Minor
>
> Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
> has to manually specify the datacenters they want their data replicated to 
> with parameters, e.g.:
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'dc1': 3, 'dc2': 3}{noformat}
> This is a poor user interface because it requires the creator of the keyspace 
> (typically a developer) to know the layout of the Cassandra cluster (which 
> may or may not be controlled by them). Also, at least in my experience, folks 
> typo the datacenters _all_ the time. To work around this I see a number of 
> users creating automation around this where the automation describes the 
> Cassandra cluster and automatically expands out to all the dcs that Cassandra 
> knows about. Why can't Cassandra just do this for us, re-using the previously 
> forbidden {{replication_factor}} option (for backwards compatibility):
> {noformat}
>  CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> This would automatically replicate this Keyspace to all datacenters that are 
> present in the cluster. If you need to _override_ the default you could 
> supply a datacenter name, e.g.:
> {noformat}
> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 0}
> {noformat}
> On the implementation side I think this may be reasonably straightforward to 
> do an auto-expansion at the time of keyspace creation (or alter), where the 
> above would automatically expand to list out the datacenters. We could allow 
> this to be recomputed whenever an AlterKeyspaceStatement runs so that to add 
> datacenters you would just run:
> {noformat}
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> and this would check whether the dcs in the current schema differ and add in 
> the new ones (for safety reasons we'd probably never remove non-zero rf dcs 
> when auto-generating dcs). Removing a datacenter becomes a two-step process, 
> e.g. if we wanted to remove {{dc1}} we would do:
> {noformat}
> // First tell it not to replicate to dc1
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3, 'dc1': 0}
> // Remove all nodes from dc1
> ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
> 'replication_factor': 3}{noformat}
> I think the only issue with this would be that I think {{EACH_QUORUM}} 
> doesn't handle DCs with 0 replicas very well, but I think that is tractable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13697) CDC and VIEW writeType missing from spec for write_timeout / write_failure

2018-03-09 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393417#comment-16393417
 ] 

Joseph Lynch commented on CASSANDRA-13697:
--

lg2m

> CDC and VIEW writeType missing from spec for write_timeout / write_failure
> --
>
> Key: CASSANDRA-13697
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13697
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Andy Tolbert
>Assignee: Vinay Chella
>Priority: Minor
>  Labels: lhf
>
> In cassandra 3.0 a new {{WriteType}} {{VIEW}} was added which appears to be 
> used when raising a {{WriteTimeoutException}} when the local view lock for a 
> key cannot be acquired within timeout.
> In cassandra 3.8 {{CDC}} {{WriteType}} was added for when 
> {{cdc_total_space_in_mb}} is exceeded when doing a write to data tracked by 
> cdc.
> The [v4 
> spec|https://github.com/apache/cassandra/blob/cassandra-3.11.0/doc/native_protocol_v4.spec#L1051-L1066]
>  currently doesn't cover these two write types.   While the protocol allows 
> for a free form string for write type, it would be nice to document that 
> types are available since some drivers (java, cpp, python) attempt to 
> deserialize write type into an enum and may not handle it well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14303:
-
Description: 
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check whether the dcs in the current schema differ and add in the 
new ones (for safety reasons we'd probably never remove non-zero rf dcs when 
auto-generating dcs). Removing a datacenter becomes a two-step process, e.g. if 
we wanted to remove {{dc1}} we would do:
{noformat}
// First tell it not to replicate to dc1
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
// Remove all nodes from dc1
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
I think the only issue with this would be that I think {{EACH_QUORUM}} doesn't 
handle DCs with 0 replicas very well, but I think that is tractable.

  was:
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones (for safety reasons we'd probably never remove dcs when 
auto-generating dcs). Removing a datacenter becomes a two step process, e.g. if 
we wanted to remove {{dc1}} we would do:
{noformat}
// First tell it not to replicate to dc1
ALTER KEYSPACE test WITH replication = {'class': 

[jira] [Updated] (CASSANDRA-14303) NetworkTopologyStrategy could have a "default replication" option

2018-03-09 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14303:
-
Description: 
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check whether the dcs in the current schema differ and add in the 
new ones (for safety reasons we'd probably never remove dcs when auto-generating 
dcs). Removing a datacenter becomes a two-step process, e.g. if we wanted to 
remove {{dc1}} we would do:
{noformat}
// First tell it not to replicate to dc1
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
// Remove all nodes from dc1
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
I think the only issue with this would be that I think {{EACH_QUORUM}} doesn't 
handle DCs with 0 replicas very well, but I think that is tractable.

  was:
Right now when creating a keyspace with {{NetworkTopologyStrategy}} the user 
has to manually specify the datacenters they want their data replicated to with 
parameters, e.g.:
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': 3, 'dc2': 3}{noformat}
This is a poor user interface because it requires the creator of the keyspace 
(typically a developer) to know the layout of the Cassandra cluster (which may 
or may not be controlled by them). Also, at least in my experience, folks typo 
the datacenters _all_ the time. To work around this I see a number of users 
creating automation around this where the automation describes the Cassandra 
cluster and automatically expands out to all the dcs that Cassandra knows 
about. Why can't Cassandra just do this for us, re-using the previously 
forbidden {{replication_factor}} option (for backwards compatibility):
{noformat}
 CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
This would automatically replicate this Keyspace to all datacenters that are 
present in the cluster. If you need to _override_ the default you could supply 
a datacenter name, e.g.:
{noformat}
CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
{noformat}
On the implementation side I think this may be reasonably straightforward to do 
an auto-expansion at the time of keyspace creation (or alter), where the above 
would automatically expand to list out the datacenters. We could allow this to 
be recomputed whenever an AlterKeyspaceStatement runs so that to add 
datacenters you would just run:
{noformat}
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3}{noformat}
and this would check that if the dc's in the current schema are different you 
add in the new ones. Removing a datacenter becomes a two step process, e.g. if 
we wanted to remove {{dc1}} we would do:
{noformat}
// First tell it not to replicate to dc1
ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 
'replication_factor': 3, 'dc1': 0}
// Remove all nodes from dc1

[jira] [Commented] (CASSANDRA-9452) Remove configuration of storage-conf from tools

2018-03-09 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393393#comment-16393393
 ] 

Joseph Lynch commented on CASSANDRA-9452:
-

lg2m

Can you remove the entire 0.6 version check from {{install_cassandra.sh}} while 
you're at it? Actually, looking at it, I'm not even sure 
{{test/resources/functions/install_cassandra.sh}} is used at all.

> Remove configuration of storage-conf from tools
> ---
>
> Key: CASSANDRA-9452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9452
> Project: Cassandra
>  Issue Type: Task
>  Components: Configuration, Testing, Tools
>Reporter: Mike Adamson
>Assignee: Vinay Chella
>Priority: Minor
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: CASSANDRA-9452-trunk.txt
>
>
> The following files still make reference to storage-config and/or 
> storage-conf.xml:
> * ./build.xml
> * ./bin/nodetool
> * ./bin/sstablekeys
> * ./test/resources/functions/configure_cassandra.sh
> * ./test/resources/functions/install_cassandra.sh
> * ./tools/bin/json2sstable
> * ./tools/bin/sstable2json
> * ./tools/bin/sstablelevelreset



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13855) URL Seed provider

2018-03-09 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393021#comment-16393021
 ] 

Jon Haddad commented on CASSANDRA-13855:


{quote}
I'm not sure we need this (or if it's beyond the scope of this ticket), or we 
can do it in a follow up ticket. wdyt, Jon Haddad?
{quote}

Agreed, I think we should start simple and iterate.

[~gangil] are you still interested in finishing this patch? There's only a 
small handful of nits left to address.
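
For anyone following along, the idea is roughly the following. This is an 
illustrative sketch only, not the attached patch; in the real plugin the class 
would implement {{org.apache.cassandra.locator.SeedProvider}} (assumed here to 
expose a {{getSeeds()}} method and to be constructed with the parameter map from 
cassandra.yaml, as SimpleSeedProvider is).
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class UrlSeedProviderSketch
{
    private final URL url;

    public UrlSeedProviderSketch(Map<String, String> args) throws Exception
    {
        // e.g. parameters: {url: "http://config-host/seeds.txt"} (hypothetical)
        this.url = new URL(args.get("url"));
    }

    public List<InetAddress> getSeeds()
    {
        List<InetAddress> seeds = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)))
        {
            String line;
            while ((line = in.readLine()) != null)   // one seed address per line
            {
                line = line.trim();
                if (!line.isEmpty())
                    seeds.add(InetAddress.getByName(line));
            }
        }
        catch (Exception e)
        {
            // If the fetch fails, whatever was parsed so far is returned; an empty
            // result lets the node fall back to seeds it already knows about.
        }
        return seeds;
    }
}
{code}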

> URL Seed provider
> -
>
> Key: CASSANDRA-13855
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13855
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Core
>Reporter: Jon Haddad
>Assignee: Akash Gangil
>Priority: Minor
>  Labels: lhf
> Attachments: 0001-Add-URL-Seed-Provider-trunk.txt
>
>
> Seems like including a dead simple seed provider that can fetch from a URL, 1 
> line per seed, would be useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14239) OutOfMemoryError when bootstrapping with less than 100GB RAM

2018-03-09 Thread Sergey Kirillov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392951#comment-16392951
 ] 

Sergey Kirillov edited comment on CASSANDRA-14239 at 3/9/18 3:08 PM:
-

It fails during bootstrap, and it also fails if I skip bootstrap and run a repair instead.

Memtable sizes are set to
{code:yaml}
memtable_heap_space_in_mb: 1048
memtable_offheap_space_in_mb: 1048
{code}
and 
{code:yaml}
memtable_flush_writers: 16
{code}

When I analyze the heap dump I see that 95% of the memory is used by Memtable 
instances. There are 24k instances of the Memtable class and their retained heap 
is 26G, but they have no GC root.

So they should be eligible for garbage collection; I don't understand why I'm 
getting an OOM instead.


was (Author: rushman):
It fails during bootstrap, it fails if I skip bootstrap and doing repair.

memtable size set to 
{code:yaml}
memtable_heap_space_in_mb: 1048
memtable_offheap_space_in_mb: 1048
{code}
and 
{code:yaml}
memtable_flush_writers: 16
{code}

When I analyze heap dump I see that 95% of memory is used by Memtable 
instances. There are 24k instances of Memtable class and their retained heap is 
26G.

> OutOfMemoryError when bootstrapping with less than 100GB RAM
> 
>
> Key: CASSANDRA-14239
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14239
> Project: Cassandra
>  Issue Type: Bug
> Environment: Details of the bootstrapping Node
>  * ProLiant BL460c G7
>  * 56GB RAM
>  * 2x 146GB 10K HDD (One dedicated for Commitlog, one for Data, Hints and 
> saved_caches)
>  * CentOS 7.4 on SD-Card
>  * /tmp and /var/log on tmpfs
>  * Oracle JDK 1.8.0_151
>  * Cassandra 3.11.1
> Cluster
>  * 10 existing Nodes (Up and Normal)
>Reporter: Jürgen Albersdorfer
>Priority: Major
> Attachments: Objects-by-class.csv, 
> Objects-with-biggest-retained-size.csv, cassandra-env.sh, cassandra.yaml, 
> jvm.options, jvm_opts.txt, stack-traces.txt
>
>
> Hi, I face an issue when bootstrapping a Node having less than 100GB RAM on 
> our 10 Node C* 3.11.1 Cluster.
> During bootstrap, when I watch the cassandra.log I observe a growth in JVM 
> Heap Old Gen which gets not significantly freed up any more.
> I know that JVM collects on Old Gen only when really needed. I can see 
> collections, but there is always a remainder which seems to grow forever 
> without ever getting freed.
> After the Node successfully Joined the Cluster, I can remove the extra RAM I 
> have given it for bootstrapping without any further effect.
> It feels like Cassandra will not forget about every single byte streamed over 
> the Network over time during bootstrapping, - which would be a memory leak 
> and a major problem, too.
> I was able to produce a HeapDumpOnOutOfMemoryError from a 56GB Node (40 GB 
> assigned JVM Heap). YourKit Profiler shows huge amount of Memory allocated 
> for org.apache.cassandra.db.Memtable (22 GB) 
> org.apache.cassandra.db.rows.BufferCell (19 GB) and java.nio.HeapByteBuffer 
> (11 GB)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14239) OutOfMemoryError when bootstrapping with less than 100GB RAM

2018-03-09 Thread Sergey Kirillov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392951#comment-16392951
 ] 

Sergey Kirillov commented on CASSANDRA-14239:
-

It fails during bootstrap, and it also fails if I skip bootstrap and run a repair instead.

Memtable sizes are set to
{code:yaml}
memtable_heap_space_in_mb: 1048
memtable_offheap_space_in_mb: 1048
{code}
and 
{code:yaml}
memtable_flush_writers: 16
{code}

When I analyze the heap dump I see that 95% of the memory is used by Memtable 
instances. There are 24k instances of the Memtable class and their retained heap 
is 26G.

> OutOfMemoryError when bootstrapping with less than 100GB RAM
> 
>
> Key: CASSANDRA-14239
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14239
> Project: Cassandra
>  Issue Type: Bug
> Environment: Details of the bootstrapping Node
>  * ProLiant BL460c G7
>  * 56GB RAM
>  * 2x 146GB 10K HDD (One dedicated for Commitlog, one for Data, Hints and 
> saved_caches)
>  * CentOS 7.4 on SD-Card
>  * /tmp and /var/log on tmpfs
>  * Oracle JDK 1.8.0_151
>  * Cassandra 3.11.1
> Cluster
>  * 10 existing Nodes (Up and Normal)
>Reporter: Jürgen Albersdorfer
>Priority: Major
> Attachments: Objects-by-class.csv, 
> Objects-with-biggest-retained-size.csv, cassandra-env.sh, cassandra.yaml, 
> jvm.options, jvm_opts.txt, stack-traces.txt
>
>
> Hi, I face an issue when bootstrapping a Node having less than 100GB RAM on 
> our 10 Node C* 3.11.1 Cluster.
> During bootstrap, when I watch the cassandra.log I observe growth in the JVM 
> Heap Old Gen which no longer gets significantly freed up.
> I know that JVM collects on Old Gen only when really needed. I can see 
> collections, but there is always a remainder which seems to grow forever 
> without ever getting freed.
> After the Node successfully Joined the Cluster, I can remove the extra RAM I 
> have given it for bootstrapping without any further effect.
> It feels like Cassandra never forgets about the bytes streamed over the 
> network during bootstrapping, which would be a memory leak and a major 
> problem.
> I was able to produce a HeapDumpOnOutOfMemoryError from a 56GB Node (40 GB 
> assigned JVM Heap). YourKit Profiler shows a huge amount of memory allocated 
> for org.apache.cassandra.db.Memtable (22 GB) 
> org.apache.cassandra.db.rows.BufferCell (19 GB) and java.nio.HeapByteBuffer 
> (11 GB)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14281) Improve LatencyMetrics performance by reducing write path processing

2018-03-09 Thread Michael Burman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392940#comment-16392940
 ] 

Michael Burman commented on CASSANDRA-14281:


No, I mean that on the hot path a single update is copied to multiple 
histograms. They are initialized in this method:

[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/metrics/TableMetrics.java#L993]

and applied on each update here:

[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/metrics/TableMetrics.java#L1043]

instead of the copy being done only when the metrics are requested (which would 
also be a more efficient process).
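
To illustrate the direction (a toy sketch with made-up names, not the actual 
patch): each table-level recorder touches only its own counters on the hot path, 
and keyspace/global views fold the children together only when a snapshot is 
requested.
{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.LongAdder;

final class LatencySketch
{
    private static final int BUCKETS = 128;            // log2-style latency buckets
    private final LongAdder[] counts = new LongAdder[BUCKETS];
    private final List<LatencySketch> children = new CopyOnWriteArrayList<>();

    LatencySketch()
    {
        for (int i = 0; i < BUCKETS; i++)
            counts[i] = new LongAdder();
    }

    void addChild(LatencySketch child) { children.add(child); }

    // Hot path: a table-level recorder touches exactly one LongAdder,
    // with no propagation to keyspace/global metrics.
    void record(long latencyMicros)
    {
        counts[bucketFor(latencyMicros)].increment();
    }

    // Cold path: keyspace/global snapshots are computed only when read.
    long[] snapshot()
    {
        long[] merged = new long[BUCKETS];
        for (int i = 0; i < BUCKETS; i++)
            merged[i] = counts[i].sum();
        for (LatencySketch child : children)
        {
            long[] c = child.snapshot();
            for (int i = 0; i < BUCKETS; i++)
                merged[i] += c[i];
        }
        return merged;
    }

    private static int bucketFor(long micros)
    {
        int b = 64 - Long.numberOfLeadingZeros(Math.max(micros, 1)); // log2 bucketing
        return Math.min(b, BUCKETS - 1);
    }
}
{code}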

> Improve LatencyMetrics performance by reducing write path processing
> 
>
> Key: CASSANDRA-14281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14281
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Michael Burman
>Assignee: Michael Burman
>Priority: Major
>
> Currently for each write/read/rangequery/CAS touching the CFS we write a 
> latency metric which takes a lot of processing time (up to 66% of the total 
> processing time if the update was empty). 
> The way latencies are recorded is to use both a dropwizard "Timer" as well as 
> a "Counter". The latter is used for totalLatency and the former is a decaying 
> metric for rates and certain percentile metrics. We then replicate all of 
> these CFS writes to the KeyspaceMetrics and globalWriteLatencies. 
> Instead of doing this in the write phase we should merge the metrics when 
> they're read. This is a much less common occurrence and thus we save a lot of 
> CPU time in total. This also speeds up the write path.
> Currently, the DecayingEstimatedHistogramReservoir acquires a lock for each 
> update operation, which causes contention if there is more than one thread 
> updating the histogram. This impacts scalability when using larger machines. 
> We should make it lock-free as much as possible and also avoid a single 
> CAS-update from blocking all the concurrent threads from making an update.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12041) Add CDC to describe table

2018-03-09 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392939#comment-16392939
 ] 

Alan Boudreault commented on CASSANDRA-12041:
-

[~spo...@gmail.com] I can see that the value of the CDC property is NULL in the 
DB. On the driver side, nothing changed: table options that have a value of 
None are ignored during an export_as_string. My understanding is that cdc should 
probably be `false` instead of null when not specified during table creation. 
Is this null expected?

> Add CDC to describe table
> -
>
> Key: CASSANDRA-12041
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12041
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Tools
>Reporter: Carl Yeksigian
>Assignee: Stefania
>Priority: Major
>  Labels: client-impacting
> Fix For: 3.8
>
>
> Currently we do not output CDC with {{DESCRIBE TABLE}}, but should include 
> that for 3.8+ tables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14281) Improve LatencyMetrics performance by reducing write path processing

2018-03-09 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392930#comment-16392930
 ] 

Chris Lohfink commented on CASSANDRA-14281:
---

What are the unnecessary multiple updates? Do you mean calling System.nanoTime 
multiple times?

> Improve LatencyMetrics performance by reducing write path processing
> 
>
> Key: CASSANDRA-14281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14281
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Michael Burman
>Assignee: Michael Burman
>Priority: Major
>
> Currently for each write/read/rangequery/CAS touching the CFS we write a 
> latency metric which takes a lot of processing time (up to 66% of the total 
> processing time if the update was empty). 
> The way latencies are recorded is to use both a dropwizard "Timer" as well as 
> a "Counter". The latter is used for totalLatency and the former is a decaying 
> metric for rates and certain percentile metrics. We then replicate all of 
> these CFS writes to the KeyspaceMetrics and globalWriteLatencies. 
> Instead of doing this in the write phase we should merge the metrics when 
> they're read. This is a much less common occurrence and thus we save a lot of 
> CPU time in total. This also speeds up the write path.
> Currently, the DecayingEstimatedHistogramReservoir acquires a lock for each 
> update operation, which causes contention if there is more than one thread 
> updating the histogram. This impacts scalability when using larger machines. 
> We should make it lock-free as much as possible and also avoid a single 
> CAS-update from blocking all the concurrent threads from making an update.
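
As a rough illustration of the lock-free direction discussed above (a sketch 
under assumed simplifications, not the actual DecayingEstimatedHistogramReservoir 
code): the hot path is reduced to a volatile read plus a striped counter 
increment, and the rare decay/rescale swaps in a fresh bucket array instead of 
taking a lock on every update.
{code:java}
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.LongAdder;

final class LockFreeDecayingReservoirSketch
{
    private static final int BUCKETS = 128;   // bucket count is arbitrary for this sketch

    static final class Window
    {
        final LongAdder[] counts = new LongAdder[BUCKETS];
        Window() { for (int i = 0; i < BUCKETS; i++) counts[i] = new LongAdder(); }
    }

    private final AtomicReference<Window> current = new AtomicReference<>(new Window());

    // Hot path: no locks; LongAdder stripes its cells so concurrent writers
    // rarely contend on the same CAS.
    void update(int bucketIndex)
    {
        current.get().counts[bucketIndex].increment();
    }

    // Cold path (periodic): publish a fresh window; readers fold the returned
    // old window into their decayed snapshot. Losing a handful of in-flight
    // increments during the swap is an accepted trade-off in this sketch.
    Window rescale()
    {
        return current.getAndSet(new Window());
    }
}
{code}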



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14305) Use $CASSANDRA_CONF not $CASSANDRA_HOME/conf in cassandra-env.sh

2018-03-09 Thread Angelo Polo (JIRA)
Angelo Polo created CASSANDRA-14305:
---

 Summary: Use $CASSANDRA_CONF not $CASSANDRA_HOME/conf in 
cassandra-env.sh 
 Key: CASSANDRA-14305
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14305
 Project: Cassandra
  Issue Type: Improvement
  Components: Configuration
Reporter: Angelo Polo
 Attachments: conf_cassandra-env.sh.patch

CASSANDRA_CONF should be used uniformly in conf/cassandra-env.sh to reference 
the configuration path. Currently, JAAS users have to modify the default 
path provided for cassandra-jaas.config if their $CASSANDRA_CONF differs from 
$CASSANDRA_HOME/conf.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14299) cqlsh: ssl setting not read from cqlshrc in 3.11

2018-03-09 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-14299:
---
 Assignee: Stefan Podkowinski
Fix Version/s: 4.x
   3.11.x
   Status: Patch Available  (was: Open)

Looks like a merge oversight in 
[6d429cd|https://github.com/apache/cassandra/commit/6d429cd]. A trivial fix is 
linked; do you mind taking a look, [~ifesdjeen]?

> cqlsh: ssl setting not read from cqlshrc in 3.11 
> -
>
> Key: CASSANDRA-14299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14299
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Christian Becker
>Assignee: Stefan Podkowinski
>Priority: Major
> Fix For: 3.11.x, 4.x
>
>
> With CASSANDRA-10458 an option was added to read the {{--ssl}} flag from 
> cqlshrc; however, the commit seems to have been incorrectly merged or the 
> changes were dropped somehow.
> Currently adding the following has no effect:
> {code:java}
> [connection]
> ssl = true{code}
> When looking at the current tree it's obvious that the flag is not read: 
> [https://github.com/apache/cassandra/blame/cassandra-3.11/bin/cqlsh.py#L2247]
> However it should have been added with 
> [https://github.com/apache/cassandra/commit/70649a8d65825144fcdbde136d9b6354ef1fb911]
> The values like {{DEFAULT_SSL = False}}  are present, but the 
> {{option_with_default()}} call is missing.
> Git blame also shows no change to that line which would have reverted the 
> change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14215) Cassandra does not seem to be respecting max hint window

2018-03-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392878#comment-16392878
 ] 

Aleksey Yeschenko commented on CASSANDRA-14215:
---

The CAS patch looks good to me, and I agree it's fine for all of 3.0, 3.11, and 
trunk.

I'm not yet sure about the change to the semantics of the hint window, however. 
And either way, we shouldn't be committing both in the same ticket.

Can you please split it out into a separate JIRA? In that case I'll +1 the CAS 
patch, and we can keep discussion about the window change and review it 
separately.

Cheers.

> Cassandra does not seem to be respecting max hint window
> 
>
> Key: CASSANDRA-14215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14215
> Project: Cassandra
>  Issue Type: Bug
>  Components: Hints, Streaming and Messaging
>Reporter: Arijit Banerjee
>Assignee: Kurt Greaves
>Priority: Major
>
> On Cassandra 3.0.9, it was observed that Cassandra continues to write hints 
> even though a node remains down (and does not come up) for longer than the 
> default 3 hour window.
>  
> After doing "nodetool setlogginglevel org.apache.cassandra TRACE", we see the 
> following log line in cassandra (debug) logs:
>  StorageProxy.java:2625 - Adding hints for [/10.0.100.84]
>  
> One possible code path seems to be:
> cas -> commitPaxos(proposal, consistencyForCommit, true); -> submitHint (in 
> StorageProxy.java)
>  
> The "true" parameter above explicitly states that a hint should be recorded 
> and ignores the time window calculation performed by the shouldHint method 
> invoked in other code paths. Is there a reason for this behavior?
>  
> Edit: There are actually two stacks that seem to be producing hints, the 
> "cas" and "syncWriteBatchedMutations" methods. I have posted them below.
>  
> A third issue is that Cassandra seems to reset the timer which counts how 
> long a node has been down after a restart. Thus if Cassandra is 
> restarted on a good node, it continues to accumulate hints for a down node 
> over the next three hours.
>  
> WARN [SharedPool-Worker-14] 2018-02-06 22:15:51,136 StorageProxy.java:2636 - Adding hints for [/10.0.100.84] with stack trace:
> java.lang.Throwable:
>   at org.apache.cassandra.service.StorageProxy.stackTrace(StorageProxy.java:2608)
>   at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:2617)
>   at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:2603)
>   at org.apache.cassandra.service.StorageProxy.commitPaxos(StorageProxy.java:540)
>   at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java:282)
>   at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithCondition(ModificationStatement.java:432)
>   at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:407)
>   at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:206)
>   at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:237)
>   at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:222)
>   at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
>   at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
>   at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
>   at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>   at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>   at java.lang.Thread.run(Thread.java:748)
> 
> WARN [SharedPool-Worker-8] 2018-02-06 22:15:51,153 StorageProxy.java:2636 - Adding hints for [/10.0.100.84] with stack trace:
> java.lang.Throwable:
>   at org.apache.cassandra.service.StorageProxy.stackTrace(StorageProxy.java:2608)
>   at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:2617)
>   at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:1247)
>   at org.apache.cassandra.service.StorageProxy.syncWriteBatchedMutations(StorageProxy.java:1014)
>   at org.apache.cassandra.service.StorageProxy.mutateAtomically(StorageProxy.java:899)
>   at 
> 

[jira] [Updated] (CASSANDRA-14301) Error running cqlsh: unexpected keyword argument 'no_compact'

2018-03-09 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-14301:
---
Labels: cqlsh proposed-wontfix  (was: cqlsh)

> Error running cqlsh: unexpected keyword argument 'no_compact'
> -
>
> Key: CASSANDRA-14301
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14301
> Project: Cassandra
>  Issue Type: Bug
>Reporter: M. Justin
>Priority: Major
>  Labels: cqlsh, proposed-wontfix
>
> I recently installed Cassandra 3.11.2 on my Mac using Homebrew.  When I run 
> the "cqlsh" command, I get the following error:
> {code:none}
> $ cqlsh
> Traceback (most recent call last):
>   File "/usr/local/Cellar/cassandra/3.11.2/libexec/bin/cqlsh.py", line 2443, 
> in 
>     main(*read_options(sys.argv[1:], os.environ))
>   File "/usr/local/Cellar/cassandra/3.11.2/libexec/bin/cqlsh.py", line 2421, 
> in main
>     encoding=options.encoding)
>   File "/usr/local/Cellar/cassandra/3.11.2/libexec/bin/cqlsh.py", line 488, 
> in __init__
>     **kwargs)
>   File "cassandra/cluster.py", line 735, in 
> cassandra.cluster.Cluster.__init__ (cassandra/cluster.c:10935)
> TypeError: __init__() got an unexpected keyword argument 'no_compact'
> {code}
> Commenting out [line 483 of 
> cqlsh.py|https://github.com/apache/cassandra/blob/cassandra-3.11.2/bin/cqlsh.py#L483]
>  works around the issue:
> {code}
> # no_compact=no_compact
> {code}
>  I am not the only person impacted, as evidenced by [this existing Stack 
> Overflow 
> post|https://stackoverflow.com/questions/48885984/was-cqlsh-5-0-1-broken-in-cassandra-3-11-2-release]
>  from 2/20/2018.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14301) Error running cqlsh: unexpected keyword argument 'no_compact'

2018-03-09 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392767#comment-16392767
 ] 

Stefan Podkowinski commented on CASSANDRA-14301:


It looks like a newer version of the Python driver needs to be bundled in the brew 
formula. But we only test and support the Python driver shipped as part of the 
official distribution, so you might run into unexpected issues using a different 
driver artifact. 

There already seems to be a brew GitHub issue 
[24977|https://github.com/Homebrew/homebrew-core/issues/24977] open for this as well.


> Error running cqlsh: unexpected keyword argument 'no_compact'
> -
>
> Key: CASSANDRA-14301
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14301
> Project: Cassandra
>  Issue Type: Bug
>Reporter: M. Justin
>Priority: Major
>  Labels: cqlsh
>
> I recently installed Cassandra 3.11.2 on my Mac using Homebrew.  When I run 
> the "cqlsh" command, I get the following error:
> {code:none}
> $ cqlsh
> Traceback (most recent call last):
>   File "/usr/local/Cellar/cassandra/3.11.2/libexec/bin/cqlsh.py", line 2443, 
> in 
>     main(*read_options(sys.argv[1:], os.environ))
>   File "/usr/local/Cellar/cassandra/3.11.2/libexec/bin/cqlsh.py", line 2421, 
> in main
>     encoding=options.encoding)
>   File "/usr/local/Cellar/cassandra/3.11.2/libexec/bin/cqlsh.py", line 488, 
> in __init__
>     **kwargs)
>   File "cassandra/cluster.py", line 735, in 
> cassandra.cluster.Cluster.__init__ (cassandra/cluster.c:10935)
> TypeError: __init__() got an unexpected keyword argument 'no_compact'
> {code}
> Commenting out [line 483 of 
> cqlsh.py|https://github.com/apache/cassandra/blob/cassandra-3.11.2/bin/cqlsh.py#L483]
>  works around the issue:
> {code}
> # no_compact=no_compact
> {code}
>  I am not the only person impacted, as evidenced by [this existing Stack 
> Overflow 
> post|https://stackoverflow.com/questions/48885984/was-cqlsh-5-0-1-broken-in-cassandra-3-11-2-release]
>  from 2/20/2018.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12041) Add CDC to describe table

2018-03-09 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392750#comment-16392750
 ] 

Stefan Podkowinski commented on CASSANDRA-12041:


I was going through some of the test failures produced by running 
{{pylib/cqlshlib/test}}, and it seems we have a little regression going on:

{{nosetests  
test_cqlsh_output.py:TestCqlshOutput.test_describe_columnfamily_output}}

{noformat}
FAIL: test_describe_columnfamily_output 
(cqlshlib.test.test_cqlsh_output.TestCqlshOutput)
--
Traceback (most recent call last):
  File 
"/home/spod/git/cassandra-trunk/pylib/cqlshlib/test/test_cqlsh_output.py", line 
638, in test_describe_columnfamily_output
self.assertSequenceEqual(output.split('\n'), table_desc3.split('\n'))
AssertionError: Sequences differ: ['', 'CREATE TABLE "CqlshTests... != ['', 
'CREATE TABLE "CqlshTests...

First differing element 20:
"AND comment = ''"
'AND cdc = false'
{noformat}

If I understand correctly, the cdc flag is supposed to show up in the output, 
but it does not.


> Add CDC to describe table
> -
>
> Key: CASSANDRA-12041
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12041
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Tools
>Reporter: Carl Yeksigian
>Assignee: Stefania
>Priority: Major
>  Labels: client-impacting
> Fix For: 3.8
>
>
> Currently we do not output CDC with {{DESCRIBE TABLE}}, but should include 
> that for 3.8+ tables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14304) DELETE after INSERT IF NOT EXISTS does not work

2018-03-09 Thread Julien (JIRA)
Julien created CASSANDRA-14304:
--

 Summary: DELETE after INSERT IF NOT EXISTS does not work
 Key: CASSANDRA-14304
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14304
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Julien
 Attachments: debug.log, system.log

DELETE a row immediately after INSERT IF NOT EXISTS does not work.

Can be reproduced with this CQL script:
{code:java}
CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
'replication_factor' : 1 };
CREATE TABLE ks.ta ( id text PRIMARY KEY, col text );
INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS;
DELETE FROM ks.ta WHERE id = 'myId';
SELECT * FROM ks.ta WHERE id='myId';
{code}
{code:java}
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
WARNING: pyreadline dependency missing.  Install to enable tab completion.
cqlsh> CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 
'replication_factor' : 1 };
cqlsh> CREATE TABLE ks.ta ( id text PRIMARY KEY, col text );
cqlsh> INSERT INTO ks.ta (id, col) VALUES ('myId', 'myCol') IF NOT EXISTS;

 [applied]
---
  True

cqlsh> DELETE FROM ks.ta WHERE id = 'myId';
cqlsh> SELECT * FROM ks.ta WHERE id='myId';

 id   | col
--+---
 myId | myCol
{code}
 * Only happens if the client is on a different host (works as expected on the 
same host)

 * Works as expected without IF NOT EXISTS

 * A ~500 ms delay between INSERT and DELETE fixes the issue.

Logs attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14281) Improve LatencyMetrics performance by reducing write path processing

2018-03-09 Thread Michael Burman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392655#comment-16392655
 ] 

Michael Burman commented on CASSANDRA-14281:


Additional note: there is the same kind of unnecessary repeated metric update in 
the read path as well, including the TableMetrics and TableHistogram updates. This 
patch obviously helps a little, since it reduces the histogram update time, but 
there is most likely more to trim. Fixing those would also help the write path, 
e.g. the "metric.colUpdateTimeDeltaHistogram.update(Math.min(18165375903306L, 
timeDelta));" line, but these are very minor performance issues in the update 
path at the moment. Should we open another ticket for the read path?
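
To make the merge-on-read idea from the description concrete, here is a rough 
sketch (class names are made up, this is not the actual LatencyMetrics change): 
each table-level metric records into its own striped counter, and keyspace/global 
totals are only summed when somebody reads them, so the hot path touches a single 
counter per update.

{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.LongAdder;

// Rough sketch of merge-on-read aggregation; not the actual Cassandra metrics code.
final class ChildLatencySketch
{
    private final LongAdder totalLatencyNanos = new LongAdder();

    // Hot path: a single striped counter update, no fan-out to parent metrics.
    void record(long latencyNanos)
    {
        totalLatencyNanos.add(latencyNanos);
    }

    long total()
    {
        return totalLatencyNanos.sum();
    }
}

// Keyspace/global view: children are merged lazily when the metric is read.
final class ParentLatencySketch
{
    private final List<ChildLatencySketch> children = new CopyOnWriteArrayList<>();

    void register(ChildLatencySketch child)
    {
        children.add(child);
    }

    long total()
    {
        long sum = 0;
        for (ChildLatencySketch child : children)
            sum += child.total();
        return sum;
    }
}
{code}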

> Improve LatencyMetrics performance by reducing write path processing
> 
>
> Key: CASSANDRA-14281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14281
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Michael Burman
>Assignee: Michael Burman
>Priority: Major
>
> Currently for each write/read/rangequery/CAS touching the CFS we write a 
> latency metric which takes a lot of processing time (up to 66% of the total 
> processing time if the update was empty). 
> Latencies are recorded using both a dropwizard "Timer" and a "Counter". The 
> latter is used for totalLatency, and the former is a decaying metric for rates 
> and certain percentile metrics. We then replicate all of these CFS writes to 
> the KeyspaceMetrics and globalWriteLatencies. 
> Instead of doing this in the write phase, we should merge the metrics when 
> they're read. Reading these metrics is a much less common occurrence, so we 
> save a lot of CPU time in total. This also speeds up the write path.
> Currently, the DecayingEstimatedHistogramReservoir acquires a lock for each 
> update operation, which causes a contention if there are more than one thread 
> updating the histogram. This impacts scalability when using larger machines. 
> We should make it lock-free as much as possible and also avoid a single 
> CAS-update from blocking all the concurrent threads from making an update.
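
Appending a rough illustration of the lock-free reservoir update mentioned in the 
last paragraph of the description above. This is not the proposed patch and the 
bucket layout is deliberately simplified; it only shows how striped per-bucket 
counters avoid both the lock and a single contended CAS.

{code:java}
import java.util.Arrays;
import java.util.concurrent.atomic.LongAdder;

// Simplified sketch of lock-free histogram buckets; not the actual
// DecayingEstimatedHistogramReservoir implementation.
final class StripedHistogramSketch
{
    private final long[] bucketOffsets;  // ascending upper bounds, one per bucket
    private final LongAdder[] buckets;   // striped counters; last bucket is overflow

    StripedHistogramSketch(long[] bucketOffsets)
    {
        this.bucketOffsets = bucketOffsets;
        this.buckets = new LongAdder[bucketOffsets.length + 1];
        for (int i = 0; i < buckets.length; i++)
            buckets[i] = new LongAdder();
    }

    void update(long value)
    {
        int index = Arrays.binarySearch(bucketOffsets, value);
        if (index < 0)
            index = -index - 1;          // first offset >= value, or the overflow bucket
        buckets[index].add(1);           // no lock, no shared CAS hot spot
    }

    long bucketCount(int bucket)
    {
        return buckets[bucket].sum();    // stripes are only merged when read
    }
}
{code}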



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-5836) Seed nodes should be able to bootstrap without manual intervention

2018-03-09 Thread Oleksandr Shulgin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392587#comment-16392587
 ] 

Oleksandr Shulgin commented on CASSANDRA-5836:
--

{quote}If N>RF it becomes less likely that you'll have one replica in each DC 
for every range.{quote}

Without defining {{N}} it's hard for me to say what you mean here.  Maybe we 
should move this part of the discussion off the ticket? :)

{quote}nodetool rebuild should probably avoid rebuilding SimpleStrategy 
keyspaces and you shouldn't get an error for them.{quote}

That would be nice.

{quote}Bootstrapping SimpleStrategy across DC's is still relevant as long as 
SimpleStrategy exists.{quote}

To clarify, we are not talking about a significant amount of data here, i.e. not 
about user-defined keyspaces?  I would assume that if we teach nodetool rebuild 
to ignore SimpleStrategy keyspaces, they could be cheaply spread to the new DC by 
running a repair targeted at these small system keyspaces only.

{quote}with my patch we could forget about the instructions telling people to 
set auto_bootstrap=false when adding a new DC.{quote}

Hold on, how is this going to work at all?  If the first node in the new DC is 
going to bootstrap (let's assume seeds are allowed to bootstrap), it will own 
the whole token ring at first, so it will have to stream in all the data that 
exists in the source DC, times the RF(s) of the new DC.  Even if the new node 
doesn't die a horrible death in the process, you won't be able to add another 
node to the cluster until this is finished.  And even after that, adding the 
next node to the new DC will take ~50% of the ownership from the first one, so 
you will need to run cleanup on the first one in the end, and so on for the rest 
of the new nodes.

It is totally impractical to add a new DC this way, so I firmly believe that 
{{auto_bootstrap=false}} is here to stay for new DCs.

{quote}1. You still need code to handle the case where a seed starts with 
auto_bootstrap=true but it's a new cluster.{quote}

I would prefer this just to fail with some helpful error message.  Because:

{quote}You could potentially know when to fail by checking your seeds list and 
seeing if you are the only seed (then create a cluster, else fail). But I still 
don't see this as terribly necessary.{quote}

... I have doubts that any of these checks can be made really bullet-proof.

{quote}2. Seems a bit silly to have a new cluster procedure where the first 
step is to "set this to false in the yaml... because we said so". Especially 
when we can avoid that situation.{quote}

Well, we have this for adding new DCs, so not really that silly.  It also 
doesn't have to be "we said so"; for me the explanation is simple: the first 
seed node would otherwise fail the bootstrap, because there are no other nodes 
to bootstrap from yet.

{quote}Note that when I say special case I mean a special case in the code, not 
for the user. My patch (maybe with some tweaks) should be able to decide 
automatically every case where a seed should bootstrap versus when it 
shouldn't. If we can do that in the code, there's no reason to worry about 
changing any procedures or behaviours, and we don't need to worry about 
explaining the intricacies of why a seed can't bootstrap.{quote}

Again, I have serious doubts about all this automatic corner-case detection.  
As I've said before, I'm totally fine with making initial cluster setup a 
little bit more involved if that makes operating clusters in production more 
reliable.

> Seed nodes should be able to bootstrap without manual intervention
> --
>
> Key: CASSANDRA-5836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5836
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Bill Hathaway
>Priority: Minor
>
> The current logic doesn't allow a seed node to be bootstrapped.  If a user 
> wants to bootstrap a node configured as a seed (for example to replace a seed 
> node via replace_token), they first need to remove the node's own IP from the 
> seed list, and then start the bootstrap process.  This seems like an 
> unnecessary step since a node never uses itself as a seed.
> I think it would be a better experience if the logic was changed to allow a 
> seed node to bootstrap without manual intervention when there are other seed 
> nodes up in a ring.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org