[jira] [Commented] (CASSANDRA-12172) Fail to bootstrap new node.
[ https://issues.apache.org/jira/browse/CASSANDRA-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372209#comment-15372209 ]

Joel Knighton commented on CASSANDRA-12172:
--------------------------------------------

I'm not sure this is a bug - if it is, we'll need more information to resolve it. The error message itself is correct: Cassandra will try to bootstrap from the replica being replaced unless you advise it otherwise with "-Dcassandra.consistent.rangemovement=false".

Cassandra's failure detection builds a statistical estimate, based on gossip updates, of whether a node is up or down. It may be that you have an unreliable network and need to tune the phi conviction threshold appropriately - https://github.com/apache/cassandra/blob/91392edbe812c722adcf35cf167bf400d25dc99a/conf/cassandra.yaml#L855. Otherwise, it may be that some behavior on the hosts being marked down, such as a long GC pause, is preventing them from gossiping or performing other tasks. In that sense, the bug would be not in the failure detection but in some other component. We could get a better perspective on this with trace/debug level logs from the bootstrapping node and from a node marked down at the time of bootstrap.

> Fail to bootstrap new node.
> ---------------------------
>
>                 Key: CASSANDRA-12172
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12172
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dikang Gu
>
> When I try to bootstrap a new node in the cluster, sometimes it fails because of the following exceptions.
> {code}
> 2016-07-12_05:14:55.58509 INFO 05:14:55 [main]: JOINING: Starting to bootstrap...
> 2016-07-12_05:14:56.07491 INFO 05:14:56 [GossipTasks:1]: InetAddress /2401:db00:2011:50c7:face:0:9:0 is now DOWN
> 2016-07-12_05:14:56.32219 Exception (java.lang.RuntimeException) encountered during startup: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
> 2016-07-12_05:14:56.32582 ERROR 05:14:56 [main]: Exception encountered during startup
> 2016-07-12_05:14:56.32583 java.lang.RuntimeException: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
> 2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithStrictSourcesFor(RangeStreamer.java:264) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:147) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:82) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32584 	at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1230) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32584 	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:924) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32585 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:709) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32585 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32585 	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32586 	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32586 	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32730 WARN 05:14:56 [StorageServiceShutdownHook]: No local state or state is in silent shutdown, not announcing shutdown
> {code}
> Here are more logs: https://gist.github.com/DikangGu/c6a83eafdbc091250eade4a3bddcc40b
> I'm pretty sure there are no DOWN nodes or restarted nodes in the cluster, but I still see a lot of nodes UP and DOWN in the gossip log, which fails the bootstrap in the end, is this a known
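For background on the "statistical estimate" mentioned in the comment above, here is a minimal sketch of a phi-accrual failure detector (Hayashibara-style), using the common exponential-tail approximation phi = elapsed / (mean interval * ln 10). This is illustrative only, not Cassandra's actual FailureDetector; the class name, window size, and threshold default are assumptions made for the example (the threshold plays the role of `phi_convict_threshold`).

```java
import java.util.ArrayDeque;

// Sketch of a phi-accrual failure detector: heartbeat arrival intervals
// feed a sliding window; phi grows the longer we go without a heartbeat
// relative to the mean interval. A node is "convicted" (marked down)
// once phi exceeds the configured threshold.
public class PhiAccrualDetector {
    private final double threshold;                       // cf. phi_convict_threshold
    private final int window = 1000;                      // max intervals retained
    private final ArrayDeque<Double> intervals = new ArrayDeque<>();
    private double lastHeartbeat = Double.NaN;

    public PhiAccrualDetector(double threshold) {
        this.threshold = threshold;
    }

    // Record a heartbeat (e.g. a gossip update) arriving at time 'now' (seconds).
    public void heartbeat(double now) {
        if (!Double.isNaN(lastHeartbeat)) {
            if (intervals.size() == window)
                intervals.removeFirst();
            intervals.addLast(now - lastHeartbeat);
        }
        lastHeartbeat = now;
    }

    // phi = elapsed / (mean * ln 10): the exponential-tail approximation of
    // -log10(P(heartbeat arrives later than 'elapsed')).
    public double phi(double now) {
        if (intervals.isEmpty())
            return 0.0;
        double mean = intervals.stream().mapToDouble(Double::doubleValue).average().getAsDouble();
        double elapsed = now - lastHeartbeat;
        return elapsed / (mean * Math.log(10.0));
    }

    public boolean isDown(double now) {
        return phi(now) > threshold;
    }
}
```

With heartbeats every second, half a second of silence yields phi of roughly 0.2 (clearly up), while 20 seconds of silence yields phi of roughly 8.7, crossing the default-like threshold of 8 - which is why a flaky network (irregular gossip arrivals) or a long GC pause can flip nodes to DOWN, and why raising the threshold tolerates more jitter.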
[jira] [Updated] (CASSANDRA-10805) Additional Compaction Logging
[ https://issues.apache.org/jira/browse/CASSANDRA-10805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Deng updated CASSANDRA-10805: - Labels: doc-impacting (was: ) > Additional Compaction Logging > - > > Key: CASSANDRA-10805 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10805 > Project: Cassandra > Issue Type: New Feature > Components: Compaction, Observability >Reporter: Carl Yeksigian >Assignee: Carl Yeksigian >Priority: Minor > Labels: doc-impacting > Fix For: 3.6 > > > Currently, viewing the results of past compactions requires parsing the log > and looking at the compaction history system table, which doesn't have > information about, for example, flushed sstables not previously compacted. > This is a proposal to extend the information captured for compaction. > Initially, this would be done through a JMX call, but if it proves to be > useful and not much overhead, it might be a feature that could be enabled for > the compaction strategy all the time. > Initial log information would include: > - The compaction strategy type controlling each column family > - The set of sstables included in each compaction strategy > - Information about flushes and compactions, including times and all involved > sstables > - Information about sstables, including generation, size, and tokens > - Any additional metadata the strategy wishes to add to a compaction or an > sstable, like the level of an sstable or the type of compaction being > performed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
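The proposal above enumerates the data each compaction event should capture. A hypothetical record type sketching that shape is below; all field names are illustrative assumptions for this example, not the format the CASSANDRA-10805 patch actually emits.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-compaction event record covering the fields the
// proposal lists: the controlling strategy, the involved sstables
// (with generation/size/token details), and strategy-specific metadata
// such as an LCS level. Names are illustrative, not the shipped format.
public class CompactionEvent {
    public final String keyspace;
    public final String table;
    public final String strategy;                           // strategy controlling the table
    public final long startedAtMillis;
    public final List<Map<String, Object>> inputSstables;   // e.g. generation, size, tokens
    public final List<Map<String, Object>> outputSstables;
    public final Map<String, Object> strategyMetadata;      // e.g. {"level": 2} for LCS

    public CompactionEvent(String keyspace, String table, String strategy,
                           long startedAtMillis,
                           List<Map<String, Object>> inputSstables,
                           List<Map<String, Object>> outputSstables,
                           Map<String, Object> strategyMetadata) {
        this.keyspace = keyspace;
        this.table = table;
        this.strategy = strategy;
        this.startedAtMillis = startedAtMillis;
        this.inputSstables = inputSstables;
        this.outputSstables = outputSstables;
        this.strategyMetadata = strategyMetadata == null ? new HashMap<>() : strategyMetadata;
    }
}
```

A flush would be representable as an event with no input sstables, which is exactly the case the proposal notes is missing from the compaction-history table today.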
[jira] [Updated] (CASSANDRA-12172) Fail to bootstrap new node.
[ https://issues.apache.org/jira/browse/CASSANDRA-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dikang Gu updated CASSANDRA-12172:
----------------------------------
    Description: 
When I try to bootstrap a new node in the cluster, sometimes it fails because of the following exceptions.

{code}
2016-07-12_05:14:55.58509 INFO 05:14:55 [main]: JOINING: Starting to bootstrap...
2016-07-12_05:14:56.07491 INFO 05:14:56 [GossipTasks:1]: InetAddress /2401:db00:2011:50c7:face:0:9:0 is now DOWN
2016-07-12_05:14:56.32219 Exception (java.lang.RuntimeException) encountered during startup: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
2016-07-12_05:14:56.32582 ERROR 05:14:56 [main]: Exception encountered during startup
2016-07-12_05:14:56.32583 java.lang.RuntimeException: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithStrictSourcesFor(RangeStreamer.java:264) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:147) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:82) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32584 	at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1230) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32584 	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:924) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32585 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:709) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32585 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32585 	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32586 	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32586 	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32730 WARN 05:14:56 [StorageServiceShutdownHook]: No local state or state is in silent shutdown, not announcing shutdown
{code}

Here are more logs: https://gist.github.com/DikangGu/c6a83eafdbc091250eade4a3bddcc40b

I'm pretty sure there are no DOWN nodes or restarted nodes in the cluster, but I still see a lot of nodes UP and DOWN in the gossip log, which fails the bootstrap in the end, is this a known bug?

  was:
When I try to bootstrap a new node in the cluster, sometimes it fails because of the following exceptions.

2016-07-12_05:14:55.58509 INFO 05:14:55 [main]: JOINING: Starting to bootstrap...
2016-07-12_05:14:56.07491 INFO 05:14:56 [GossipTasks:1]: InetAddress /2401:db00:2011:50c7:face:0:9:0 is now DOWN
2016-07-12_05:14:56.32219 Exception (java.lang.RuntimeException) encountered during startup: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
2016-07-12_05:14:56.32582 ERROR 05:14:56 [main]: Exception encountered during startup
2016-07-12_05:14:56.32583 java.lang.RuntimeException: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithStrictSourcesFor(RangeStreamer.java:264) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:147) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
[jira] [Created] (CASSANDRA-12172) Fail to bootstrap new node.
Dikang Gu created CASSANDRA-12172:
----------------------------------

             Summary: Fail to bootstrap new node.
                 Key: CASSANDRA-12172
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12172
             Project: Cassandra
          Issue Type: Bug
            Reporter: Dikang Gu

When I try to bootstrap a new node in the cluster, sometimes it fails because of the following exceptions.

2016-07-12_05:14:55.58509 INFO 05:14:55 [main]: JOINING: Starting to bootstrap...
2016-07-12_05:14:56.07491 INFO 05:14:56 [GossipTasks:1]: InetAddress /2401:db00:2011:50c7:face:0:9:0 is now DOWN
2016-07-12_05:14:56.32219 Exception (java.lang.RuntimeException) encountered during startup: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
2016-07-12_05:14:56.32582 ERROR 05:14:56 [main]: Exception encountered during startup
2016-07-12_05:14:56.32583 java.lang.RuntimeException: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithStrictSourcesFor(RangeStreamer.java:264) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:147) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32584 	at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:82) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32584 	at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1230) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32584 	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:924) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32585 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:709) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32585 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32585 	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32586 	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32586 	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
2016-07-12_05:14:56.32730 WARN 05:14:56 [StorageServiceShutdownHook]: No local state or state is in silent shutdown, not announcing shutdown

Here are more logs: https://gist.github.com/DikangGu/c6a83eafdbc091250eade4a3bddcc40b

I'm pretty sure there are no DOWN nodes or restarted nodes in the cluster, but I still see a lot of nodes UP and DOWN in the gossip log, which fails the bootstrap in the end, is this a known bug?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
cassandra git commit: fix SSTableSizeSummer's file skipping logic
Repository: cassandra
Updated Branches:
  refs/heads/trunk de73b5c7b -> 91392edbe

fix SSTableSizeSummer's file skipping logic

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/91392edb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/91392edb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/91392edb

Branch: refs/heads/trunk
Commit: 91392edbe812c722adcf35cf167bf400d25dc99a
Parents: de73b5c
Author: Dave Brosius
Authored: Tue Jul 12 00:20:40 2016 -0400
Committer: Dave Brosius
Committed: Tue Jul 12 00:20:40 2016 -0400

----------------------------------------------------------------------
 src/java/org/apache/cassandra/db/Directories.java | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/91392edb/src/java/org/apache/cassandra/db/Directories.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/db/Directories.java b/src/java/org/apache/cassandra/db/Directories.java
index 87527e8..a83c845 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -17,8 +17,6 @@
  */
 package org.apache.cassandra.db;
 
-import static com.google.common.collect.Sets.newHashSet;
-
 import java.io.File;
 import java.io.FileFilter;
 import java.io.IOError;
@@ -1014,14 +1012,14 @@ public class Directories
         }
 
         @Override
-        public boolean isAcceptable(Path file)
+        public boolean isAcceptable(Path path)
         {
-            String fileName = file.toFile().getName();
-            Pair pair = SSTable.tryComponentFromFilename(file.getParent().toFile(), fileName);
+            File file = path.toFile();
+            Pair pair = SSTable.tryComponentFromFilename(path.getParent().toFile(), file.getName());
             return pair != null
                    && pair.left.ksname.equals(metadata.ksName)
                    && pair.left.cfname.equals(metadata.cfName)
-                   && !toSkip.contains(fileName);
+                   && !toSkip.contains(file);
         }
     }
 }
cassandra git commit: use precomputed end ClusteringBound
Repository: cassandra
Updated Branches:
  refs/heads/trunk 019d43734 -> de73b5c7b

use precomputed end ClusteringBound

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/de73b5c7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/de73b5c7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/de73b5c7

Branch: refs/heads/trunk
Commit: de73b5c7bef402782ff775fba772cb3f870bbc16
Parents: 019d437
Author: Dave Brosius
Authored: Tue Jul 12 00:08:53 2016 -0400
Committer: Dave Brosius
Committed: Tue Jul 12 00:08:53 2016 -0400

----------------------------------------------------------------------
 src/java/org/apache/cassandra/db/RangeTombstoneList.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/de73b5c7/src/java/org/apache/cassandra/db/RangeTombstoneList.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/db/RangeTombstoneList.java b/src/java/org/apache/cassandra/db/RangeTombstoneList.java
index c60b774..716213d 100644
--- a/src/java/org/apache/cassandra/db/RangeTombstoneList.java
+++ b/src/java/org/apache/cassandra/db/RangeTombstoneList.java
@@ -549,7 +549,7 @@ public class RangeTombstoneList implements Iterable, IMeasurable
             ClusteringBound newEnd = start.invert();
             if (!Slice.isEmpty(comparator, starts[i], newEnd))
             {
-                addInternal(i, starts[i], start.invert(), markedAts[i], delTimes[i]);
+                addInternal(i, starts[i], newEnd, markedAts[i], delTimes[i]);
                 i++;
                 setInternal(i, start, ends[i], markedAts[i], delTimes[i]);
             }
cassandra git commit: push down indexFileLength calc, to where it's used
Repository: cassandra
Updated Branches:
  refs/heads/trunk 9fed08449 -> 019d43734

push down indexFileLength calc, to where it's used

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/019d4373
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/019d4373
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/019d4373

Branch: refs/heads/trunk
Commit: 019d43734d30499242af621541cf1c48958046e3
Parents: 9fed084
Author: Dave Brosius
Authored: Tue Jul 12 00:00:04 2016 -0400
Committer: Dave Brosius
Committed: Tue Jul 12 00:00:04 2016 -0400

----------------------------------------------------------------------
 src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/019d4373/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java b/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
index 78a6825..fc0849f 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
@@ -752,11 +752,11 @@ public abstract class SSTableReader extends SSTable implements SelfRefCounted
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372114#comment-15372114 ]

Stefania commented on CASSANDRA-9318:
--------------------------------------

bq. But can that really happen? ResponseVerbHandler returns before incrementing back-pressure if the callback is null (i.e. expired), and OutboundTcpConnection doesn't even send outbound messages if they're timed out, or am I missing something?

You're correct: we won't count twice because the callback is already null. However, this raises another point. If a message expires before it is sent, we count this against the replica, since we increment the outgoing rate but not the incoming rate when the callback expires - yet it may have nothing to do with the replica if the message was never sent; it may be due to the coordinator dealing with too many messages.

bq. Again, I believe this would make enabling/disabling back-pressure via JMX less user friendly.

Fine, let's keep the boolean since it makes life easier for JMX.

bq. I do not think sorting replicas is what we really need, as you have to send the mutation to all replicas anyway. I think what you rather need is a way to pre-emptively fail if the write consistency level is not met by enough "non-overloaded" replicas, i.e.:

You're correct that the replicas are not sorted in the write path, only in the read path; I confused the two yesterday. We certainly should only fail if the write consistency level is not met. I also observe that if a replica has a low rate, then we may block when acquiring its limiter, and this will indirectly throttle all following replicas, even if they were ready to receive mutations sooner. Therefore, even a single overloaded or slow replica may slow down the entire write operation. Further, AbstractWriteResponseHandler sets the start time in its constructor, so the time spent acquiring a rate limiter for slow replicas counts towards the total time before the coordinator throws a write timeout exception. So, unless we increase the write RPC timeout or change the existing behavior, we may observe write timeout exceptions and, at CL.ANY, hints. Also, in SP.sendToHintedEndpoints(), we should apply backpressure only if the destination is alive.

{quote}
This leaves us with two options:
* Adding a new exception to the native protocol.
* Reusing a different exception, with WriteFailureException and UnavailableException the most likely candidates.

I'm currently leaning towards the latter option.
{quote}

Let's use UnavailableException, since WriteFailureException indicates a non-timeout failure when processing a mutation and so is not appropriate for this case. For protocol V4 we cannot change UnavailableException, but for V5 we should add a new parameter to it. At the moment it contains {{}}; we should add the number of overloaded replicas, so that drivers can treat the two cases differently. Another alternative, as suggested by [~slebresne], is to simply consider overloaded replicas as dead and hint them, therefore throwing unavailable exceptions as usual, but this is slightly less accurate than letting clients know that some replicas were unavailable and some merely overloaded.

bq. We only need to ensure the coordinator for that specific mutation has back-pressure enabled, and we could do this by "marking" the MessageOut with a special parameter, what do you think?

Marking messages as throttled would let the replica know whether backpressure was enabled, that's true, but it also makes the existing mechanism even more complex. Also, as far as I understand it, dropping mutations that have been in the queue for longer than the RPC write timeout is done not only to shed load on the replica, but also to avoid wasting resources performing a mutation when the coordinator has already returned a timeout exception to the client. I think this still holds true regardless of backpressure. Since we cannot remove the timeout check in the write response handlers, I don't see how it helps to drop it replica side. If the message was throttled, even with cross_node_timeout enabled, the replica should have time to process it before the RPC write timeout expires, so I don't think the extra complexity is justified.

bq. If you all agree with that, I'll move forward and make that change.

To summarize, I agree with this change, provided the drivers can separate the two cases (node unavailable vs. node overloaded), which they will be able to do with V5 of the native protocol. The alternative would be to simply consider overloaded replicas as dead and hint them. Further, I still have concerns regarding additional write timeout exceptions and whether an overloaded or slow replica can slow everything down. [~slebresne], [~jbellis] anything else from your side? I think Jonathan's proposal of bounding total outstanding requests to all replicas, is
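The concern above - that blocking on one replica's rate limiter delays sends to every replica after it - can be illustrated with a toy simulation. Times are simulated seconds, and the RateLimiter here is a minimal token-bucket-style stand-in written for this example, not Guava's RateLimiter or anything in Cassandra.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: the coordinator acquires each replica's rate limiter in
// turn before "sending". A single slow/overloaded replica then pushes
// back the send time of every replica that follows it in the iteration.
public class BackpressureSim {
    // Minimal limiter: acquire() returns the simulated time at which the
    // permit is granted and schedules the next free slot.
    static class RateLimiter {
        final double interval;   // seconds per permit
        double nextFree = 0.0;

        RateLimiter(double permitsPerSecond) {
            this.interval = 1.0 / permitsPerSecond;
        }

        double acquire(double now) {
            double granted = Math.max(now, nextFree);
            nextFree = granted + interval;
            return granted;
        }
    }

    // Sequentially acquire each replica's limiter; returns per-replica send times.
    static Map<String, Double> sendToReplicas(Map<String, RateLimiter> replicas, double now) {
        Map<String, Double> sendTimes = new LinkedHashMap<>();
        double t = now;
        for (Map.Entry<String, RateLimiter> e : replicas.entrySet()) {
            t = e.getValue().acquire(t);   // "blocks" until the permit is granted
            sendTimes.put(e.getKey(), t);
        }
        return sendTimes;
    }
}
```

If the first replica's bucket is exhausted (one permit per 10 s, already spent), the fast replica's mutation is not sent until t = 10 either, even though its own limiter would have granted a permit immediately - and that 10 s counts against the write RPC timeout started in the response handler's constructor.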
[jira] [Updated] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Shuler updated CASSANDRA-9754:
---------------------------------------
    Tester:   (was: Michael Shuler)

> Make index info heap friendly for large CQL partitions
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo and its ByteBuffers. This is especially bad on endpoints with large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This creates a lot of churn for the GC. Can this be improved by not creating so many objects?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9608) Support Java 9
[ https://issues.apache.org/jira/browse/CASSANDRA-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371835#comment-15371835 ]

Carlos Abad commented on CASSANDRA-9608:
-----------------------------------------

Carlos Abad here, from the Intel Java team. I've been able to build Cassandra with JDK 9 (build 125) by applying the modifications noted below. However, the build only passes 70% of the unit tests.

* At least Apache Ant version 1.9.7 is required for JDK 9; previous versions are unable to load the JavaScript script engine manager on JDK 9.

build.xml:
* Generate bytecode for Java 9; the source is still written for Java 8.
* Java 9 removed the -Xbootclasspath/p command-line option. Without this option, Cassandra will depend on the given Java classpath to include the CRC32 class.
* To avoid "Annotation generator had thrown the exception. java.lang.NoClassDefFoundError: javax/annotation/Generated", we need to add "-addmods java.annotations.common" to the javac task of the "build-test" target.

src/java/org/apache/cassandra/utils/Throwables.java:
* Stream.of() throws an exception now, which needs to be caught.

src/java/org/apache/cassandra/utils/concurrent/Locks.java:
* sun.misc.Unsafe is going away. Fortunately, cassandra.utils.concurrent.Locks is only used in one place (db/partitions/AtomicBTreePartition, see below), so it is enough to modify AtomicBTreePartition and remove this class.

src/java/org/apache/cassandra/db/partitions/AtomicBTreePartition.java:
* This is the only place where the class cassandra.utils.concurrent.Locks is used. We'll modify it to use Java's ReentrantLock instead.

> Support Java 9
> --------------
>
>                 Key: CASSANDRA-9608
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9608
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Robert Stupp
>            Priority: Minor
>
> This ticket is intended to group all issues found to support Java 9 in the future.
> From what I've found out so far:
> * Maven dependency {{com.sun:tools:jar:0}} via cobertura cannot be resolved. It can be easily solved using this patch:
> {code}
> - artifactId="cobertura"/>
> + artifactId="cobertura">
> +
> +
> {code}
> * Another issue is that {{sun.misc.Unsafe}} no longer contains the methods {{monitorEnter}} + {{monitorExit}}. These methods are used by {{o.a.c.utils.concurrent.Locks}} which is only used by {{o.a.c.db.AtomicBTreeColumns}}.
> I don't mind to start working on this yet since Java 9 is in a too early development phase.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
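The Locks replacement proposed in the comment above can be sketched as follows. Class and member names here are illustrative, not the actual Cassandra patch: the idea is simply to swap the Unsafe.monitorEnter/monitorExit pair for an explicit java.util.concurrent.locks.ReentrantLock guarded by try/finally.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of migrating from Unsafe-based monitor operations (as used via
// o.a.c.utils.concurrent.Locks) to an explicit ReentrantLock field.
// Before (conceptually):
//     Locks.monitorEnterUnsafe(this);
//     try { ...critical section... } finally { Locks.monitorExitUnsafe(this); }
public class LockMigration {
    private final ReentrantLock lock = new ReentrantLock();
    private long updates = 0;

    public void update() {
        lock.lock();            // explicit lock replaces the Unsafe monitor ops
        try {
            updates++;          // critical section
        } finally {
            lock.unlock();      // always release, even if the body throws
        }
    }

    public long updateCount() {
        lock.lock();
        try {
            return updates;
        } finally {
            lock.unlock();
        }
    }
}
```

Unlike Unsafe.monitorEnter, ReentrantLock cannot lock an arbitrary object, which is why the migration adds a lock field to the class instead; it also gains tryLock() and fairness options should they ever be needed.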
[jira] [Commented] (CASSANDRA-12149) NullPointerException on SELECT with SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371786#comment-15371786 ] Andrey Konstantinov commented on CASSANDRA-12149: - Thanks! Yes, this is to fetch rows sequentially by one machine. My goal was to fetch half of a partition by one machine and another half by the second machine. It seems it is impossible to do this without knowing split clustering key value. > NullPointerException on SELECT with SASI index > -- > > Key: CASSANDRA-12149 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12149 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Andrey Konstantinov > Attachments: CASSANDRA-12149.txt > > > If I execute the sequence of queries (see the attached file), Cassandra > aborts a connection reporting NPE on server side. SELECT query without token > range filter works, but does not work when token range filter is specified. > My intent was to issue multiple SELECT queries targeting the same single > partition, filtered by a column indexed by SASI, partitioning results by > different token ranges. > Output from cqlsh on SELECT is the following: > cqlsh> SELECT namespace, entity, timestamp, feature1, feature2 FROM > mykeyspace.myrecordtable WHERE namespace = 'ns2' AND entity = 'entity2' AND > feature1 > 11 AND feature1 < 31 AND token(namespace, entity) <= > 9223372036854775807; > ServerError: message="java.lang.NullPointerException"> -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
[ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Shuler updated CASSANDRA-9754:
---------------------------------------
    Tester: Michael Shuler

> Make index info heap friendly for large CQL partitions
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
> Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo and its ByteBuffers. This is especially bad on endpoints with large CQL partitions. If a CQL partition is, say, 6.4 GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This creates a lot of churn for the GC. Can this be improved by not creating so many objects?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11698) dtest failure in materialized_views_test.TestMaterializedViews.clustering_column_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Witschey updated CASSANDRA-11698: - Assignee: (was: Jim Witschey) > dtest failure in > materialized_views_test.TestMaterializedViews.clustering_column_test > - > > Key: CASSANDRA-11698 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11698 > Project: Cassandra > Issue Type: Bug >Reporter: Russ Hatch > Labels: dtest > Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, > node3.log, node3_debug.log > > > recent failure, test has flapped before a while back. > {noformat} > Expecting 2 users, got 1 > {noformat} > http://cassci.datastax.com/job/cassandra-3.0_dtest/688/testReport/materialized_views_test/TestMaterializedViews/clustering_column_test > Failed on CassCI build cassandra-3.0_dtest #688 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-11698) dtest failure in materialized_views_test.TestMaterializedViews.clustering_column_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Witschey reassigned CASSANDRA-11698:
----------------------------------------
    Assignee: Jim Witschey

> dtest failure in materialized_views_test.TestMaterializedViews.clustering_column_test
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11698
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Russ Hatch
>            Assignee: Jim Witschey
>              Labels: dtest
>         Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, node3.log, node3_debug.log
>
> Recent failure; the test has flapped before, a while back.
> {noformat}
> Expecting 2 users, got 1
> {noformat}
> http://cassci.datastax.com/job/cassandra-3.0_dtest/688/testReport/materialized_views_test/TestMaterializedViews/clustering_column_test
> Failed on CassCI build cassandra-3.0_dtest #688

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11698) dtest failure in materialized_views_test.TestMaterializedViews.clustering_column_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Witschey updated CASSANDRA-11698:
-------------------------------------
    Issue Type: Bug  (was: Test)

> dtest failure in materialized_views_test.TestMaterializedViews.clustering_column_test
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11698
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Russ Hatch
>            Assignee: Sean McCarthy
>              Labels: dtest
>         Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, node3.log, node3_debug.log
>
> Recent failure; the test has flapped before, a while back.
> {noformat}
> Expecting 2 users, got 1
> {noformat}
> http://cassci.datastax.com/job/cassandra-3.0_dtest/688/testReport/materialized_views_test/TestMaterializedViews/clustering_column_test
> Failed on CassCI build cassandra-3.0_dtest #688

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11698) dtest failure in materialized_views_test.TestMaterializedViews.clustering_column_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Witschey updated CASSANDRA-11698:
-------------------------------------
    Assignee: (was: Sean McCarthy)

> dtest failure in materialized_views_test.TestMaterializedViews.clustering_column_test
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11698
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Russ Hatch
>              Labels: dtest
>         Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, node3.log, node3_debug.log
>
> Recent failure; the test has flapped before, a while back.
> {noformat}
> Expecting 2 users, got 1
> {noformat}
> http://cassci.datastax.com/job/cassandra-3.0_dtest/688/testReport/materialized_views_test/TestMaterializedViews/clustering_column_test
> Failed on CassCI build cassandra-3.0_dtest #688

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11715) Make GCInspector's MIN_LOG_DURATION configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-11715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua McKenzie updated CASSANDRA-11715:
----------------------------------------
    Status: Ready to Commit  (was: Patch Available)

> Make GCInspector's MIN_LOG_DURATION configurable
> ------------------------------------------------
>
>                 Key: CASSANDRA-11715
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11715
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Brandon Williams
>            Assignee: Jeff Jirsa
>            Priority: Minor
>              Labels: lhf
>             Fix For: 2.2.8, 3.0.9, 3.9
>
> It's common for people to run C* with the G1 collector on appropriately-sized
> heaps. Quite often, the target pause time is set to 500ms, but GCI fires on
> anything over 200ms. We can already control the warn threshold, but these
> are acceptable GCs for the configuration and create noise at the INFO log
> level.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
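A minimal sketch of the behavior the ticket asks for: pauses above the warn threshold log at WARN, pauses above a now-configurable log threshold at INFO, and anything below stays silent. Class and method names here are illustrative, not Cassandra's actual GCInspector API; only the config names `gc_log_threshold_in_ms` and `gc_warn_threshold_in_ms` come from the patch:

```java
// Illustrative model of the two-threshold GC pause logging policy.
public class GcPauseLogPolicy {
    enum Level { NONE, INFO, WARN }

    final long logThresholdMs;   // gc_log_threshold_in_ms (default 200)
    final long warnThresholdMs;  // gc_warn_threshold_in_ms (0 = no WARN escalation)

    GcPauseLogPolicy(long logThresholdMs, long warnThresholdMs) {
        this.logThresholdMs = logThresholdMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    Level levelFor(long pauseMs) {
        if (warnThresholdMs > 0 && pauseMs > warnThresholdMs)
            return Level.WARN;                 // pause long enough to warn about
        if (pauseMs > logThresholdMs)
            return Level.INFO;                 // noteworthy but acceptable
        return Level.NONE;                     // below both thresholds: silent
    }

    public static void main(String[] args) {
        // G1 tuned for 500 ms target pauses: raise the log threshold to 500 ms
        // so "acceptable" pauses no longer spam the INFO log.
        GcPauseLogPolicy g1 = new GcPauseLogPolicy(500, 1000);
        System.out.println(g1.levelFor(300));   // under both thresholds
        System.out.println(g1.levelFor(700));   // over log, under warn
        System.out.println(g1.levelFor(1500));  // over warn
    }
}
```

With the old hardcoded 200 ms minimum, the 300 ms pause above would have logged at INFO; making the threshold configurable is what silences it.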
[jira] [Updated] (CASSANDRA-11715) Make GCInspector's MIN_LOG_DURATION configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-11715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua McKenzie updated CASSANDRA-11715:
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 3.9
                   3.0.9
                   2.2.8
           Status: Resolved  (was: Ready to Commit)

[committed|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=f0d1d75ebf10beff6d24323c03c57e29dcd38c15]

> Make GCInspector's MIN_LOG_DURATION configurable
> ------------------------------------------------
>
>                 Key: CASSANDRA-11715
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11715
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Brandon Williams
>            Assignee: Jeff Jirsa
>            Priority: Minor
>              Labels: lhf
>             Fix For: 2.2.8, 3.0.9, 3.9
>
> It's common for people to run C* with the G1 collector on appropriately-sized
> heaps. Quite often, the target pause time is set to 500ms, but GCI fires on
> anything over 200ms. We can already control the warn threshold, but these
> are acceptable GCs for the configuration and create noise at the INFO log
> level.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-11446) dtest failure in scrub_test.TestScrub.test_nodetool_scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Witschey updated CASSANDRA-11446:
-------------------------------------
    Comment: was deleted
    (was: Filed a PR here: https://github.com/riptano/cassandra-dtest/pull/1086)

> dtest failure in scrub_test.TestScrub.test_nodetool_scrub
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-11446
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11446
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Philip Thompson
>            Assignee: Jim Witschey
>              Labels: dtest
>
> test_nodetool_scrub is failing on trunk with offheap memtables. The failure
> is in this assertion:
> {{self.assertEqual(initial_sstables, scrubbed_sstables)}}
> Example failure:
> http://cassci.datastax.com/job/trunk_offheap_dtest/95/testReport/scrub_test/TestScrub/test_nodetool_scrub/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[10/10] cassandra git commit: Merge branch 'cassandra-3.9' into trunk
Merge branch 'cassandra-3.9' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9fed0844
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9fed0844
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9fed0844
Branch: refs/heads/trunk
Commit: 9fed08449db4a446ac874d14178f86d20de97b18
Parents: 1f74142 76188e9
Author: Josh McKenzie
Authored: Mon Jul 11 16:30:25 2016 -0400
Committer: Josh McKenzie
Committed: Mon Jul 11 16:30:25 2016 -0400

 conf/cassandra.yaml                                     | 6 ++
 src/java/org/apache/cassandra/config/Config.java        | 1 +
 .../org/apache/cassandra/config/DatabaseDescriptor.java | 10 ++
 src/java/org/apache/cassandra/service/GCInspector.java  | 2 +-
 4 files changed, 18 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9fed0844/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
[01/10] cassandra git commit: Make GCInspector min log duration configurable
Repository: cassandra

Updated Branches:
  refs/heads/cassandra-2.2 9a8406f2f -> f0d1d75eb
  refs/heads/cassandra-3.0 5861cd8fe -> e99ee1995
  refs/heads/cassandra-3.9 56abaca04 -> 76188e952
  refs/heads/trunk 1f74142d7 -> 9fed08449

Make GCInspector min log duration configurable

Patch by jjirsa; reviewed by jmckenzie for CASSANDRA-11715

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f0d1d75e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f0d1d75e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f0d1d75e
Branch: refs/heads/cassandra-2.2
Commit: f0d1d75ebf10beff6d24323c03c57e29dcd38c15
Parents: 9a8406f
Author: Jeff Jirsa
Authored: Mon Jul 11 16:27:04 2016 -0400
Committer: Josh McKenzie
Committed: Mon Jul 11 16:27:04 2016 -0400

 conf/cassandra.yaml                                     | 7 ++-
 src/java/org/apache/cassandra/config/Config.java        | 1 +
 .../org/apache/cassandra/config/DatabaseDescriptor.java | 10 ++
 src/java/org/apache/cassandra/service/GCInspector.java  | 2 +-
 4 files changed, 18 insertions(+), 2 deletions(-)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f0d1d75e/conf/cassandra.yaml

diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index 35e94d2..4ad798a 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -858,9 +858,14 @@ inter_dc_tcp_nodelay: false
 tracetype_query_ttl: 86400
 tracetype_repair_ttl: 604800
 
+# By default, Cassandra logs GC Pauses greater than 200 ms at INFO level
+# This threshold can be adjusted to minimize logging if necessary
+# gc_log_threshold_in_ms: 200
+
 # GC Pauses greater than gc_warn_threshold_in_ms will be logged at WARN level
+# If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at
+# INFO level
 # Adjust the threshold based on your application throughput requirement
-# By default, Cassandra logs GC Pauses greater than 200 ms at INFO level
 # gc_warn_threshold_in_ms: 1000
 
 # UDFs (user defined functions) are disabled by default.

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f0d1d75e/src/java/org/apache/cassandra/config/Config.java

diff --git a/src/java/org/apache/cassandra/config/Config.java b/src/java/org/apache/cassandra/config/Config.java
index 9736a03..ede4560 100644
--- a/src/java/org/apache/cassandra/config/Config.java
+++ b/src/java/org/apache/cassandra/config/Config.java
@@ -254,6 +254,7 @@ public class Config
     public volatile Long index_summary_capacity_in_mb;
     public volatile int index_summary_resize_interval_in_minutes = 60;
 
+    public int gc_log_threshold_in_ms = 200;
     public int gc_warn_threshold_in_ms = 0;
 
     private static final CsvPreference STANDARD_SURROUNDING_SPACES_NEED_QUOTES = new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f0d1d75e/src/java/org/apache/cassandra/config/DatabaseDescriptor.java

diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index d3a5028..f1acfc4 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -366,6 +366,11 @@ public class DatabaseDescriptor
         }
         paritionerName = partitioner.getClass().getCanonicalName();
 
+        if (config.gc_log_threshold_in_ms < 0)
+        {
+            throw new ConfigurationException("gc_log_threshold_in_ms must be a positive integer");
+        }
+
         if (conf.gc_warn_threshold_in_ms < 0)
         {
             throw new ConfigurationException("gc_warn_threshold_in_ms must be a positive integer");
@@ -1801,6 +1806,11 @@ public class DatabaseDescriptor
         return conf.windows_timer_interval;
     }
 
+    public static long getGCLogThreshold()
+    {
+        return conf.gc_log_threshold_in_ms;
+    }
+
     public static long getGCWarnThreshold()
     {
         return conf.gc_warn_threshold_in_ms;

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f0d1d75e/src/java/org/apache/cassandra/service/GCInspector.java

diff --git a/src/java/org/apache/cassandra/service/GCInspector.java b/src/java/org/apache/cassandra/service/GCInspector.java
index de5acc0..31de151 100644
--- a/src/java/org/apache/cassandra/service/GCInspector.java
+++ b/src/java/org/apache/cassandra/service/GCInspector.java
@@ -48,7 +48,7 @@ public class GCInspector implements
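The DatabaseDescriptor hunk in this commit validates the new setting at startup. A standalone sketch of that check, with illustrative names rather than Cassandra's actual types; note that, as committed, the check rejects only negative values, so 0 is accepted even though the message says "positive integer":

```java
// Sketch of the startup validation added for the new config key.
// Names are illustrative; the real code throws ConfigurationException
// from DatabaseDescriptor.applyConfig-time checks.
public class GcConfigValidation {
    static long validateGcLogThreshold(long valueMs) {
        if (valueMs < 0) // mirrors the committed "< 0" check: zero passes
            throw new IllegalArgumentException(
                "gc_log_threshold_in_ms must be a positive integer");
        return valueMs;
    }

    public static void main(String[] args) {
        System.out.println(validateGcLogThreshold(200)); // default is accepted
        try {
            validateGcLogThreshold(-1);                  // negative is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```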
[06/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0
Merge branch 'cassandra-2.2' into cassandra-3.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e99ee199 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e99ee199 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e99ee199 Branch: refs/heads/cassandra-3.0 Commit: e99ee19950e764ca55331d7c814e965bef359a4f Parents: 5861cd8 f0d1d75 Author: Josh McKenzieAuthored: Mon Jul 11 16:28:13 2016 -0400 Committer: Josh McKenzie Committed: Mon Jul 11 16:28:22 2016 -0400 -- conf/cassandra.yaml | 7 ++- src/java/org/apache/cassandra/config/Config.java | 1 + .../org/apache/cassandra/config/DatabaseDescriptor.java | 10 ++ src/java/org/apache/cassandra/service/GCInspector.java| 2 +- 4 files changed, 18 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e99ee199/conf/cassandra.yaml -- diff --cc conf/cassandra.yaml index 4b92f64,4ad798a..09d2094 --- a/conf/cassandra.yaml +++ b/conf/cassandra.yaml @@@ -924,21 -858,23 +924,26 @@@ inter_dc_tcp_nodelay: fals tracetype_query_ttl: 86400 tracetype_repair_ttl: 604800 + # By default, Cassandra logs GC Pauses greater than 200 ms at INFO level + # This threshold can be adjusted to minimize logging if necessary + # gc_log_threshold_in_ms: 200 + # GC Pauses greater than gc_warn_threshold_in_ms will be logged at WARN level + # If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at + # INFO level # Adjust the threshold based on your application throughput requirement - # By default, Cassandra logs GC Pauses greater than 200 ms at INFO level -# gc_warn_threshold_in_ms: 1000 +gc_warn_threshold_in_ms: 1000 # UDFs (user defined functions) are disabled by default. -# As of Cassandra 2.2, there is no security manager or anything else in place that -# prevents execution of evil code. CASSANDRA-9402 will fix this issue for Cassandra 3.0. 
-# This will inherently be backwards-incompatible with any 2.2 UDF that perform insecure -# operations such as opening a socket or writing to the filesystem. +# As of Cassandra 3.0 there is a sandbox in place that should prevent execution of evil code. enable_user_defined_functions: false +# Enables scripted UDFs (JavaScript UDFs). +# Java UDFs are always enabled, if enable_user_defined_functions is true. +# Enable this option to be able to use UDFs with "language javascript" or any custom JSR-223 provider. +# This option has no effect, if enable_user_defined_functions is false. +enable_scripted_user_defined_functions: false + # The default Windows kernel timer and scheduling resolution is 15.6ms for power conservation. # Lowering this value on Windows can provide much tighter latency and better throughput, however # some virtualized environments may see a negative performance impact from changing this setting http://git-wip-us.apache.org/repos/asf/cassandra/blob/e99ee199/src/java/org/apache/cassandra/config/Config.java -- diff --cc src/java/org/apache/cassandra/config/Config.java index b49e14c,ede4560..2bd23b5 --- a/src/java/org/apache/cassandra/config/Config.java +++ b/src/java/org/apache/cassandra/config/Config.java @@@ -265,8 -254,12 +265,9 @@@ public class Confi public volatile Long index_summary_capacity_in_mb; public volatile int index_summary_resize_interval_in_minutes = 60; + public int gc_log_threshold_in_ms = 200; public int gc_warn_threshold_in_ms = 0; -private static final CsvPreference STANDARD_SURROUNDING_SPACES_NEED_QUOTES = new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE) - .surroundingSpacesNeedQuotes(true).build(); - // TTL for different types of trace events. 
public int tracetype_query_ttl = (int) TimeUnit.DAYS.toSeconds(1); public int tracetype_repair_ttl = (int) TimeUnit.DAYS.toSeconds(7); http://git-wip-us.apache.org/repos/asf/cassandra/blob/e99ee199/src/java/org/apache/cassandra/config/DatabaseDescriptor.java -- diff --cc src/java/org/apache/cassandra/config/DatabaseDescriptor.java index 2083e42f,f1acfc4..100bcf4 --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java @@@ -1934,51 -1801,16 +1939,56 @@@ public class DatabaseDescripto return conf.enable_user_defined_functions; } -public static int getWindowsTimerInterval() +
[08/10] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.9
Merge branch 'cassandra-3.0' into cassandra-3.9 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/76188e95 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/76188e95 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/76188e95 Branch: refs/heads/cassandra-3.9 Commit: 76188e9520d0aed8da287cffd80122e1069ddcae Parents: 56abaca e99ee19 Author: Josh McKenzieAuthored: Mon Jul 11 16:29:58 2016 -0400 Committer: Josh McKenzie Committed: Mon Jul 11 16:29:58 2016 -0400 -- conf/cassandra.yaml | 6 ++ src/java/org/apache/cassandra/config/Config.java | 1 + .../org/apache/cassandra/config/DatabaseDescriptor.java | 10 ++ src/java/org/apache/cassandra/service/GCInspector.java| 2 +- 4 files changed, 18 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/76188e95/conf/cassandra.yaml -- diff --cc conf/cassandra.yaml index 076a729,09d2094..d79423e --- a/conf/cassandra.yaml +++ b/conf/cassandra.yaml @@@ -1051,6 -924,16 +1051,12 @@@ inter_dc_tcp_nodelay: fals tracetype_query_ttl: 86400 tracetype_repair_ttl: 604800 + # By default, Cassandra logs GC Pauses greater than 200 ms at INFO level + # This threshold can be adjusted to minimize logging if necessary + # gc_log_threshold_in_ms: 200 + -# GC Pauses greater than gc_warn_threshold_in_ms will be logged at WARN level + # If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at + # INFO level -# Adjust the threshold based on your application throughput requirement -gc_warn_threshold_in_ms: 1000 - # UDFs (user defined functions) are disabled by default. # As of Cassandra 3.0 there is a sandbox in place that should prevent execution of evil code. 
enable_user_defined_functions: false http://git-wip-us.apache.org/repos/asf/cassandra/blob/76188e95/src/java/org/apache/cassandra/config/Config.java -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/76188e95/src/java/org/apache/cassandra/config/DatabaseDescriptor.java -- diff --cc src/java/org/apache/cassandra/config/DatabaseDescriptor.java index 1375a39,100bcf4..38dce11 --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java @@@ -2169,11 -1984,11 +2174,16 @@@ public class DatabaseDescripto conf.user_function_timeout_policy = userFunctionTimeoutPolicy; } + public static long getGCLogThreshold() + { + return conf.gc_log_threshold_in_ms; + } + +public static EncryptionContext getEncryptionContext() +{ +return encryptionContext; +} + public static long getGCWarnThreshold() { return conf.gc_warn_threshold_in_ms;
[05/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0

Merge branch 'cassandra-2.2' into cassandra-3.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e99ee199
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e99ee199
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e99ee199

Branch: refs/heads/cassandra-3.9
Commit: e99ee19950e764ca55331d7c814e965bef359a4f
Parents: 5861cd8 f0d1d75
Author: Josh McKenzie
Authored: Mon Jul 11 16:28:13 2016 -0400
Committer: Josh McKenzie
Committed: Mon Jul 11 16:28:22 2016 -0400

--
 conf/cassandra.yaml                                     | 7 ++-
 src/java/org/apache/cassandra/config/Config.java        | 1 +
 .../org/apache/cassandra/config/DatabaseDescriptor.java | 10 ++
 src/java/org/apache/cassandra/service/GCInspector.java  | 2 +-
 4 files changed, 18 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e99ee199/conf/cassandra.yaml
--
diff --cc conf/cassandra.yaml
index 4b92f64,4ad798a..09d2094
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@@ -924,21 -858,23 +924,26 @@@ inter_dc_tcp_nodelay: fals
 tracetype_query_ttl: 86400
 tracetype_repair_ttl: 604800

+ # By default, Cassandra logs GC Pauses greater than 200 ms at INFO level
+ # This threshold can be adjusted to minimize logging if necessary
+ # gc_log_threshold_in_ms: 200
+
 # GC Pauses greater than gc_warn_threshold_in_ms will be logged at WARN level
+ # If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at
+ # INFO level
 # Adjust the threshold based on your application throughput requirement
- # By default, Cassandra logs GC Pauses greater than 200 ms at INFO level
-# gc_warn_threshold_in_ms: 1000
+gc_warn_threshold_in_ms: 1000

 # UDFs (user defined functions) are disabled by default.
-# As of Cassandra 2.2, there is no security manager or anything else in place that
-# prevents execution of evil code. CASSANDRA-9402 will fix this issue for Cassandra 3.0.
-# This will inherently be backwards-incompatible with any 2.2 UDF that perform insecure
-# operations such as opening a socket or writing to the filesystem.
+# As of Cassandra 3.0 there is a sandbox in place that should prevent execution of evil code.
 enable_user_defined_functions: false

+# Enables scripted UDFs (JavaScript UDFs).
+# Java UDFs are always enabled, if enable_user_defined_functions is true.
+# Enable this option to be able to use UDFs with "language javascript" or any custom JSR-223 provider.
+# This option has no effect, if enable_user_defined_functions is false.
+enable_scripted_user_defined_functions: false
+
 # The default Windows kernel timer and scheduling resolution is 15.6ms for power conservation.
 # Lowering this value on Windows can provide much tighter latency and better throughput, however
 # some virtualized environments may see a negative performance impact from changing this setting

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e99ee199/src/java/org/apache/cassandra/config/Config.java
--
diff --cc src/java/org/apache/cassandra/config/Config.java
index b49e14c,ede4560..2bd23b5
--- a/src/java/org/apache/cassandra/config/Config.java
+++ b/src/java/org/apache/cassandra/config/Config.java
@@@ -265,8 -254,12 +265,9 @@@ public class Confi
 public volatile Long index_summary_capacity_in_mb;
 public volatile int index_summary_resize_interval_in_minutes = 60;

+ public int gc_log_threshold_in_ms = 200;
 public int gc_warn_threshold_in_ms = 0;

-private static final CsvPreference STANDARD_SURROUNDING_SPACES_NEED_QUOTES = new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE)
-    .surroundingSpacesNeedQuotes(true).build();
-
 // TTL for different types of trace events.
 public int tracetype_query_ttl = (int) TimeUnit.DAYS.toSeconds(1);
 public int tracetype_repair_ttl = (int) TimeUnit.DAYS.toSeconds(7);

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e99ee199/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
--
diff --cc src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index 2083e42f,f1acfc4..100bcf4
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@@ -1934,51 -1801,16 +1939,56 @@@ public class DatabaseDescripto
 return conf.enable_user_defined_functions;
 }

-public static int getWindowsTimerInterval()
+
[03/10] cassandra git commit: Make GCInspector min log duration configurable

Make GCInspector min log duration configurable

Patch by jjirsa; reviewed by jmckenzie for CASSANDRA-11715

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f0d1d75e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f0d1d75e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f0d1d75e

Branch: refs/heads/cassandra-3.9
Commit: f0d1d75ebf10beff6d24323c03c57e29dcd38c15
Parents: 9a8406f
Author: Jeff Jirsa
Authored: Mon Jul 11 16:27:04 2016 -0400
Committer: Josh McKenzie
Committed: Mon Jul 11 16:27:04 2016 -0400

--
 conf/cassandra.yaml                                     | 7 ++-
 src/java/org/apache/cassandra/config/Config.java        | 1 +
 .../org/apache/cassandra/config/DatabaseDescriptor.java | 10 ++
 src/java/org/apache/cassandra/service/GCInspector.java  | 2 +-
 4 files changed, 18 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f0d1d75e/conf/cassandra.yaml
--
diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index 35e94d2..4ad798a 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -858,9 +858,14 @@ inter_dc_tcp_nodelay: false
 tracetype_query_ttl: 86400
 tracetype_repair_ttl: 604800

+# By default, Cassandra logs GC Pauses greater than 200 ms at INFO level
+# This threshold can be adjusted to minimize logging if necessary
+# gc_log_threshold_in_ms: 200
+
 # GC Pauses greater than gc_warn_threshold_in_ms will be logged at WARN level
+# If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at
+# INFO level
 # Adjust the threshold based on your application throughput requirement
-# By default, Cassandra logs GC Pauses greater than 200 ms at INFO level
 # gc_warn_threshold_in_ms: 1000

 # UDFs (user defined functions) are disabled by default.

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f0d1d75e/src/java/org/apache/cassandra/config/Config.java
--
diff --git a/src/java/org/apache/cassandra/config/Config.java b/src/java/org/apache/cassandra/config/Config.java
index 9736a03..ede4560 100644
--- a/src/java/org/apache/cassandra/config/Config.java
+++ b/src/java/org/apache/cassandra/config/Config.java
@@ -254,6 +254,7 @@ public class Config
 public volatile Long index_summary_capacity_in_mb;
 public volatile int index_summary_resize_interval_in_minutes = 60;

+public int gc_log_threshold_in_ms = 200;
 public int gc_warn_threshold_in_ms = 0;

 private static final CsvPreference STANDARD_SURROUNDING_SPACES_NEED_QUOTES = new CsvPreference.Builder(CsvPreference.STANDARD_PREFERENCE)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f0d1d75e/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
--
diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index d3a5028..f1acfc4 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -366,6 +366,11 @@ public class DatabaseDescriptor
 }
 paritionerName = partitioner.getClass().getCanonicalName();

+if (config.gc_log_threshold_in_ms < 0)
+{
+    throw new ConfigurationException("gc_log_threshold_in_ms must be a positive integer");
+}
+
 if (conf.gc_warn_threshold_in_ms < 0)
 {
     throw new ConfigurationException("gc_warn_threshold_in_ms must be a positive integer");

@@ -1801,6 +1806,11 @@ public class DatabaseDescriptor
 return conf.windows_timer_interval;
 }

+public static long getGCLogThreshold()
+{
+    return conf.gc_log_threshold_in_ms;
+}
+
 public static long getGCWarnThreshold()
 {
     return conf.gc_warn_threshold_in_ms;

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f0d1d75e/src/java/org/apache/cassandra/service/GCInspector.java
--
diff --git a/src/java/org/apache/cassandra/service/GCInspector.java b/src/java/org/apache/cassandra/service/GCInspector.java
index de5acc0..31de151 100644
--- a/src/java/org/apache/cassandra/service/GCInspector.java
+++ b/src/java/org/apache/cassandra/service/GCInspector.java
@@ -48,7 +48,7 @@ public class GCInspector implements NotificationListener, GCInspectorMXBean
 {
 public static final String MBEAN_NAME = "org.apache.cassandra.service:type=GCInspector";
 private static final Logger logger = LoggerFactory.getLogger(GCInspector.class);
-final
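The interaction between the two thresholds introduced by this patch can be sketched as follows. This is an illustrative model only (the function and parameter names are ours, not Cassandra's); the real GCInspector measures pauses via JMX notifications and logs through SLF4J.

```python
def gc_pause_log_level(duration_ms, log_threshold_ms=200, warn_threshold_ms=0):
    """Sketch of how gc_log_threshold_in_ms / gc_warn_threshold_in_ms interact.

    A warn threshold of 0 models "unset" (the Config.java default); in that
    case every pause above the log threshold logs at INFO, per the yaml docs.
    """
    if warn_threshold_ms > 0 and duration_ms > warn_threshold_ms:
        return "WARN"
    if duration_ms > log_threshold_ms:
        return "INFO"
    return None  # pause too short to log at all

# With gc_warn_threshold_in_ms: 1000 and the default log threshold of 200,
# a 1.5 s pause warns, a 0.5 s pause logs at INFO, a 100 ms pause is silent.
```

Raising `gc_log_threshold_in_ms` is the knob the commit message refers to for quieting chatty INFO-level GC logging.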
[jira] [Commented] (CASSANDRA-11446) dtest failure in scrub_test.TestScrub.test_nodetool_scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371574#comment-15371574 ] Jim Witschey commented on CASSANDRA-11446: -- Filed a PR here: https://github.com/riptano/cassandra-dtest/pull/1086 > dtest failure in scrub_test.TestScrub.test_nodetool_scrub > - > > Key: CASSANDRA-11446 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11446 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Jim Witschey > Labels: dtest > > test_nodetool_scrub is failing on trunk with offheap memtables. The failure > is in this assertion: > {{self.assertEqual(initial_sstables, scrubbed_sstables)}} > Example failure: > http://cassci.datastax.com/job/trunk_offheap_dtest/95/testReport/scrub_test/TestScrub/test_nodetool_scrub/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9667) strongly consistent membership and ownership
[ https://issues.apache.org/jira/browse/CASSANDRA-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Knighton updated CASSANDRA-9667:
--
Reviewer: Jason Brown

> strongly consistent membership and ownership
>
> Key: CASSANDRA-9667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9667
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jason Brown
> Assignee: Joel Knighton
> Labels: LWT, membership, ownership
> Fix For: 3.x
>
> Currently, there is advice to users to "wait two minutes between adding new nodes" in order for new node tokens, et al, to propagate. Further, as there's no coordination amongst joining nodes with respect to token selection, new nodes can end up selecting ranges that overlap with other joining nodes. This causes a lot of duplicate streaming from the existing source nodes as they shovel out the bootstrap data for those new nodes.
> This ticket proposes creating a mechanism that allows strongly consistent membership and ownership changes in Cassandra, such that changes are performed in a linearizable and safe manner. The basic idea is to use LWT operations over a global system table, and leverage the linearizability of LWT for ensuring the safety of cluster membership/ownership state changes. This work is inspired by Riak's claimant module.
> The existing workflows for node join, decommission, remove, replace, and range move (there may be others I'm not thinking of) will need to be modified to participate in this scheme, as well as changes to nodetool to enable them.
> Note: we distinguish between membership and ownership in the following ways: by membership we mean "a host in this cluster and its state"; by ownership we mean "what tokens (or ranges) does each node own". These nodes must already be a member to be assigned tokens.
> A rough sketch of how the 'add new node' workflow might look is: new nodes would no longer create tokens themselves, but instead contact a member of a Paxos cohort (via a seed). The cohort member will generate the tokens and execute an LWT transaction, ensuring a linearizable change to the membership/ownership state. The updated state will then be disseminated via the existing gossip.
> As for joining specifically, I think we could support two modes: auto-mode and manual-mode. Auto-mode is for adding a single new node per LWT operation, and would require no operator intervention (much like today). In manual-mode, however, multiple new nodes could (somehow) signal their intent to join the cluster, but will wait until an operator executes a nodetool command that will trigger the token generation and LWT operation for all pending new nodes. This will allow us better range partitioning and will make the bootstrap streaming more efficient, as we won't have overlapping range requests.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
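The proposal above leans on the compare-and-set semantics of LWTs. As a hedged sketch (a toy model of the CAS discipline, not the actual design — the real proposal would run LWTs over a global system table), the token-assignment step might look like a versioned compare-and-set, so two concurrently joining nodes cannot both claim overlapping ranges:

```python
class MembershipView:
    """Toy linearizable membership table: a version counter plus token owners.

    Illustrative only; names and structure are our own assumptions.
    """
    def __init__(self):
        self.version = 0
        self.owners = {}  # token -> owning node

    def cas_add_node(self, expected_version, node, tokens):
        # Reject the change if the view moved underneath us, or if any
        # requested token is already owned; the caller must re-read and retry.
        if expected_version != self.version:
            return False
        if any(t in self.owners for t in tokens):
            return False
        for t in tokens:
            self.owners[t] = node
        self.version += 1
        return True

view = MembershipView()
assert view.cas_add_node(0, "node-a", [10, 20])   # first join wins
assert not view.cas_add_node(0, "node-b", [30])   # stale version: must retry
assert view.cas_add_node(1, "node-b", [30])       # retry with fresh view succeeds
```

The retry-on-stale-version path is exactly what prevents the overlapping-range duplicate streaming described in the ticket.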
[jira] [Assigned] (CASSANDRA-9667) strongly consistent membership and ownership
[ https://issues.apache.org/jira/browse/CASSANDRA-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Knighton reassigned CASSANDRA-9667:
--
Assignee: Joel Knighton (was: Jason Brown)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12144) Undeletable rows after upgrading from 2.2.4 to 3.0.7
[ https://issues.apache.org/jira/browse/CASSANDRA-12144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371559#comment-15371559 ] Alex Petrov commented on CASSANDRA-12144:
--
I've written a patch for scrub as well. I've also run upgrade tests from 2.2 SSTables to 3.0, both through {{sstableupgrade}} and through "broken" trunk + {{scrub}}. All paths show the same results for a large sample of data. I'm re-running all tests plus upgrade tests now.

> Undeletable rows after upgrading from 2.2.4 to 3.0.7
>
> Key: CASSANDRA-12144
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12144
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stanislav Vishnevskiy
> Assignee: Alex Petrov
>
> We upgraded our cluster today and now have some rows that refuse to delete.
> Here are some example traces.
> https://gist.github.com/vishnevskiy/36aa18c468344ea22d14f9fb9b99171d
> Even weirder: updating the row and querying it back results in 2 rows, even though the id is the clustering key.
> {noformat}
> user_id | id | since | type
> ---++--+--
> 116138050710536192 | 153047019424972800 | null | 0
> 116138050710536192 | 153047019424972800 | 2016-05-30 14:53:08+ | 2
> {noformat}
> And then deleting it again only removes the new one.
> {noformat}
> cqlsh:discord_relationships> DELETE FROM relationships WHERE user_id = 116138050710536192 AND id = 153047019424972800;
> cqlsh:discord_relationships> SELECT * FROM relationships WHERE user_id = 116138050710536192 AND id = 153047019424972800;
> user_id | id | since | type
> ---++--+--
> 116138050710536192 | 153047019424972800 | 2016-05-30 14:53:08+ | 2
> {noformat}
> We tried repairing, compacting, scrubbing. No luck.
> Not sure what to do. Is anyone aware of this?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-12144) Undeletable rows after upgrading from 2.2.4 to 3.0.7
[ https://issues.apache.org/jira/browse/CASSANDRA-12144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367832#comment-15367832 ] Alex Petrov edited comment on CASSANDRA-12144 at 7/11/16 8:07 PM:
--
The {{2.x}} storage format doesn't guarantee that there'll be a single range tombstone, or that tombstones will be in a certain order relative to the cells. Under some circumstances (which I unfortunately could not reproduce), we ended up with multiple tombstones followed by the row:

{code}
[
  {"key": "1",
   "cells": [["12345:_","12345:!",,"t",],   (*1)
             ["12345:_","12345:!",,"t",],   (*2)
             ["12345:","",],
             ["12345:c1","xx",],
             ["12345:c2","yy",]]}
]
{code}

This resulted in two rows: one tombstone row made from {{(*1)}}, and a second, live row made from the tombstone and the cells following it (since the deletion time was such that the cells should be live). This produced two rows in the new storage format after {{sstableupgrade}}. During iteration, the first tombstone row was read out; since the second row was also read out, and since the rest of the merge iterators (a superseding delete might have been in a memtable or any other sstable) were exhausted, it was treated as a completely normal live row. It was undeletable, since all deletes would only affect the tombstone whose clustering was matching. I've made a patch that captures this edge case.
|[3.0|https://github.com/ifesdjeen/cassandra/tree/12144-3.0] |[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-12144-3.0-testall/] |[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-12144-3.0-dtest/] |[upgrade tests|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/upgrade_tests-all-12144-3.0/]|
|[trunk|https://github.com/ifesdjeen/cassandra/tree/12144-trunk] |[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-12144-trunk-testall/] |[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-12144-trunk-dtest/] |[upgrade tests|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/upgrade_tests-all-12144-trunk/]|

I'll run the CI and submit patch if it's successful (particularly interested in upgrade dtests). After talking to [~slebresne], we might have to also provide a fix for the scrub tool that'd detect and fix such cases. Very big thanks to [~stanislav] for providing information required to track this issue down.

> Undeletable rows after upgrading from 2.2.4 to 3.0.7
>
> Key: CASSANDRA-12144
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12144
> Project: Cassandra
> Issue Type: Bug
>
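The bug described above comes down to how duplicate tombstones and trailing cells reconcile during a merge. As a simplified, hedged model (not Cassandra's actual merge code), a cell should survive only if its write timestamp is newer than every deletion covering it, and duplicate range tombstones must collapse to the newest deletion — i.e. two identical tombstones must behave exactly like one:

```python
def reconcile(deletion_times, cells):
    """Toy reconciliation model.

    deletion_times: timestamps of tombstones covering the row (duplicates allowed).
    cells: mapping of cell name -> write timestamp.
    Returns the set of cell names that remain live.
    """
    # Duplicate tombstones collapse: only the newest deletion matters.
    max_deletion = max(deletion_times, default=-1)
    # A cell is live only if it was written strictly after the covering deletion.
    return {name for name, ts in cells.items() if ts > max_deletion}

# The invariant the patch restores: [t, t] must reconcile the same as [t].
assert reconcile([5, 5], {"c1": 7, "c2": 3}) == reconcile([5], {"c1": 7, "c2": 3})
```

In the reported bug, the second tombstone fragment was instead promoted into an independent live row, breaking this invariant.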
[jira] [Updated] (CASSANDRA-12171) counter mismatch during rolling upgrade from 2.2 to 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch updated CASSANDRA-12171: --- Reproduced In: 3.0.x > counter mismatch during rolling upgrade from 2.2 to 3.0 > --- > > Key: CASSANDRA-12171 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12171 > Project: Cassandra > Issue Type: Bug >Reporter: Russ Hatch >Assignee: Aleksey Yeschenko > > This may occur on other versions, but 3.0 is where I observed it recently. > N=RF=3, counter writes at quorum, reads at quorum. > This is being seen on some upgrade tests I'm currently repairing here: > https://github.com/riptano/cassandra-dtest/tree/upgrade_counters_fix (this > branch is to resolve an issue where counters were not being properly tested > during rolling upgrade tests). > The test runs a continuous counter incrementing process, as well as a > continuous counter checking process. Once a counter value has been verified, > the test code makes it eligible to be incremented again. 
> The test is encountering the problem when trying to check an expected counter > value and not matching expectations, for example: > {noformat} > Traceback (most recent call last): > File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in > _bootstrap > self.run() > File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run > self._target(*self._args, **self._kwargs) > File > "/home/rhatch/git/cstar/cassandra-dtest/upgrade_tests/upgrade_through_versions_test.py", > line 210, in counter_checker > tester.assertEqual(expected_count, actual_count) > File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual > assertion_func(first, second, msg=msg) > File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual > raise self.failureException(msg) > AssertionError: 1 != 2 > ERROR > {noformat} > To check if something else could be going on, I did an experiment where I > changed the test to not upgrade nodes (just drain, stop, start) and the > mismatch didn't occur in several attempts. So it appears something about > upgrading is possibly the culprit. > To run the test and repro locally: > {noformat} > grab my dtest branch at > https://github.com/riptano/cassandra-dtest/tree/upgrade_counters_fix > export UPGRADE_TEST_RUN=true > nosetests -v > upgrade_tests/upgrade_through_versions_test.py:TestUpgrade_current_2_2_x_To_indev_3_0_x.rolling_upgrade_test > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12171) counter mismatch during rolling upgrade from 2.2 to 3.0
Russ Hatch created CASSANDRA-12171: -- Summary: counter mismatch during rolling upgrade from 2.2 to 3.0 Key: CASSANDRA-12171 URL: https://issues.apache.org/jira/browse/CASSANDRA-12171 Project: Cassandra Issue Type: Bug Reporter: Russ Hatch Assignee: Aleksey Yeschenko This may occur on other versions, but 3.0 is where I observed it recently. N=RF=3, counter writes at quorum, reads at quorum. This is being seen on some upgrade tests I'm currently repairing here: https://github.com/riptano/cassandra-dtest/tree/upgrade_counters_fix (this branch is to resolve an issue where counters were not being properly tested during rolling upgrade tests). The test runs a continuous counter incrementing process, as well as a continuous counter checking process. Once a counter value has been verified, the test code makes it eligible to be incremented again. The test is encountering the problem when trying to check an expected counter value and not matching expectations, for example: {noformat} Traceback (most recent call last): File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap self.run() File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run self._target(*self._args, **self._kwargs) File "/home/rhatch/git/cstar/cassandra-dtest/upgrade_tests/upgrade_through_versions_test.py", line 210, in counter_checker tester.assertEqual(expected_count, actual_count) File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual assertion_func(first, second, msg=msg) File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual raise self.failureException(msg) AssertionError: 1 != 2 ERROR {noformat} To check if something else could be going on, I did an experiment where I changed the test to not upgrade nodes (just drain, stop, start) and the mismatch didn't occur in several attempts. So it appears something about upgrading is possibly the culprit. 
To run the test and repro locally: {noformat} grab my dtest branch at https://github.com/riptano/cassandra-dtest/tree/upgrade_counters_fix export UPGRADE_TEST_RUN=true nosetests -v upgrade_tests/upgrade_through_versions_test.py:TestUpgrade_current_2_2_x_To_indev_3_0_x.rolling_upgrade_test {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
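The increment/check cycle the dtest runs can be sketched as follows. This is a hedged reconstruction of the behavior described above (a counter becomes eligible for further increments only after its current value has been verified), not the dtest's actual code:

```python
class CounterChecker:
    """Toy model of the dtest's increment-then-verify cycle."""
    def __init__(self):
        self.expected = 0      # what the incrementer believes it has written
        self.verified = True   # eligible for the next increment?

    def increment(self, write_counter):
        # Only increment once the previous value has been checked.
        if not self.verified:
            return
        write_counter()
        self.expected += 1
        self.verified = False

    def check(self, read_counter):
        actual = read_counter()
        # This is the assertion that trips in the reported mismatch
        # (e.g. "AssertionError: 1 != 2").
        assert actual == self.expected, f"{self.expected} != {actual}"
        self.verified = True

# Against a healthy store the cycle never trips:
store = {"n": 0}
cc = CounterChecker()
for _ in range(3):
    cc.increment(lambda: store.__setitem__("n", store["n"] + 1))
    cc.check(lambda: store["n"])
```

Because each increment waits for a successful verification, any mismatch the test reports reflects a real divergence between the quorum read and the quorum writes, not a race in the harness.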
[jira] [Updated] (CASSANDRA-11575) Add out-of-process testing for CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-11575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-11575: Reviewer: (was: Joshua McKenzie) > Add out-of-process testing for CDC > -- > > Key: CASSANDRA-11575 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11575 > Project: Cassandra > Issue Type: Sub-task > Components: Coordination, Local Write-Read Paths >Reporter: Carl Yeksigian >Assignee: Joshua McKenzie > Fix For: 3.x > > Attachments: 11575.tgz, 11575.tgz > > > There are currently no dtests for the new cdc feature. We should have some, > at least to ensure that the cdc files have a lifecycle that makes sense, and > make sure that things like a continually cleaning daemon and a lazy daemon > have the properties we expect; for this, we don't need to actually process > the files, but make sure they fit the characteristics we expect from them. A > more complex daemon would need to be written in Java. > I already hit a problem where if the cdc is over capacity, the cdc properly > throws the WTE, but it will not reset after the overflow directory is > undersize again. It is supposed to correct the size within 250ms and allow > more writes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11575) Add out-of-process testing for CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-11575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-11575:
--
Assignee: Joshua McKenzie (was: Carl Yeksigian)
Status: Open (was: Patch Available)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12018) CDC follow-ups
[ https://issues.apache.org/jira/browse/CASSANDRA-12018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-12018: Status: Ready to Commit (was: Patch Available) > CDC follow-ups > -- > > Key: CASSANDRA-12018 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12018 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Minor > > h6. Platform independent implementation of DirectorySizeCalculator > On linux, simplify to > {{Arrays.stream(path.listFiles()).mapToLong(File::length).sum();}} > h6. Refactor DirectorySizeCalculator > bq. I don't get the DirectorySizeCalculator. Why the alive and visited sets, > the listFiles step? Either list the files and just loop through them, or do > the walkFileTree operation – you are now doing the same work twice. Use a > plain long instead of the atomic as the class is still thread-unsafe. > h6. TolerateErrorsInSection should not depend on previous SyncSegment status > in CommitLogReader > bq. tolerateErrorsInSection &=: I don't think it was intended for the value > to depend on previous iterations. > h6. Refactor interface of SImpleCachedBufferPool > bq. SimpleCachedBufferPool should provide getThreadLocalReusableBuffer(int > size) which should automatically reallocate if the available size is less, > and not expose a setter at all. > h6. Change CDC exception to WriteFailureException instead of > WriteTimeoutException > h6. Remove unused CommitLogTest.testRecovery(byte[] logData) > h6. NoSpamLogger a message when at CDC capacity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12018) CDC follow-ups
[ https://issues.apache.org/jira/browse/CASSANDRA-12018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-12018: Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed. Thanks for the review! > CDC follow-ups > -- > > Key: CASSANDRA-12018 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12018 > Project: Cassandra > Issue Type: Improvement >Reporter: Joshua McKenzie >Assignee: Joshua McKenzie >Priority: Minor > > h6. Platform independent implementation of DirectorySizeCalculator > On linux, simplify to > {{Arrays.stream(path.listFiles()).mapToLong(File::length).sum();}} > h6. Refactor DirectorySizeCalculator > bq. I don't get the DirectorySizeCalculator. Why the alive and visited sets, > the listFiles step? Either list the files and just loop through them, or do > the walkFileTree operation – you are now doing the same work twice. Use a > plain long instead of the atomic as the class is still thread-unsafe. > h6. TolerateErrorsInSection should not depend on previous SyncSegment status > in CommitLogReader > bq. tolerateErrorsInSection &=: I don't think it was intended for the value > to depend on previous iterations. > h6. Refactor interface of SImpleCachedBufferPool > bq. SimpleCachedBufferPool should provide getThreadLocalReusableBuffer(int > size) which should automatically reallocate if the available size is less, > and not expose a setter at all. > h6. Change CDC exception to WriteFailureException instead of > WriteTimeoutException > h6. Remove unused CommitLogTest.testRecovery(byte[] logData) > h6. NoSpamLogger a message when at CDC capacity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
cassandra git commit: CDC Follow-ups
Repository: cassandra Updated Branches: refs/heads/trunk 9bf9ea740 -> 1f74142d7 CDC Follow-ups Patch by jmckenzie; reviewed by blambov for CASSANDRA-12018 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1f74142d Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1f74142d Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1f74142d Branch: refs/heads/trunk Commit: 1f74142d756b3201acf0fe684943c972e7471782 Parents: 9bf9ea7 Author: Josh McKenzieAuthored: Mon Jul 11 15:18:04 2016 -0400 Committer: Josh McKenzie Committed: Mon Jul 11 15:18:04 2016 -0400 -- .../org/apache/cassandra/db/Directories.java| 11 +++-- .../cassandra/db/commitlog/CommitLogReader.java | 3 +- .../commitlog/CommitLogSegmentManagerCDC.java | 49 .../db/commitlog/CompressedSegment.java | 15 +- .../db/commitlog/EncryptedSegment.java | 16 +++ .../db/commitlog/SimpleCachedBufferPool.java| 17 +-- .../utils/DirectorySizeCalculator.java | 39 ++-- .../test/microbench/DirectorySizerBench.java| 1 - .../cassandra/db/commitlog/CommitLogTest.java | 13 -- 9 files changed, 62 insertions(+), 102 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/1f74142d/src/java/org/apache/cassandra/db/Directories.java -- diff --git a/src/java/org/apache/cassandra/db/Directories.java b/src/java/org/apache/cassandra/db/Directories.java index 2a55992..87527e8 100644 --- a/src/java/org/apache/cassandra/db/Directories.java +++ b/src/java/org/apache/cassandra/db/Directories.java @@ -920,7 +920,7 @@ public class Directories if (!input.isDirectory()) return 0; -SSTableSizeSummer visitor = new SSTableSizeSummer(sstableLister(Directories.OnTxnErr.THROW).listFiles()); +SSTableSizeSummer visitor = new SSTableSizeSummer(input, sstableLister(Directories.OnTxnErr.THROW).listFiles()); try { Files.walkFileTree(input.toPath(), visitor); @@ -1006,9 +1006,11 @@ public class Directories private class SSTableSizeSummer extends 
DirectorySizeCalculator { -SSTableSizeSummer(List files) +private final HashSet toSkip; +SSTableSizeSummer(File path, List files) { -super(files); +super(path); +toSkip = new HashSet<>(files); } @Override @@ -1019,8 +1021,7 @@ public class Directories return pair != null && pair.left.ksname.equals(metadata.ksName) && pair.left.cfname.equals(metadata.cfName) -&& !visited.contains(fileName) -&& !alive.contains(fileName); +&& !toSkip.contains(fileName); } } } http://git-wip-us.apache.org/repos/asf/cassandra/blob/1f74142d/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java -- diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java index 4594080..c797482 100644 --- a/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java +++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java @@ -190,7 +190,8 @@ public class CommitLogReader ReadStatusTracker statusTracker = new ReadStatusTracker(mutationLimit, tolerateTruncation); for (CommitLogSegmentReader.SyncSegment syncSegment : segmentReader) { -statusTracker.tolerateErrorsInSection &= syncSegment.toleratesErrorsInSection; +// Only tolerate truncation if we allow in both global and segment +statusTracker.tolerateErrorsInSection = tolerateTruncation & syncSegment.toleratesErrorsInSection; // Skip segments that are completely behind the desired minPosition if (desc.id == minPosition.segmentId && syncSegment.endPosition < minPosition.position) http://git-wip-us.apache.org/repos/asf/cassandra/blob/1f74142d/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java -- diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java index 15944bd..1fac735 100644 --- a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java +++ 
b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java @@ -20,9 +20,11 @@
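The CommitLogReader hunk above replaces `&=` with a recomputation per segment. The behavioral difference can be sketched in isolation (a simplified model of the loop, not the actual CommitLogReader code): with `&=`, one intolerant segment poisons tolerance for every later segment, whereas the fixed form depends only on the global flag and the current segment's flag.

```java
public class ToleranceDemo
{
    // Old behavior: &= accumulates across iterations, so a single
    // intolerant segment disables tolerance for all later segments.
    public static boolean[] oldBehavior(boolean tolerateTruncation, boolean[] segments)
    {
        boolean[] out = new boolean[segments.length];
        boolean state = tolerateTruncation;
        for (int i = 0; i < segments.length; i++)
        {
            state &= segments[i];
            out[i] = state;
        }
        return out;
    }

    // Fixed behavior (as in the commit above): recomputed per segment from
    // the global flag and that segment's own flag only.
    public static boolean[] newBehavior(boolean tolerateTruncation, boolean[] segments)
    {
        boolean[] out = new boolean[segments.length];
        for (int i = 0; i < segments.length; i++)
            out[i] = tolerateTruncation & segments[i];
        return out;
    }
}
```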
[jira] [Resolved] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Cozzie resolved CASSANDRA-12168. Resolution: Cannot Reproduce > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Labels: easyfix > Fix For: 3.0.x, 3.x > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > > With a C* 2.1 node querying a table with DCT columns from a 3.0 node we see > the following exception: > {code} > java.lang.IllegalArgumentException: null > at java.nio.Buffer.limit(Buffer.java:275) ~[na:1.8.0_66] > at > org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:611) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.marshal.DynamicCompositeType.getComparator(DynamicCompositeType.java:97) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.marshal.DynamicCompositeType.getComparator(DynamicCompositeType.java:118) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compareCustom(AbstractCompositeType.java:63) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.marshal.AbstractType.compare(AbstractType.java:157) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ClusteringComparator.compareComponent(ClusteringComparator.java:166) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ClusteringComparator.compare(ClusteringComparator.java:137) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at org.apache.cassandra.db.Slices$Builder.add(Slices.java:206) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.index.internal.keys.KeysSearcher.filterIfStale(KeysSearcher.java:193) > 
~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.index.internal.keys.KeysSearcher.access$400(KeysSearcher.java:38) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.index.internal.keys.KeysSearcher$1.prepareNext(KeysSearcher.java:107) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.index.internal.keys.KeysSearcher$1.hasNext(KeysSearcher.java:72) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:127) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:123) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_66] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371439#comment-15371439 ] Anthony Cozzie commented on CASSANDRA-12168: It seems a) my analysis was incorrect because I missed that the C* code was checking != vs = and b) this got fixed in 3.0.8 somewhere. Marking as 'cannot reproduce'. At least we are on our way to 6 digit jira numbers . . . > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Labels: easyfix > Fix For: 3.0.x, 3.x > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > > With a C* 2.1 node querying a table with DCT columns from a 3.0 node we see > the following exception: > {code} > java.lang.IllegalArgumentException: null > at java.nio.Buffer.limit(Buffer.java:275) ~[na:1.8.0_66] > at > org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:611) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.marshal.DynamicCompositeType.getComparator(DynamicCompositeType.java:97) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.marshal.DynamicCompositeType.getComparator(DynamicCompositeType.java:118) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compareCustom(AbstractCompositeType.java:63) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.marshal.AbstractType.compare(AbstractType.java:157) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ClusteringComparator.compareComponent(ClusteringComparator.java:166) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ClusteringComparator.compare(ClusteringComparator.java:137) > 
~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at org.apache.cassandra.db.Slices$Builder.add(Slices.java:206) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.index.internal.keys.KeysSearcher.filterIfStale(KeysSearcher.java:193) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.index.internal.keys.KeysSearcher.access$400(KeysSearcher.java:38) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.index.internal.keys.KeysSearcher$1.prepareNext(KeysSearcher.java:107) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.index.internal.keys.KeysSearcher$1.hasNext(KeysSearcher.java:72) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:127) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:123) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_66] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [cassandra-all-3.0.7.1159.jar:3.0.7.1159] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66] > {code} -- This
[jira] [Commented] (CASSANDRA-12149) NullPointerException on SELECT with SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371429#comment-15371429 ] DOAN DuyHai commented on CASSANDRA-12149: - bq. How could I partition SELECT results hitting a single large Cassandra partition, when I do not know values of clustering columns? Use server-side paging: "SELECT * FROM mytable WHERE partition = 'xxx'" Then set a fetchSize on the statement using the driver and then retrieve an Iterator from the ResultSet and start iterating with {{while(iterator.hasNext())}} > NullPointerException on SELECT with SASI index > -- > > Key: CASSANDRA-12149 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12149 > Project: Cassandra > Issue Type: Bug > Components: sasi >Reporter: Andrey Konstantinov > Attachments: CASSANDRA-12149.txt > > > If I execute the sequence of queries (see the attached file), Cassandra > aborts a connection reporting NPE on server side. SELECT query without token > range filter works, but does not work when token range filter is specified. > My intent was to issue multiple SELECT queries targeting the same single > partition, filtered by a column indexed by SASI, partitioning results by > different token ranges. > Output from cqlsh on SELECT is the following: > cqlsh> SELECT namespace, entity, timestamp, feature1, feature2 FROM > mykeyspace.myrecordtable WHERE namespace = 'ns2' AND entity = 'entity2' AND > feature1 > 11 AND feature1 < 31 AND token(namespace, entity) <= > 9223372036854775807; > ServerError: message="java.lang.NullPointerException"> -- This message was sent by Atlassian JIRA (v6.3.4#6332)
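The paging pattern recommended in the comment above (set a fetch size, then drain the result iterator) normally uses the DataStax driver's `Statement.setFetchSize(...)` and `ResultSet` iteration against a live cluster. As a self-contained illustration of the consumption pattern only, here is a sketch where the "server" and page fetching are simulated in memory (all names hypothetical):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PagingDemo
{
    // Simulates driver-side paging: the "server" holds all rows of one
    // partition; pages of `fetchSize` rows are pulled lazily as the
    // iterator advances, mirroring iterating a ResultSet with a fetch size.
    public static Iterator<Integer> pagedIterator(List<Integer> serverRows, int fetchSize)
    {
        return new Iterator<Integer>()
        {
            private int nextPageStart = 0;
            private List<Integer> page = new ArrayList<>();
            private int pos = 0;

            @Override
            public boolean hasNext()
            {
                if (pos < page.size())
                    return true;
                if (nextPageStart >= serverRows.size())
                    return false;
                // fetch the next page from the "server"
                int end = Math.min(nextPageStart + fetchSize, serverRows.size());
                page = serverRows.subList(nextPageStart, end);
                nextPageStart = end;
                pos = 0;
                return true;
            }

            @Override
            public Integer next()
            {
                return page.get(pos++);
            }
        };
    }
}
```

The point of the advice stands out here: the client iterates one logical result set with `while (iterator.hasNext())` and the paging happens underneath, so no manual token-range partitioning of a single partition is needed.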
[jira] [Commented] (CASSANDRA-12170) Create ci job to run ant test-cdc weekly
[ https://issues.apache.org/jira/browse/CASSANDRA-12170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371356#comment-15371356 ] Michael Shuler commented on CASSANDRA-12170: This is running on ASF's Jenkins. https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-test-cdc/ > Create ci job to run ant test-cdc weekly > > > Key: CASSANDRA-12170 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12170 > Project: Cassandra > Issue Type: Test >Reporter: Joshua McKenzie >Assignee: Michael Shuler >Priority: Minor > > With CASSANDRA-8844, we added a new ant target 'test-cdc'. This exercises all > unit tests w/a CDC-enabled CommitLogSegmentManager. We need a CI job to run > this weekly and catch any potential regressions in functionality. > Should be able to copy from 'testall' on trunk and just change the target. > Only needed on trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-12170) Create ci job to run ant test-cdc weekly
[ https://issues.apache.org/jira/browse/CASSANDRA-12170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Shuler resolved CASSANDRA-12170. Resolution: Done > Create ci job to run ant test-cdc weekly > > > Key: CASSANDRA-12170 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12170 > Project: Cassandra > Issue Type: Test >Reporter: Joshua McKenzie >Assignee: Michael Shuler >Priority: Minor > > With CASSANDRA-8844, we added a new ant target 'test-cdc'. This exercises all > unit tests w/a CDC-enabled CommitLogSegmentManager. We need a CI job to run > this weekly and catch any potential regressions in functionality. > Should be able to copy from 'testall' on trunk and just change the target. > Only needed on trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12170) Create ci job to run ant test-cdc weekly
Joshua McKenzie created CASSANDRA-12170: --- Summary: Create ci job to run ant test-cdc weekly Key: CASSANDRA-12170 URL: https://issues.apache.org/jira/browse/CASSANDRA-12170 Project: Cassandra Issue Type: Test Reporter: Joshua McKenzie Assignee: Michael Shuler Priority: Minor With CASSANDRA-8844, we added a new ant target 'test-cdc'. This exercises all unit tests w/a CDC-enabled CommitLogSegmentManager. We need a CI job to run this weekly and catch any potential regressions in functionality. Should be able to copy from 'testall' on trunk and just change the target. Only needed on trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12169) Create ci job to run ant test-cdc weekly
Joshua McKenzie created CASSANDRA-12169: --- Summary: Create ci job to run ant test-cdc weekly Key: CASSANDRA-12169 URL: https://issues.apache.org/jira/browse/CASSANDRA-12169 Project: Cassandra Issue Type: Test Reporter: Joshua McKenzie Assignee: Michael Shuler Priority: Minor With CASSANDRA-8844, we added a new ant target 'test-cdc'. This exercises all unit tests w/a CDC-enabled CommitLogSegmentManager. We need a CI job to run this weekly and catch any potential regressions in functionality. Should be able to copy from 'testall' on trunk and just change the target. Only needed on trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12155) ProposeCallback.java is too spammy for debug.log
[ https://issues.apache.org/jira/browse/CASSANDRA-12155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-12155: Reviewer: Joshua McKenzie > proposeCallback.java is too spammy for debug.log > > > Key: CASSANDRA-12155 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12155 > Project: Cassandra > Issue Type: Bug > Components: Observability >Reporter: Wei Deng >Assignee: Wei Deng >Priority: Minor > > As stated in [this wiki > page|https://wiki.apache.org/cassandra/LoggingGuidelines] derived from the > work on CASSANDRA-10241, the DEBUG level logging in debug.log is intended for > "+low frequency state changes or message passing. Non-critical path logs on > operation details, performance measurements or general troubleshooting > information.+" > However, it appears that in a production deployment of C* 3.x, the LWT > message passing from ProposeCallback.java gets printed every 1-2 seconds, > which overwhelms debug.log from presenting the other important DEBUG level > logging messages, like the following: > {noformat} > DEBUG [SharedPool-Worker-2] 2016-07-09 05:23:57,800 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:00,803 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:00,804 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:03,807 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:03,807 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:06,811 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:06,811 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:09,815 
ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:09,815 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:12,819 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:12,819 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:15,823 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:15,823 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:18,827 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:18,827 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:21,831 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:21,831 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:24,835 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:24,835 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:27,839 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:27,839 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:30,843 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:30,843 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:33,847 ProposeCallback.java:62 > - Propose response true from 
/10.240.0.3 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:33,847 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:36,851 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:36,852 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:39,855 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG [SharedPool-Worker-2] 2016-07-09 05:24:39,855 ProposeCallback.java:62 > - Propose response true from /10.240.0.3 > DEBUG [SharedPool-Worker-1] 2016-07-09 05:24:42,859 ProposeCallback.java:62 > - Propose response true from /10.240.0.2 > DEBUG
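One common remedy for log spam like the ProposeCallback lines above is rate limiting in the spirit of Cassandra's NoSpamLogger: emit a given message at most once per interval. A minimal hypothetical sketch of that idea (not the actual NoSpamLogger API):

```java
import java.util.concurrent.atomic.AtomicLong;

public class RateLimitedLog
{
    private final long intervalNanos;
    private final AtomicLong lastEmitted;

    public RateLimitedLog(long intervalNanos)
    {
        this.intervalNanos = intervalNanos;
        // seed so the very first call is always allowed to log
        this.lastEmitted = new AtomicLong(-intervalNanos);
    }

    // Returns true if the message should be logged at `nowNanos`;
    // CAS ensures at most one caller wins per interval under concurrency.
    public boolean shouldLog(long nowNanos)
    {
        long last = lastEmitted.get();
        if (nowNanos - last < intervalNanos)
            return false;
        return lastEmitted.compareAndSet(last, nowNanos);
    }
}
```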
[jira] [Updated] (CASSANDRA-11715) Make GCInspector's MIN_LOG_DURATION configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-11715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-11715: Reviewer: Joshua McKenzie > Make GCInspector's MIN_LOG_DURATION configurable > > > Key: CASSANDRA-11715 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11715 > Project: Cassandra > Issue Type: Improvement >Reporter: Brandon Williams >Assignee: Jeff Jirsa >Priority: Minor > Labels: lhf > > It's common for people to run C* with the G1 collector on appropriately-sized > heaps. Quite often, the target pause time is set to 500ms, but GCI fires on > anything over 200ms. We can already control the warn threshold, but these > are acceptable GCs for the configuration and create noise at the INFO log > level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
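The effect of making the INFO floor configurable can be sketched as a pure level-selection function (hypothetical helper, not GCInspector itself): raising the floor above the collector's target pause keeps acceptable G1 pauses out of the INFO log.

```java
public class GcLogLevel
{
    public enum Level { TRACE, INFO, WARN }

    // Picks a log level for a GC pause; `minLogMs` is the configurable
    // INFO floor (MIN_LOG_DURATION), `warnMs` the existing warn threshold.
    // Durations are in milliseconds.
    public static Level levelFor(long pauseMs, long minLogMs, long warnMs)
    {
        if (pauseMs >= warnMs)
            return Level.WARN;
        if (pauseMs >= minLogMs)
            return Level.INFO;
        return Level.TRACE;
    }
}
```

With the default 200ms floor a 300ms pause is logged at INFO; with the floor raised to the G1 target of 500ms the same pause stays at TRACE, which is the noise reduction the ticket asks for.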
[jira] [Updated] (CASSANDRA-11424) Add support to "unset" JSON fields in prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-11424: Reviewer: Sylvain Lebresne > Add support to "unset" JSON fields in prepared statements > - > > Key: CASSANDRA-11424 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11424 > Project: Cassandra > Issue Type: Improvement >Reporter: Ralf Steppacher >Assignee: Oded Peer > Labels: client-impacting, cql > Fix For: 3.8 > > Attachments: 11424-trunk-V1.txt, 11424-trunk-V2.txt, > 11424-trunk-V3.txt > > > CASSANDRA-7304 introduced the ability to distinguish between {{NULL}} and > {{UNSET}} prepared statement parameters. > When inserting JSON objects it is not possible to profit from this as a > prepared statement only has one parameter that is bound to the JSON object as > a whole. There is no way to control {{NULL}} vs {{UNSET}} behavior for > columns omitted from the JSON object. > Please extend on CASSANDRA-7304 to include JSON support. > {color:grey} > (My personal requirement is to be able to insert JSON objects with optional > fields without incurring the overhead of creating a tombstone of every column > not covered by the JSON object upon initial(!) insert.) > {color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
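The three-way distinction the ticket asks to carry into INSERT JSON can be sketched as a classification step over the parsed JSON object (hypothetical helper, not Cassandra's actual JSON code): a key present with a value binds the value, a key present as explicit null writes a tombstone, and an absent key is left UNSET so no tombstone is created.

```java
import java.util.HashMap;
import java.util.Map;

public class JsonUnsetDemo
{
    public enum Binding { VALUE, NULL_TOMBSTONE, UNSET }

    // For each table column, derive the bind state from a parsed JSON object.
    public static Map<String, Binding> bindings(String[] columns, Map<String, Object> json)
    {
        Map<String, Binding> out = new HashMap<>();
        for (String col : columns)
        {
            if (!json.containsKey(col))
                out.put(col, Binding.UNSET);           // omitted: no tombstone
            else if (json.get(col) == null)
                out.put(col, Binding.NULL_TOMBSTONE);  // explicit null: delete
            else
                out.put(col, Binding.VALUE);
        }
        return out;
    }
}
```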
[jira] [Updated] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-8457: --- Reviewer: Sylvain Lebresne > nio MessagingService > > > Key: CASSANDRA-8457 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 > Project: Cassandra > Issue Type: New Feature >Reporter: Jonathan Ellis >Assignee: Jason Brown >Priority: Minor > Labels: netty, performance > Fix For: 4.x > > > Thread-per-peer (actually two each incoming and outbound) is a big > contributor to context switching, especially for larger clusters. Let's look > at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11668) InterruptedException in HintsDispatcher
[ https://issues.apache.org/jira/browse/CASSANDRA-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-11668: Assignee: Aleksey Yeschenko > InterruptedException in HintsDispatcher > --- > > Key: CASSANDRA-11668 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11668 > Project: Cassandra > Issue Type: Bug >Reporter: Russ Hatch >Assignee: Aleksey Yeschenko > Labels: dtest > > This exception was seen when trying to repro a test problem. The original > issue test problem appears to be a non-issue, but the exception seems like it > could be worth investigation. > This happened on upgrade from 3.2.1 to 3.3 HEAD (a soon to be retired > test-case). > The test does a rolling upgrade where nodes are one by one stopped, upgraded, > and started on the new version. > The exception occurred some time after starting node1 on the upgraded > version, and upgrading/starting node2 on the new version. Node2 logged the > exception. > {noformat} > node2: ERROR [HintsDispatcher:2] 2016-05-09 23:37:45,816 > CassandraDaemon.java:195 - Exception in thread > Thread[HintsDispatcher:2,1,main] > java.lang.AssertionError: java.lang.InterruptedException > at > org.apache.cassandra.hints.HintsDispatcher$Callback.await(HintsDispatcher.java:205) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:146) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:121) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:93) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:247) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:219) > 
~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:198) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_51] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_51] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51] > Caused by: java.lang.InterruptedException: null > at > org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.checkInterrupted(WaitQueue.java:313) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUntil(WaitQueue.java:301) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.utils.concurrent.SimpleCondition.await(SimpleCondition.java:63) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher$Callback.await(HintsDispatcher.java:201) > ~[apache-cassandra-3.2.1.jar:3.2.1] > ... 
11 common frames omitted > Unexpected error in node2 log, error: > ERROR [HintsDispatcher:2] 2016-05-09 23:37:45,816 CassandraDaemon.java:195 - > Exception in thread Thread[HintsDispatcher:2,1,main] > java.lang.AssertionError: java.lang.InterruptedException > at > org.apache.cassandra.hints.HintsDispatcher$Callback.await(HintsDispatcher.java:205) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:146) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:121) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:93) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:247) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:219) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:198) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_51] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_51] > at >
[jira] [Commented] (CASSANDRA-11668) InterruptedException in HintsDispatcher
[ https://issues.apache.org/jira/browse/CASSANDRA-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371213#comment-15371213 ] Russ Hatch commented on CASSANDRA-11668: If it hasn't appeared in any other tests (I'm not certain), I think we're ok to close this. > InterruptedException in HintsDispatcher > --- > > Key: CASSANDRA-11668 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11668 > Project: Cassandra > Issue Type: Bug > Reporter: Russ Hatch > Labels: dtest > > This exception was seen when trying to repro a test problem. The original > issue test problem appears to be a non-issue, but the exception seems like it > could be worth investigation. > This happened on upgrade from 3.2.1 to 3.3 HEAD (a soon to be retired > test-case). > The test does a rolling upgrade where nodes are one by one stopped, upgraded, > and started on the new version. > The exception occurred some time after starting node1 on the upgraded > version, and upgrading/starting node2 on the new version. Node2 logged the > exception. 
> {noformat} > node2: ERROR [HintsDispatcher:2] 2016-05-09 23:37:45,816 > CassandraDaemon.java:195 - Exception in thread > Thread[HintsDispatcher:2,1,main] > java.lang.AssertionError: java.lang.InterruptedException > at > org.apache.cassandra.hints.HintsDispatcher$Callback.await(HintsDispatcher.java:205) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:146) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:121) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:93) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:247) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:219) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:198) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_51] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_51] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51] > Caused by: java.lang.InterruptedException: null > at > org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.checkInterrupted(WaitQueue.java:313) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUntil(WaitQueue.java:301) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > 
org.apache.cassandra.utils.concurrent.SimpleCondition.await(SimpleCondition.java:63) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher$Callback.await(HintsDispatcher.java:201) > ~[apache-cassandra-3.2.1.jar:3.2.1] > ... 11 common frames omitted > {noformat}
[jira] [Updated] (CASSANDRA-11668) InterruptedException in HintsDispatcher
[ https://issues.apache.org/jira/browse/CASSANDRA-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russ Hatch updated CASSANDRA-11668: --- Assignee: (was: Russ Hatch) > InterruptedException in HintsDispatcher > --- > > Key: CASSANDRA-11668 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11668 > Project: Cassandra > Issue Type: Bug > Reporter: Russ Hatch > Labels: dtest > > This exception was seen when trying to repro a test problem. The original > issue test problem appears to be a non-issue, but the exception seems like it > could be worth investigation. > This happened on upgrade from 3.2.1 to 3.3 HEAD (a soon to be retired > test-case). > The test does a rolling upgrade where nodes are one by one stopped, upgraded, > and started on the new version. > The exception occurred some time after starting node1 on the upgraded > version, and upgrading/starting node2 on the new version. Node2 logged the > exception.
[jira] [Updated] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan updated CASSANDRA-12168: Description: With a C* 2.1 node querying a table with DCT columns from a 3.0 node we see the following exception: {code} java.lang.IllegalArgumentException: null at java.nio.Buffer.limit(Buffer.java:275) ~[na:1.8.0_66] at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:611) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.marshal.DynamicCompositeType.getComparator(DynamicCompositeType.java:97) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.marshal.DynamicCompositeType.getComparator(DynamicCompositeType.java:118) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.marshal.AbstractCompositeType.compareCustom(AbstractCompositeType.java:63) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.marshal.AbstractType.compare(AbstractType.java:157) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.ClusteringComparator.compareComponent(ClusteringComparator.java:166) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.ClusteringComparator.compare(ClusteringComparator.java:137) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.Slices$Builder.add(Slices.java:206) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.index.internal.keys.KeysSearcher.filterIfStale(KeysSearcher.java:193) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.index.internal.keys.KeysSearcher.access$400(KeysSearcher.java:38) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.index.internal.keys.KeysSearcher$1.prepareNext(KeysSearcher.java:107) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.index.internal.keys.KeysSearcher$1.hasNext(KeysSearcher.java:72) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_66] at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [cassandra-all-3.0.7.1159.jar:3.0.7.1159] at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [cassandra-all-3.0.7.1159.jar:3.0.7.1159] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66] {code} > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging 
>Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Labels: easyfix > Fix For: 3.0.x, 3.x > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > > With a C* 2.1 node querying a table with DCT columns from a 3.0 node we see > the following exception: > {code} > java.lang.IllegalArgumentException: null > at java.nio.Buffer.limit(Buffer.java:275) ~[na:1.8.0_66] > at > org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:611) > ~[cassandra-all-3.0.7.1159.jar:3.0.7.1159] >
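The failure mode in the trace above (java.nio.Buffer.limit throwing IllegalArgumentException out of ByteBufferUtil.readBytes) is what a length-prefixed decoder looks like when the declared length overruns the buffer, e.g. because the 2.1 and 3.0 sides disagree about the DCT header layout. A minimal Python analogue of that decoding step (illustrative only, not Cassandra's actual code):

```python
import struct

def read_length_prefixed(buf: bytes, offset: int):
    """Read a 2-byte big-endian length, then that many bytes: a simplified
    analogue of how a composite-type component is read from a ByteBuffer."""
    (length,) = struct.unpack_from(">H", buf, offset)
    end = offset + 2 + length
    if end > len(buf):
        # The Java side surfaces this as IllegalArgumentException from Buffer.limit
        raise ValueError(f"declared length {length} overruns buffer of {len(buf)} bytes")
    return buf[offset + 2 : end], end

# A well-formed component: length 5, then b"hello"
good = struct.pack(">H", 5) + b"hello"
value, _ = read_length_prefixed(good, 0)
assert value == b"hello"

# A reader that mis-parses the header (wrong offset or wrong format)
# sees a bogus length and fails exactly like the trace above
bad = b"\xff\xff" + b"hello"
try:
    read_length_prefixed(bad, 0)
except ValueError as exc:
    print("deserialization failed:", exc)
```

The point is that the exception fires at the slicing step, far from the actual bug (the header mis-parse), which is why the trace bottoms out in ByteBufferUtil rather than in DynamicCompositeType's own logic.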
[jira] [Updated] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan updated CASSANDRA-12168: Fix Version/s: (was: 3.0.9) (was: 3.9) 3.x 3.0.x > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Labels: easyfix > Fix For: 3.0.x, 3.x > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Cozzie updated CASSANDRA-12168: --- Status: Open (was: Patch Available) > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Labels: easyfix > Fix For: 3.0.9, 3.9 > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371172#comment-15371172 ] Anthony Cozzie commented on CASSANDRA-12168: Hmm, I may have been too hasty here, taking another look. > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Labels: easyfix > Fix For: 3.0.9, 3.9 > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11978) StreamReader fails to write sstable if CF directory is symlink
[ https://issues.apache.org/jira/browse/CASSANDRA-11978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371171#comment-15371171 ] Arindam Gupta commented on CASSANDRA-11978: --- Ok thanks, just wondering if this can be reproduced using CCM? Currently I do not have any real cluster running to reproduce it. > StreamReader fails to write sstable if CF directory is symlink > -- > > Key: CASSANDRA-11978 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11978 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Reporter: Michael Frisch > Labels: lhf > > I'm using Cassandra v2.2.6. If the CF is stored as a symlink in the keyspace > directory on disk then StreamReader.createWriter fails because > Descriptor.fromFilename is passed the actual path on disk instead of the path > containing the symlink. > Example: > /path/to/data/dir/Keyspace/CFName -> /path/to/data/dir/AnotherDisk/CFName > Descriptor.fromFilename is passed "/path/to/data/dir/AnotherDisk/CFName" > instead of "/path/to/data/dir/Keyspace/CFName", then it concludes that the > keyspace name is "AnotherDisk" which is erroneous. I've temporarily worked > around this by using cfs.keyspace.getName() to get the keyspace name and > cfs.name to get the CF name as those are correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
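The erroneous keyspace derivation described above is easy to see with the reporter's own example paths. A small Python sketch (the parsing helper is a simplification for illustration, not Descriptor.fromFilename itself):

```python
import os.path

def keyspace_from_path(cf_dir: str) -> str:
    """Derive the keyspace name from a CF directory path the way the
    filename-based parsing effectively does: take the parent directory name.
    (Simplified illustration, not Cassandra's actual parsing code.)"""
    return os.path.basename(os.path.dirname(cf_dir))

# The symlink path versus what realpath() resolves it to:
logical = "/path/to/data/dir/Keyspace/CFName"        # path containing the symlink
resolved = "/path/to/data/dir/AnotherDisk/CFName"    # the actual path on disk

assert keyspace_from_path(logical) == "Keyspace"      # correct derivation
assert keyspace_from_path(resolved) == "AnotherDisk"  # the erroneous one from the report
```

This is why the reporter's workaround of asking the ColumnFamilyStore directly (cfs.keyspace.getName(), cfs.name) is robust: it avoids inferring names from a possibly-resolved filesystem path.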
[jira] [Updated] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Cozzie updated CASSANDRA-12168: --- Labels: easyfix (was: ) > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Labels: easyfix > Fix For: 3.0.9, 3.9 > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Cozzie updated CASSANDRA-12168: --- Component/s: Streaming and Messaging > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Labels: easyfix > Fix For: 3.0.9, 3.9 > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Cozzie updated CASSANDRA-12168: --- Fix Version/s: 3.9 3.0.9 > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.0.9, 3.9 > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371150#comment-15371150 ] Anthony Cozzie commented on CASSANDRA-12168: I'm guessing this code was never exercised due to CASSANDRA-12147, since the fix is really trivial > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Fix For: 3.0.9, 3.9 > > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Cozzie updated CASSANDRA-12168: --- Status: Patch Available (was: Open) > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Cozzie updated CASSANDRA-12168: --- Attachment: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > DCT deserialization code incorrect in 3.0 > - > > Key: CASSANDRA-12168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 > Project: Cassandra > Issue Type: Bug >Reporter: Anthony Cozzie >Assignee: Anthony Cozzie > Attachments: 0001-CASSANDRA-12168-fix-thrift-DCT-deserialization.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12168) DCT deserialization code incorrect in 3.0
Anthony Cozzie created CASSANDRA-12168: -- Summary: DCT deserialization code incorrect in 3.0 Key: CASSANDRA-12168 URL: https://issues.apache.org/jira/browse/CASSANDRA-12168 Project: Cassandra Issue Type: Bug Reporter: Anthony Cozzie Assignee: Anthony Cozzie -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10202) simplify CommitLogSegmentManager
[ https://issues.apache.org/jira/browse/CASSANDRA-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371097#comment-15371097 ] Joshua McKenzie commented on CASSANDRA-10202: - We discussed running the testall target weekly with test-cdc. I'll follow up with the TE folks to ensure that gets scheduled. > simplify CommitLogSegmentManager > > > Key: CASSANDRA-10202 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10202 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Jonathan Ellis >Assignee: Branimir Lambov >Priority: Minor > > Now that we only keep one active segment around we can simplify this from the > old recycling design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-10845) jmxmetrics_test.TestJMXMetrics.begin_test is failing
[ https://issues.apache.org/jira/browse/CASSANDRA-10845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Witschey resolved CASSANDRA-10845. -- Resolution: Incomplete My PR deleting this test has been merged. Since this started with a patch, I've closed this and opened a new ticket: https://issues.apache.org/jira/browse/CASSANDRA-12167 > jmxmetrics_test.TestJMXMetrics.begin_test is failing > > > Key: CASSANDRA-10845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10845 > Project: Cassandra > Issue Type: Sub-task > Components: Testing >Reporter: Philip Thompson >Assignee: DS Test Eng >Priority: Minor > Labels: dtest > > This test is failing on 2.1-head. There appear to be structural issues with > the test, and no C* bug to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12167) Review JMX metrics coverage
Jim Witschey created CASSANDRA-12167: Summary: Review JMX metrics coverage Key: CASSANDRA-12167 URL: https://issues.apache.org/jira/browse/CASSANDRA-12167 Project: Cassandra Issue Type: Bug Reporter: Jim Witschey Assignee: DS Test Eng I just deleted the dtest that was meant to smoke test JMX metrics: https://github.com/riptano/cassandra-dtest/pull/1085 The idea was that you'd read JMX metrics, run stress, then make sure the metrics went up, down, or stayed the same, as appropriate. This kind of coverage would be good to have. I don't think we have it anywhere in the dtests, and it probably isn't appropriate in unit tests. We should check there's no coverage in the unit tests, and add some coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
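The proposed smoke test (snapshot metrics, run stress, then check that each metric went up, down, or stayed the same as appropriate) could be shaped roughly like the sketch below. The metric names and expected directions here are made up for illustration, not taken from the actual dtest:

```python
def check_metric_directions(before: dict, after: dict, expected: dict) -> list:
    """Compare two metric snapshots against an expected direction per metric
    ('up', 'down', or 'same'); return the names of metrics that misbehaved.
    (A sketch of the proposed smoke test, not real dtest code.)"""
    failures = []
    for name, direction in expected.items():
        b, a = before[name], after[name]
        ok = ((direction == "up" and a > b)
              or (direction == "down" and a < b)
              or (direction == "same" and a == b))
        if not ok:
            failures.append(name)
    return failures

# Hypothetical snapshots taken before and after a stress run:
before = {"WriteLatency.Count": 0, "PendingTasks": 5, "TokenCount": 256}
after = {"WriteLatency.Count": 10000, "PendingTasks": 0, "TokenCount": 256}
expected = {"WriteLatency.Count": "up", "PendingTasks": "down", "TokenCount": "same"}
assert check_metric_directions(before, after, expected) == []
```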
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370533#comment-15370533 ] Sergio Bossa edited comment on CASSANDRA-9318 at 7/11/16 4:06 PM: -- [~Stefania], bq. Do we track the case where we receive a failed response? Specifically, in ResponseVerbHandler.doVerb, shouldn't we call updateBackPressureState() also when the message is a failure response? Good point, I focused more on the success case, due to dropped mutations, but that sounds like a good thing to do. bq. If we receive a response after it has timed out, won't we count that request twice, incorrectly increasing the rate for that window? But can that really happen? {{ResponseVerbHandler}} returns _before_ incrementing back-pressure if the callback is null (i.e. expired), and {{OutboundTcpConnection}} doesn't even send outbound messages if they're timed out, or am I missing something? bq. I also argue that it is quite easy to comment out the strategy and to have an empty strategy in the code that means no backpressure. Again, I believe this would make enabling/disabling back-pressure via JMX less user friendly. bq. I think what we may need is a new companion snitch that sorts the replica by backpressure ratio I do not think sorting replicas is what we really need, as you have to send the mutation to all replicas anyway. I think what you rather need is a way to pre-emptively fail if the write consistency level is not met by enough "non-overloaded" replicas, i.e.: * If CL.ONE, fail if *all* replicas are overloaded. * If CL.QUORUM, fail if *quorum* replicas are overloaded. * if CL.ALL, fail if *any* replica is overloaded. This can be easily accomplished in {{StorageProxy#sendToHintedEndpoints}}. bq. the exception needs to be different. native_protocol_v4.spec clearly states I missed that too :( This leaves us with two options: * Adding a new exception to the native protocol. 
* Reusing a different exception, with {{WriteFailureException}} and {{UnavailableException}} the most likely candidates. I'm currently leaning towards the latter option. bq. By "load shedding by the replica" do we mean dropping mutations that have timed out or something else? Yes. bq. Regardless, there is the problem of ensuring that all nodes have backpressure enabled, which may not be trivial. We only need to ensure the coordinator for that specific mutation has back-pressure enabled, and we could do this by "marking" the {{MessageOut}} with a special parameter, what do you think?
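The pre-emptive failure rule proposed in the comment above (CL.ONE fails only if all replicas are overloaded, CL.QUORUM if a quorum of them are, CL.ALL if any is) can be sketched as follows. This is an illustration of the rule only, not the actual StorageProxy#sendToHintedEndpoints change:

```python
def should_fail_early(consistency_level: str, total_replicas: int,
                      overloaded_replicas: int) -> bool:
    """Pre-emptively fail a write if the consistency level cannot be met by
    non-overloaded replicas (sketch of the rule proposed in the comment)."""
    healthy = total_replicas - overloaded_replicas
    if consistency_level == "ONE":
        return healthy < 1                      # fail only if *all* replicas are overloaded
    if consistency_level == "QUORUM":
        return healthy < total_replicas // 2 + 1  # fail if a quorum is overloaded
    if consistency_level == "ALL":
        return overloaded_replicas > 0          # fail if *any* replica is overloaded
    raise ValueError(f"unsupported consistency level: {consistency_level}")

# RF=3 examples:
assert should_fail_early("ONE", 3, 1) is False     # two healthy replicas remain
assert should_fail_early("QUORUM", 3, 2) is True   # only one healthy, quorum needs two
assert should_fail_early("ALL", 3, 1) is True      # any overloaded replica fails CL.ALL
```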
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371063#comment-15371063 ] Sergio Bossa commented on CASSANDRA-9318: - bq. One thing that worries me is, how do you distinguish between “node X is slow because we are writing too fast and we need to throttle clients down” and “node X is slow because it is dying, we need to ignore it and accept writes based on other replicas?” I.e. this seems to implicitly push everyone to a kind of CL.ALL model once your threshold triggers, where if one replica is slow then we can't make progress. This was already noted by [~slebresne], and you're both right, this initial implementation is heavily biased towards my specific use case :) But, the above proposed solution should fix it: bq. I think what you rather need is a way to pre-emptively fail if the write consistency level is not met by enough "non-overloaded" replicas, i.e.: If CL.ONE, fail if all replicas are overloaded... Also, the exception would be sent to the client only if the low threshold is met, and only the first time it is met, for the duration of the back-pressure window (write RPC timeout), i.e.: * Threshold is 0.1, outgoing requests are 100, incoming responses are 10, ratio is 0.1. * Exception is thrown by all write requests for the current back-pressure window. * The outgoing rate limiter is set at 10, which means the next ratio calculation will approach the sustainable rate, and even if replicas will still lag behind, the ratio will not go down to 0.1 _unless_ the incoming rate dramatically goes down to 1. This is to say the chances of getting a meltdown due to "overloaded exceptions" is moderate (also, clients are supposed to adjust themselves when getting such exception), and the above proposal should make things play nicely with the CL too. If you all agree with that, I'll move forward and make that change. 
> Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
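The worked ratio example in the comment above (threshold 0.1, 100 outgoing requests, 10 incoming responses in one back-pressure window) can be checked with a small sketch. This is illustrative only, not Cassandra's rate limiter:

```python
def backpressure_ratio(outgoing_requests: int, incoming_responses: int) -> float:
    """Ratio of incoming responses to outgoing requests over one
    back-pressure window (the write RPC timeout)."""
    return incoming_responses / outgoing_requests

def apply_backpressure(outgoing_requests: int, incoming_responses: int,
                       low_threshold: float):
    """If the ratio drops to the low threshold, signal overload for the
    current window and pin the outgoing rate limiter at the observed
    sustainable rate (sketch of the behaviour described in the comment)."""
    ratio = backpressure_ratio(outgoing_requests, incoming_responses)
    overloaded = ratio <= low_threshold
    new_rate_limit = incoming_responses  # approach the sustainable rate
    return overloaded, new_rate_limit

# The numbers from the comment: threshold 0.1, 100 out, 10 in -> ratio 0.1,
# overload signalled, limiter set to 10 requests per window.
overloaded, limit = apply_backpressure(100, 10, 0.1)
assert overloaded and limit == 10
```

Because the limiter converges on the incoming rate, the next window's ratio should rise back toward 1.0 unless the replicas degrade further, which matches the claim that a sustained stream of overloaded exceptions is unlikely.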
[jira] [Commented] (CASSANDRA-10845) jmxmetrics_test.TestJMXMetrics.begin_test is failing
[ https://issues.apache.org/jira/browse/CASSANDRA-10845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371064#comment-15371064 ] Jim Witschey commented on CASSANDRA-10845: -- I've opened this PR to delete that entire test file: https://github.com/riptano/cassandra-dtest/pull/1085 If it's merged, I'll change this ticket to "smoke test jmxmetrics"; it'd be nice to have that coverage. > jmxmetrics_test.TestJMXMetrics.begin_test is failing > > > Key: CASSANDRA-10845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10845 > Project: Cassandra > Issue Type: Sub-task > Components: Testing >Reporter: Philip Thompson >Assignee: DS Test Eng >Priority: Minor > Labels: dtest > > This test is failing on 2.1-head. There appear to be structural issues with > the test, and no C* bug to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371063#comment-15371063 ] Sergio Bossa edited comment on CASSANDRA-9318 at 7/11/16 4:05 PM: -- bq. One thing that worries me is, how do you distinguish between “node X is slow because we are writing too fast and we need to throttle clients down” and “node X is slow because it is dying, we need to ignore it and accept writes based on other replicas?” I.e. this seems to implicitly push everyone to a kind of CL.ALL model once your threshold triggers, where if one replica is slow then we can't make progress. This was already noted by [~slebresne], and you're both right, this initial implementation is heavily biased towards my specific use case :) But, the above proposed solution should fix it: bq. I think what you rather need is a way to pre-emptively fail if the write consistency level is not met by enough "non-overloaded" replicas, i.e.: If CL.ONE, fail if all replicas are overloaded... Also, the exception would be sent to the client only if the low threshold is met, and only the first time it is met, for the duration of the back-pressure window (write RPC timeout), i.e.: * Threshold is 0.1, outgoing requests are 100, incoming responses are 10, ratio is 0.1. * Exception is thrown by all write requests for the current back-pressure window. * The outgoing rate limiter is set at 10, which means the next ratio calculation will approach the sustainable rate, and even if replicas will still lag behind, the ratio will not go down to 0.1 _unless_ the incoming rate dramatically goes down to 1. This is to say the chances of getting a meltdown due to "overloaded exceptions" is moderate (also, clients are supposed to adjust themselves when getting such exception), and the above proposal should make things play nicely with the CL too. If you all agree with that, I'll move forward and make that change. was (Author: sbtourist): bq. 
One thing that worries me is, how do you distinguish between “node X is slow because we are writing too fast and we need to throttle clients down” and “node X is slow because it is dying, we need to ignore it and accept writes based on other replicas?” I.e. this seems to implicitly push everyone to a kind of CL.ALL model once your threshold triggers, where if one replica is slow then we can't make progress. This was already noted by [~slebresne], and you're both right, this initial implementation is heavily biased towards my specific use case :) But, the above proposed solution should fix it: bq. I think what you rather need is a way to pre-emptively fail if the write consistency level is not met by enough "non-overloaded" replicas, i.e.: If CL.ONE, fail if all replicas are overloaded... Also, the exception would be sent to the client only if the low threshold is met, and only the first time it is met, for the duration of the back-pressure window (write RPC timeout), i.e.: * Threshold is 0.1, outgoing requests are 100, incoming responses are 10, ratio is 0.1. * Exception is thrown by all write requests for the current back-pressure window. * The outgoing rate limiter is set at 10, which means the next ratio calculation will approach the sustainable rate, and even if replicas will still lag behind, the ratio will not go down to 0.1 _unless_ the incoming rate dramatically goes down to 1. This is to say the chances of getting a meltdown due to "overloaded exceptions" is moderate (also, clients are supposed to adjust themselves when getting such exception), and the above proposal should make things play nicely with the CL too. If you all agree with that, I'll move forward and make that change. 
> Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12158) dtest failure in thrift_tests.TestMutations.test_describe_keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371048#comment-15371048 ] Philip Thompson commented on CASSANDRA-12158: - Oh, that is most definitely the problem and needed fix. > dtest failure in thrift_tests.TestMutations.test_describe_keyspace > -- > > Key: CASSANDRA-12158 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12158 > Project: Cassandra > Issue Type: Test >Reporter: Sean McCarthy >Assignee: DS Test Eng > Labels: dtest > Attachments: node1.log > > > example failure: > http://cassci.datastax.com/job/cassandra-2.1_dtest/492/testReport/thrift_tests/TestMutations/test_describe_keyspace > Failed on CassCI build cassandra-2.1_dtest #492 > {code} > Stacktrace > Traceback (most recent call last): > File "/usr/lib/python2.7/unittest/case.py", line 329, in run > testMethod() > File "/home/automaton/cassandra-dtest/thrift_tests.py", line 1507, in > test_describe_keyspace > assert len(kspaces) == 4, [x.name for x in kspaces] # ['Keyspace2', > 'Keyspace1', 'system', 'system_traces'] > AssertionError: ['Keyspace2', 'system', 'Keyspace1', 'ValidKsForUpdate', > 'system_traces'] > {code} > Related failures: > http://cassci.datastax.com/job/cassandra-2.2_novnode_dtest/304/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > http://cassci.datastax.com/job/cassandra-3.0_dtest/767/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > http://cassci.datastax.com/job/cassandra-3.0_novnode_dtest/264/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > http://cassci.datastax.com/job/trunk_dtest/1301/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > http://cassci.datastax.com/job/trunk_novnode_dtest/421/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > http://cassci.datastax.com/job/cassandra-3.9_dtest/6/testReport/thrift_tests/TestMutations/test_describe_keyspace/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12158) dtest failure in thrift_tests.TestMutations.test_describe_keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371042#comment-15371042 ] Joel Knighton commented on CASSANDRA-12158: --- This is almost certainly because {{thrift_tests.py}} moved to the ReusableClusterTest with [PR 1065|https://github.com/riptano/cassandra-dtest/pull/1065]. Flakiness depends on test order execution, since the extra keyspaces' existence depends on order of test method execution. We could drop any potential extra keyspaces at the beginning of {{test_describe_keyspace}}. > dtest failure in thrift_tests.TestMutations.test_describe_keyspace > -- > > Key: CASSANDRA-12158 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12158 > Project: Cassandra > Issue Type: Test >Reporter: Sean McCarthy >Assignee: DS Test Eng > Labels: dtest > Attachments: node1.log > > > example failure: > http://cassci.datastax.com/job/cassandra-2.1_dtest/492/testReport/thrift_tests/TestMutations/test_describe_keyspace > Failed on CassCI build cassandra-2.1_dtest #492 > {code} > Stacktrace > Traceback (most recent call last): > File "/usr/lib/python2.7/unittest/case.py", line 329, in run > testMethod() > File "/home/automaton/cassandra-dtest/thrift_tests.py", line 1507, in > test_describe_keyspace > assert len(kspaces) == 4, [x.name for x in kspaces] # ['Keyspace2', > 'Keyspace1', 'system', 'system_traces'] > AssertionError: ['Keyspace2', 'system', 'Keyspace1', 'ValidKsForUpdate', > 'system_traces'] > {code} > Related failures: > http://cassci.datastax.com/job/cassandra-2.2_novnode_dtest/304/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > http://cassci.datastax.com/job/cassandra-3.0_dtest/767/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > http://cassci.datastax.com/job/cassandra-3.0_novnode_dtest/264/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > 
http://cassci.datastax.com/job/trunk_dtest/1301/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > http://cassci.datastax.com/job/trunk_novnode_dtest/421/testReport/thrift_tests/TestMutations/test_describe_keyspace/ > http://cassci.datastax.com/job/cassandra-3.9_dtest/6/testReport/thrift_tests/TestMutations/test_describe_keyspace/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8876) Allow easier init script reuse
[ https://issues.apache.org/jira/browse/CASSANDRA-8876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Shuler updated CASSANDRA-8876: -- Status: Testing (was: Patch Available) > Allow easier init script reuse > -- > > Key: CASSANDRA-8876 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8876 > Project: Cassandra > Issue Type: Improvement > Components: Packaging >Reporter: Marko Asplund >Assignee: Michael Shuler > Fix For: 3.x > > Attachments: trunk-CASSANDRA-8876.txt > > > Make it possible to reuse the Cassandra debian init script with different > configuration and Cassandra home paths by making paths configurable via > environment variables set in /etc/default/cassandra. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371010#comment-15371010 ] Jonathan Ellis commented on CASSANDRA-9318: --- If I understand this approach correctly, you're looking at the ratio of sent to acknowledged writes per replica, and throwing Unavailable if that gets too low for a given replica. Very clever. One thing that worries me is, how do you distinguish between “node X is slow because we are writing too fast and we need to throttle clients down” and “node X is slow because it is dying, we need to ignore it and accept writes based on other replicas?” I.e. this seems to implicitly push everyone to a kind of CL.ALL model once your threshold triggers, where if one replica is slow then we can't make progress. If we take a simpler approach of just bounding total outstanding requests to all replicas per coordinator, then we can avoid overload meltdown while allowing CL to continue to work as designed and tolerate as many slow replicas as the requested CL permits. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
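The simpler alternative Jonathan describes — bounding total outstanding requests per coordinator with high/low watermarks, as in the ticket description — might look like the sketch below. This is illustrative only, not Cassandra code; the class and watermark values are made up for the example.

```python
# Illustrative sketch (not Cassandra code): bound total in-flight request
# bytes at the coordinator, pausing reads from client connections between
# a high and a low watermark.

class InFlightBound:
    def __init__(self, high: int, low: int):
        assert low < high
        self.high, self.low = high, low
        self.outstanding = 0
        self.reads_paused = False

    def on_request(self, size: int):
        self.outstanding += size
        if self.outstanding >= self.high:
            self.reads_paused = True   # stop reading client connections

    def on_response(self, size: int):
        self.outstanding -= size
        if self.reads_paused and self.outstanding <= self.low:
            self.reads_paused = False  # resume reads

b = InFlightBound(high=100, low=50)
for _ in range(10):
    b.on_request(10)
assert b.reads_paused            # hit the high watermark at 100 bytes
for _ in range(5):
    b.on_response(10)
assert not b.reads_paused        # drained back below the low watermark
```

Because this bounds the coordinator as a whole rather than tracking per-replica ratios, a single slow replica does not by itself block progress; CL continues to decide how many replicas must respond.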
[jira] [Resolved] (CASSANDRA-7961) Run dtests on ARM
[ https://issues.apache.org/jira/browse/CASSANDRA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Shuler resolved CASSANDRA-7961. --- Resolution: Later Closing this out as Later to clean up queue. It's interesting, but I don't think I've seen any ARM users. > Run dtests on ARM > - > > Key: CASSANDRA-7961 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7961 > Project: Cassandra > Issue Type: Test >Reporter: Ryan McGuire >Assignee: Michael Shuler >Priority: Minor > > It looks pretty easy to setup an [emulated ARM > environment|https://wiki.ubuntu.com/ARM/RootfsFromScratch/QemuDebootstrap], > we might want to look into setting this up on Cassci. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9320) test-burn target should be run occasionally
[ https://issues.apache.org/jira/browse/CASSANDRA-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371028#comment-15371028 ] Michael Shuler commented on CASSANDRA-9320: --- Been attempting this on ASF's Jenkins https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-test-burn/ > test-burn target should be run occasionally > --- > > Key: CASSANDRA-9320 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9320 > Project: Cassandra > Issue Type: Test >Reporter: Ariel Weisberg >Assignee: Michael Shuler >Priority: Minor > Fix For: 3.x > > > The tests are all concurrency tests right now so they need to run on the > largest # of cores we have available. The tests are not configured to run > very long right now, but the intent is that they run for longer periods (days > even). > They aren't described as high value right now because the code under test > hasn't change since first introduced so we can defer setting this job up > until higher priority things are done. > I think we should still run them at some low frequency so they don't rot or > some change doesn't sneak in that effects them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11668) InterruptedException in HintsDispatcher
[ https://issues.apache.org/jira/browse/CASSANDRA-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371027#comment-15371027 ] Jim Witschey commented on CASSANDRA-11668: -- This is on an upgrade path we don't test anymore: bq. This happened on upgrade from 3.2.1 to 3.3 HEAD (a soon to be retired test-case). Does that mean we don't care about it? Have we seen it since? > InterruptedException in HintsDispatcher > --- > > Key: CASSANDRA-11668 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11668 > Project: Cassandra > Issue Type: Bug >Reporter: Russ Hatch >Assignee: Russ Hatch > Labels: dtest > > This exception was seen when trying to repro a test problem. The original > issue test problem appears to be a non-issue, but the exception seems like it > could be worth investigation. > This happened on upgrade from 3.2.1 to 3.3 HEAD (a soon to be retired > test-case). > The test does a rolling upgrade where nodes are one by one stopped, upgraded, > and started on the new version. > The exception occurred some time after starting node1 on the upgraded > version, and upgrading/starting node2 on the new version. Node2 logged the > exception. 
> {noformat} > node2: ERROR [HintsDispatcher:2] 2016-05-09 23:37:45,816 > CassandraDaemon.java:195 - Exception in thread > Thread[HintsDispatcher:2,1,main] > java.lang.AssertionError: java.lang.InterruptedException > at > org.apache.cassandra.hints.HintsDispatcher$Callback.await(HintsDispatcher.java:205) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:146) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:121) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:93) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:247) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:219) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:198) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_51] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_51] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51] > Caused by: java.lang.InterruptedException: null > at > org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.checkInterrupted(WaitQueue.java:313) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUntil(WaitQueue.java:301) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > 
org.apache.cassandra.utils.concurrent.SimpleCondition.await(SimpleCondition.java:63) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher$Callback.await(HintsDispatcher.java:201) > ~[apache-cassandra-3.2.1.jar:3.2.1] > ... 11 common frames omitted > Unexpected error in node2 log, error: > ERROR [HintsDispatcher:2] 2016-05-09 23:37:45,816 CassandraDaemon.java:195 - > Exception in thread Thread[HintsDispatcher:2,1,main] > java.lang.AssertionError: java.lang.InterruptedException > at > org.apache.cassandra.hints.HintsDispatcher$Callback.await(HintsDispatcher.java:205) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:146) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:121) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:93) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:247) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:219) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at > org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:198) > ~[apache-cassandra-3.2.1.jar:3.2.1] > at
[jira] [Commented] (CASSANDRA-10845) jmxmetrics_test.TestJMXMetrics.begin_test is failing
[ https://issues.apache.org/jira/browse/CASSANDRA-10845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371020#comment-15371020 ] Jim Witschey commented on CASSANDRA-10845: -- Yeah, this test seems to be super, _super_ incorrect. But, I imagine there are higher-priority test failures to deal with? I've bumped this down to {{Minor}}. > jmxmetrics_test.TestJMXMetrics.begin_test is failing > > > Key: CASSANDRA-10845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10845 > Project: Cassandra > Issue Type: Sub-task > Components: Testing >Reporter: Philip Thompson >Assignee: DS Test Eng >Priority: Minor > Labels: dtest > > This test is failing on 2.1-head. There appear to be structural issues with > the test, and no C* bug to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10845) jmxmetrics_test.TestJMXMetrics.begin_test is failing
[ https://issues.apache.org/jira/browse/CASSANDRA-10845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Witschey updated CASSANDRA-10845: - Priority: Minor (was: Major) > jmxmetrics_test.TestJMXMetrics.begin_test is failing > > > Key: CASSANDRA-10845 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10845 > Project: Cassandra > Issue Type: Sub-task > Components: Testing >Reporter: Philip Thompson >Assignee: DS Test Eng >Priority: Minor > Labels: dtest > > This test is failing on 2.1-head. There appear to be structural issues with > the test, and no C* bug to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10915) netstats_test dtest fails on Windows, flaps on Linux
[ https://issues.apache.org/jira/browse/CASSANDRA-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371003#comment-15371003 ] Jim Witschey commented on CASSANDRA-10915: -- This hasn't failed on Linux in over 2 months (Jenkins's entire history), at least on the linked job. In light of that, I'm going to just label this with {{windows}}. > netstats_test dtest fails on Windows, flaps on Linux > > > Key: CASSANDRA-10915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10915 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: DS Test Eng > Labels: dtest, windows > Fix For: 3.0.x > > > jmx_test.py:TestJMX.netstats_test started failing hard on Windows about a > month ago: > http://cassci.datastax.com/job/cassandra-3.0_dtest_win32/140/testReport/junit/jmx_test/TestJMX/netstats_test/history/?start=25 > http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/156/testReport/jmx_test/TestJMX/netstats_test/history/ > It fails when it is unable to connect to a node via JMX. I don't know if this > problem has any relationship to CASSANDRA-10913. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10915) netstats_test dtest fails on Windows, flaps on Linux
[ https://issues.apache.org/jira/browse/CASSANDRA-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Witschey updated CASSANDRA-10915: - Labels: dtest windows (was: dtest) > netstats_test dtest fails on Windows, flaps on Linux > > > Key: CASSANDRA-10915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10915 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: DS Test Eng > Labels: dtest, windows > Fix For: 3.0.x > > > jmx_test.py:TestJMX.netstats_test started failing hard on Windows about a > month ago: > http://cassci.datastax.com/job/cassandra-3.0_dtest_win32/140/testReport/junit/jmx_test/TestJMX/netstats_test/history/?start=25 > http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/156/testReport/jmx_test/TestJMX/netstats_test/history/ > It fails when it is unable to connect to a node via JMX. I don't know if this > problem has any relationship to CASSANDRA-10913. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11906) Unstable JVM due too many files when anticompacting big LCS tables
[ https://issues.apache.org/jira/browse/CASSANDRA-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370986#comment-15370986 ] Stefano Ortolani commented on CASSANDRA-11906: -- Small update: updated to 3.0.8, trying sequential repairs once again. No exception yet, but I see the number of open files being around 25K (still far from 100K for now) > Unstable JVM due too many files when anticompacting big LCS tables > -- > > Key: CASSANDRA-11906 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11906 > Project: Cassandra > Issue Type: Bug >Reporter: Stefano Ortolani > > I have recently moved from C* 2.1.x to C* 3.0.6. The setup is quite > heavy: > - 13 nodes with spinning disks > - ~120 GB of data per node > - 50% of CFs are LCS and have quite wide rows. > - 2/3 CFs with LCS have more than 200 SStables > Incremental repairs do not seem to play really well with that. > I have been running some tests node by node by using the -pr option: > {code:xml} > nodetool -h localhost repair -pr keyscheme > {code} > and to my surprise the whole process takes quite some time (1 hour > minimum, 8 hours if I haven't been repairing for 5/6 days). > Yesterday I tried to run the command with the -seq option so as to > decrease the number of simultaneous compactions. After a while > the node on which I was running the repair simply died during > the anticompaction phase with the following > exception in the logs. > {code:xml} > ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-25 21:54:21,868 > ScheduledReporter.java:119 - RuntimeException thrown from > GraphiteReporter#report. Exception was suppressed. 
> java.lang.RuntimeException: Failed to list files in > /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396 > at > org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.(Directories.java:981) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) > ~[apache-cassandra-3.0.6.jar:3.0.6] > at > com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) > ~[metrics-graphite-3.1.0.jar:3.1.0] > at > com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) > ~[metrics-graphite-3.1.0.jar:3.1.0] > at > com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) > ~[metrics-core-3.1.0.jar:3.1.0] > at > com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) > ~[metrics-core-3.1.0.jar:3.1.0] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_91] > at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [na:1.8.0_91] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [na:1.8.0_91] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [na:1.8.0_91] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_91] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_91] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91] > Caused by: java.lang.RuntimeException: java.nio.file.FileSystemException: > /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma_txn_anticompactionafterrepair_f20b50d0-22bd-11e6-970f-6f22464f4624.log: > Too many open files > at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:622) >
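Watching a node's open-file count against its descriptor limit, as in the ~25K observation above, can be done from the standard library alone. This assumes a Linux host (the {{/proc}} filesystem); it is a monitoring convenience, not part of Cassandra.

```python
# Compare a process's open file-descriptor count against its RLIMIT_NOFILE
# limits. Linux-only (relies on /proc); illustrative monitoring snippet.
import os
import resource

def open_fd_count(pid=None) -> int:
    """Count open descriptors for `pid`, or the current process."""
    path = f"/proc/{pid}/fd" if pid is not None else "/proc/self/fd"
    return len(os.listdir(path))

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open fds: {open_fd_count()} / soft limit {soft} (hard {hard})")
```

To watch a running Cassandra node, pass its PID (e.g. from `pgrep -f CassandraDaemon`) to {{open_fd_count}} and sample periodically; when the count approaches the soft limit, "Too many open files" errors like the one above become likely.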
[jira] [Updated] (CASSANDRA-10915) netstats_test dtest flaps
[ https://issues.apache.org/jira/browse/CASSANDRA-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Witschey updated CASSANDRA-10915: - Summary: netstats_test dtest flaps (was: netstats_test dtest fails on Windows) > netstats_test dtest flaps > - > > Key: CASSANDRA-10915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10915 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey > Labels: dtest > Fix For: 3.0.x > > > jmx_test.py:TestJMX.netstats_test started failing hard on Windows about a > month ago: > http://cassci.datastax.com/job/cassandra-3.0_dtest_win32/140/testReport/junit/jmx_test/TestJMX/netstats_test/history/?start=25 > http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/156/testReport/jmx_test/TestJMX/netstats_test/history/ > It fails when it is unable to connect to a node via JMX. I don't know if this > problem has any relationship to CASSANDRA-10913. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10915) netstats_test dtest flaps
[ https://issues.apache.org/jira/browse/CASSANDRA-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Witschey updated CASSANDRA-10915: - Assignee: DS Test Eng > netstats_test dtest flaps > - > > Key: CASSANDRA-10915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10915 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: DS Test Eng > Labels: dtest > Fix For: 3.0.x > > > jmx_test.py:TestJMX.netstats_test started failing hard on Windows about a > month ago: > http://cassci.datastax.com/job/cassandra-3.0_dtest_win32/140/testReport/junit/jmx_test/TestJMX/netstats_test/history/?start=25 > http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/156/testReport/jmx_test/TestJMX/netstats_test/history/ > It fails when it is unable to connect to a node via JMX. I don't know if this > problem has any relationship to CASSANDRA-10913. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10915) netstats_test dtest fails on Windows, flaps on Linux
[ https://issues.apache.org/jira/browse/CASSANDRA-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Witschey updated CASSANDRA-10915: - Summary: netstats_test dtest fails on Windows, flaps on Linux (was: netstats_test dtest flaps) > netstats_test dtest fails on Windows, flaps on Linux > > > Key: CASSANDRA-10915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10915 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: DS Test Eng > Labels: dtest > Fix For: 3.0.x > > > jmx_test.py:TestJMX.netstats_test started failing hard on Windows about a > month ago: > http://cassci.datastax.com/job/cassandra-3.0_dtest_win32/140/testReport/junit/jmx_test/TestJMX/netstats_test/history/?start=25 > http://cassci.datastax.com/job/cassandra-2.2_dtest_win32/156/testReport/jmx_test/TestJMX/netstats_test/history/ > It fails when it is unable to connect to a node via JMX. I don't know if this > problem has any relationship to CASSANDRA-10913. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11290) dtest failure in materialized_views_test.TestMaterializedViewsConsistency.single_partition_consistent_reads_after_write_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370953#comment-15370953 ] Jim Witschey commented on CASSANDRA-11290: -- [~carlyeks] Any more recent thoughts on this? Should we block 3.9 on it? > dtest failure in > materialized_views_test.TestMaterializedViewsConsistency.single_partition_consistent_reads_after_write_test > > > Key: CASSANDRA-11290 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11290 > Project: Cassandra > Issue Type: Bug >Reporter: Russ Hatch >Assignee: Carl Yeksigian > Labels: dtest > Fix For: 3.x > > Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, > node3.log, node3_debug.log > > > example failure: > http://cassci.datastax.com/job/trunk_offheap_dtest/26/testReport/materialized_views_test/TestMaterializedViewsConsistency/single_partition_consistent_reads_after_write_test > Failed on CassCI build trunk_offheap_dtest #26 > Failing infrequently but same error on at least two of those infrequent flaps: > {noformat} > Connection to 127.0.0.2 was closed > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11403) Serializer/Version mismatch during upgrades to C* 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-11403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370947#comment-15370947 ] Anthony Cozzie commented on CASSANDRA-11403: I skimmed the code for CASSANDRA-11393, and it seems to be doing exactly what I was planning to do, i.e. using a single serializer that forwards requests to 3.0 or legacy serializers. That should fix the race condition. I went ahead and marked this as a duplicate. > Serializer/Version mismatch during upgrades to C* 3.0 > - > > Key: CASSANDRA-11403 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11403 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie > > The problem line seems to be: > {code} > MessageOut message = > readCommand.createMessage(MessagingService.instance().getVersion(endpoint)); > {code} > SinglePartitionReadCommand then picks the serializer based on the version: > {code} > return new MessageOut<>(MessagingService.Verb.READ, this, version < > MessagingService.VERSION_30 ? 
legacyReadCommandSerializer : serializer); > {code} > However, OutboundTcpConnectionPool will test the payload size vs the version > from its smallMessages connection: > {code} > return msg.payloadSize(smallMessages.getTargetVersion()) > > LARGE_MESSAGE_THRESHOLD > {code} > Which is set when the connection/pool is created: > {code} > targetVersion = MessagingService.instance().getVersion(pool.endPoint()); > {code} > During an upgrade, this state can change between these two calls leading the > 3.0 serializer being used on 2.x packets and the following stacktrace: > ERROR [OptionalTasks:1] 2016-03-07 19:53:06,445 CassandraDaemon.java:195 - > Exception in thread Thread[OptionalTasks:1,5,main] > java.lang.AssertionError: null > at > org.apache.cassandra.db.ReadCommand$Serializer.serializedSize(ReadCommand.java:632) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.db.ReadCommand$Serializer.serializedSize(ReadCommand.java:536) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:166) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:72) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:609) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:758) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:701) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.sendRRWithFailure(MessagingService.java:684) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.AbstractReadExecutor.makeRequests(AbstractReadExecutor.java:110) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > 
org.apache.cassandra.service.AbstractReadExecutor.makeDataRequests(AbstractReadExecutor.java:85) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.AbstractReadExecutor$NeverSpeculatingReadExecutor.executeAsync(AbstractReadExecutor.java:214) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.doInitialQueries(StorageProxy.java:1699) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1654) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1601) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1520) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:918) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:251) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:212) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:77) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:206) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:237) >
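The race described in CASSANDRA-11403 above is a classic check-then-act hazard: the peer's messaging version is read once to pick a serializer and again (inside the connection pool) to size the payload, and a rolling upgrade can change it between the two reads. A minimal sketch of that hazard and of the single-snapshot fix — using hypothetical stand-in names, not the actual Cassandra classes — might look like:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VersionRace {
    static final int VERSION_30 = 10;

    // Peer messaging version, bumped concurrently by gossip during a
    // rolling upgrade (hypothetical stand-in for the real state).
    static final AtomicInteger peerVersion = new AtomicInteger(8); // a 2.x version

    // Racy shape: the version is read twice, once to pick a serializer and
    // once to size the payload; an upgrade between the reads mismatches them.
    static String racyPick() {
        String serializer = peerVersion.get() < VERSION_30 ? "legacy" : "3.0";
        // ... gossip may bump peerVersion here ...
        String sizer = peerVersion.get() < VERSION_30 ? "legacy" : "3.0";
        return serializer + "/" + sizer; // may be "3.0/legacy" etc. under the race
    }

    // Fixed shape: capture the version once and derive both decisions from
    // that single snapshot, so they can never disagree.
    static String safePick() {
        int v = peerVersion.get();
        String serializer = v < VERSION_30 ? "legacy" : "3.0";
        String sizer = v < VERSION_30 ? "legacy" : "3.0";
        return serializer + "/" + sizer;
    }

    public static void main(String[] args) {
        System.out.println(safePick()); // legacy/legacy before the upgrade
        peerVersion.set(VERSION_30);    // peer finishes upgrading
        System.out.println(safePick()); // 3.0/3.0 after
    }
}
```

The CASSANDRA-11393 approach referenced in the comment — one serializer that forwards to the 3.0 or legacy serializer internally — removes the race by construction, since there is then only one version-dependent decision point.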
[jira] [Resolved] (CASSANDRA-11403) Serializer/Version mismatch during upgrades to C* 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-11403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Cozzie resolved CASSANDRA-11403. Resolution: Duplicate > Serializer/Version mismatch during upgrades to C* 3.0 > - > > Key: CASSANDRA-11403 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11403 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Anthony Cozzie > > The problem line seems to be: > {code} > MessageOut message = > readCommand.createMessage(MessagingService.instance().getVersion(endpoint)); > {code} > SinglePartitionReadCommand then picks the serializer based on the version: > {code} > return new MessageOut<>(MessagingService.Verb.READ, this, version < > MessagingService.VERSION_30 ? legacyReadCommandSerializer : serializer); > {code} > However, OutboundTcpConnectionPool will test the payload size vs the version > from its smallMessages connection: > {code} > return msg.payloadSize(smallMessages.getTargetVersion()) > > LARGE_MESSAGE_THRESHOLD > {code} > Which is set when the connection/pool is created: > {code} > targetVersion = MessagingService.instance().getVersion(pool.endPoint()); > {code} > During an upgrade, this state can change between these two calls leading the > 3.0 serializer being used on 2.x packets and the following stacktrace: > ERROR [OptionalTasks:1] 2016-03-07 19:53:06,445 CassandraDaemon.java:195 - > Exception in thread Thread[OptionalTasks:1,5,main] > java.lang.AssertionError: null > at > org.apache.cassandra.db.ReadCommand$Serializer.serializedSize(ReadCommand.java:632) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.db.ReadCommand$Serializer.serializedSize(ReadCommand.java:536) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:166) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > 
org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:72) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:609) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:758) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:701) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.net.MessagingService.sendRRWithFailure(MessagingService.java:684) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.AbstractReadExecutor.makeRequests(AbstractReadExecutor.java:110) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.AbstractReadExecutor.makeDataRequests(AbstractReadExecutor.java:85) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.AbstractReadExecutor$NeverSpeculatingReadExecutor.executeAsync(AbstractReadExecutor.java:214) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.doInitialQueries(StorageProxy.java:1699) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1654) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1601) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1520) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:918) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:251) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:212) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:77) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:206) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:237) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:252) > ~[cassandra-all-3.0.3.903.jar:3.0.3.903] > at > org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:247) >
[jira] [Resolved] (CASSANDRA-12157) Cassandra solr_query not working after upgrading to DSE 5
[ https://issues.apache.org/jira/browse/CASSANDRA-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson resolved CASSANDRA-12157. - Resolution: Invalid This jira is for issues with the open source Apache Cassandra project, which does not have the solr functionality you are referring to. > Cassandra solr_query not working after upgrading to DSE 5 > - > > Key: CASSANDRA-12157 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12157 > Project: Cassandra > Issue Type: Bug > Components: Configuration >Reporter: Sreeraju.V > > After upgrading to DSE 5 solr_query is not working. Below is the new DSE, > cqlsh and Cassandra versions. > [cqlsh 5.0.1 | Cassandra 3.0.7.1158 | DSE 5.0.0 | CQL spec 3.4.0 | Native > protocol v4] > I am connecting using PHP Driver. The exception catching is > Must not send frame with CUSTOM_PAYLOAD flag for native protocol version > < 4 > and the > error code is 33554442 > When I run the same query on cqlsh it is working but not through the > Php-driver. > $countSearchParam = '{"q":"'.$searchParam.'" }'; > try{ > $countStatement = $this->session->prepare( > "SELECT count(*) FROM table WHERE solr_query = ? "); > $countresults = $this->session->execute($countStatement, new > Cassandra\ExecutionOptions(array( > 'arguments' => array($countSearchParam) > ))); > foreach ($countresults as $row) { > $cntArr = get_object_vars($row['count']); > $totCount = $cntArr['value']; > } > }catch(Exception $e){ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12157) Cassandra solr_query not working after upgrading to DSE 5
[ https://issues.apache.org/jira/browse/CASSANDRA-12157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370890#comment-15370890 ] Jeremiah Jordan commented on CASSANDRA-12157: - This is a problem with DSE not open source Apache Cassandra. It is a known problem. http://docs.datastax.com/en/latest-dse/datastax_enterprise/RNdse.html?scroll=RNdse__501KnownIss > Cassandra solr_query not working after upgrading to DSE 5 > - > > Key: CASSANDRA-12157 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12157 > Project: Cassandra > Issue Type: Bug > Components: Configuration >Reporter: Sreeraju.V > > After upgrading to DSE 5 solr_query is not working. Below is the new DSE, > cqlsh and Cassandra versions. > [cqlsh 5.0.1 | Cassandra 3.0.7.1158 | DSE 5.0.0 | CQL spec 3.4.0 | Native > protocol v4] > I am connecting using PHP Driver. The exception catching is > Must not send frame with CUSTOM_PAYLOAD flag for native protocol version > < 4 > and the > error code is 33554442 > When I run the same query on cqlsh it is working but not through the > Php-driver. > $countSearchParam = '{"q":"'.$searchParam.'" }'; > try{ > $countStatement = $this->session->prepare( > "SELECT count(*) FROM table WHERE solr_query = ? "); > $countresults = $this->session->execute($countStatement, new > Cassandra\ExecutionOptions(array( > 'arguments' => array($countSearchParam) > ))); > foreach ($countresults as $row) { > $cntArr = get_object_vars($row['count']); > $totCount = $cntArr['value']; > } > }catch(Exception $e){ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12105) ThriftServer.stop is not thread safe
[ https://issues.apache.org/jira/browse/CASSANDRA-12105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-12105: -- Priority: Minor (was: Critical) > ThriftServer.stop is not thread safe > > > Key: CASSANDRA-12105 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12105 > Project: Cassandra > Issue Type: Bug >Reporter: Brian Wawok >Assignee: Brian Wawok >Priority: Minor > Fix For: 2.2.8, 3.0.9, 3.9 > > Attachments: patch1.txt, patch2.txt > > > There is a small thread safety issue in ThriftServer.stop(). If we have > multiple calls to stop, one thread may NPE or otherwise do bad stuff. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12105) ThriftServer.stop is not thread safe
[ https://issues.apache.org/jira/browse/CASSANDRA-12105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370827#comment-15370827 ] Aleksey Yeschenko commented on CASSANDRA-12105: --- Committed version 2 to 2.2 and merged upwards. Not going to touch 2.1, as this is not a critical bug, by any definition. Also, in the future, please submit properly formatted patches that can be applied directly, as per [this document|https://docs.google.com/document/d/1d_AzYQo74de9utbbpyXxW2w-b0__sFhC7b24ibiJqjw/view]. Thank you. > ThriftServer.stop is not thread safe > > > Key: CASSANDRA-12105 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12105 > Project: Cassandra > Issue Type: Bug >Reporter: Brian Wawok >Assignee: Brian Wawok >Priority: Critical > Fix For: 2.2.8, 3.0.9, 3.9 > > Attachments: patch1.txt, patch2.txt > > > There is a small thread safety issue in ThriftServer.stop(). If we have > multiple calls to stop, one thread may NPE or otherwise do bad stuff. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12105) ThriftServer.stop is not thread safe
[ https://issues.apache.org/jira/browse/CASSANDRA-12105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-12105: -- Resolution: Fixed Fix Version/s: (was: 2.1.x) 3.9 3.0.9 2.2.8 Status: Resolved (was: Patch Available) > ThriftServer.stop is not thread safe > > > Key: CASSANDRA-12105 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12105 > Project: Cassandra > Issue Type: Bug >Reporter: Brian Wawok >Assignee: Brian Wawok >Priority: Critical > Fix For: 2.2.8, 3.0.9, 3.9 > > Attachments: patch1.txt, patch2.txt > > > There is a small thread safety issue in ThriftServer.stop(). If we have > multiple calls to stop, one thread may NPE or otherwise do bad stuff. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
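The double-stop hazard in CASSANDRA-12105 is a generic pattern: if two threads call stop() and each null-checks a field before nulling it, the loser of the race can dereference null. A minimal sketch — a hypothetical stand-in, not the committed patch — of making stop idempotent with an atomic swap:

```java
import java.util.concurrent.atomic.AtomicReference;

public class SafeStopServer {
    static class Transport {
        volatile boolean open = true;
        void close() { open = false; }
    }

    private final AtomicReference<Transport> transport =
            new AtomicReference<>(new Transport());

    // Thread-safe: getAndSet guarantees exactly one caller observes the live
    // transport; every later caller gets null and returns without touching it.
    public boolean stop() {
        Transport t = transport.getAndSet(null);
        if (t == null)
            return false; // already stopped by another thread
        t.close();
        return true;
    }

    public static void main(String[] args) {
        SafeStopServer s = new SafeStopServer();
        System.out.println(s.stop()); // true: this call closed the transport
        System.out.println(s.stop()); // false: already stopped, no NPE
    }
}
```

The same effect can be had with a synchronized block or a CAS on a state enum; the atomic swap is just the smallest shape that makes concurrent stop() calls safe.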
[jira] [Commented] (CASSANDRA-10884) test_refresh_schema_on_timeout_error dtest flapping on CassCI
[ https://issues.apache.org/jira/browse/CASSANDRA-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370824#comment-15370824 ] Jim Witschey commented on CASSANDRA-10884: -- Attempting repro here: http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/166/ > test_refresh_schema_on_timeout_error dtest flapping on CassCI > - > > Key: CASSANDRA-10884 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10884 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jim Witschey >Assignee: DS Test Eng > Labels: dtest > Fix For: 3.0.x > > > These tests create keyspaces and tables through cqlsh, then runs {{DESCRIBE}} > to confirm they were successfully created. These tests flap under the novnode > dtest runs: > http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_tests/TestCqlsh/test_refresh_schema_on_timeout_error/history/ > http://cassci.datastax.com/job/cassandra-2.2_novnode_dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_tests/TestCqlsh/test_refresh_schema_on_timeout_error/history/ > I have not reproduced this locally on Linux. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[09/10] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.9
Merge branch 'cassandra-3.0' into cassandra-3.9 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/56abaca0 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/56abaca0 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/56abaca0 Branch: refs/heads/trunk Commit: 56abaca0466411739895523d0c3a81a7630ab9f0 Parents: 371a147 5861cd8 Author: Aleksey Yeschenko Authored: Mon Jul 11 15:00:33 2016 +0100 Committer: Aleksey Yeschenko Committed: Mon Jul 11 15:01:04 2016 +0100 -- CHANGES.txt| 7 +++ src/java/org/apache/cassandra/thrift/ThriftServer.java | 2 +- 2 files changed, 4 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/56abaca0/CHANGES.txt -- diff --cc CHANGES.txt index f65b1f4,70210a8..da8216f --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,10 -1,4 +1,9 @@@ -3.0.9 +3.9 + * Partial revert of CASSANDRA-11971, cannot recycle buffer in SP.sendMessagesToNonlocalDC (CASSANDRA-11950) + * Fix hdr logging for single operation workloads (CASSANDRA-12145) + * Fix SASI PREFIX search in CONTAINS mode with partial terms (CASSANDRA-12073) + * Increase size of flushExecutor thread pool (CASSANDRA-12071) +Merged from 3.0: - 3.0.9 * Fix migration of static thrift column names with non-text comparators (CASSANDRA-12147) * Fix upgrading sparse tables that are incorrectly marked as dense (CASSANDRA-11315) * Fix reverse queries ignoring range tombstones (CASSANDRA-11733)