[jira] [Updated] (CASSANDRA-11272) NullPointerException (NPE) during bootstrap startup in StorageService.java

2016-05-19 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-11272:
--
Fix Version/s: (was: 3.7)
   3.x

> NullPointerException (NPE) during bootstrap startup in StorageService.java
> --
>
> Key: CASSANDRA-11272
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11272
> Project: Cassandra
>  Issue Type: Bug
>  Components: Lifecycle
> Environment: Debian Jessie, up to date
>Reporter: Jason Kania
>Assignee: Alex Petrov
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> After bootstrapping fails due to a stream closed error, the following error 
> results:
> {code}
> Feb 27, 2016 8:06:38 PM com.google.common.util.concurrent.ExecutionList 
> executeListener
> SEVERE: RuntimeException while executing runnable 
> com.google.common.util.concurrent.Futures$6@3d61813b with executor INSTANCE
> java.lang.NullPointerException
> at 
> org.apache.cassandra.service.StorageService$2.onFailure(StorageService.java:1284)
> at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
> at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
> at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
> at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
> at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
> at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
> at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
> at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
> at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:525)
> at 
> org.apache.cassandra.streaming.StreamSession.doRetry(StreamSession.java:645)
> at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:70)
> at 
> org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:39)
> at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:59)
> at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261)
> at java.lang.Thread.run(Thread.java:745)
> {code}
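
For illustration, here is a minimal sketch of this failure mode and the obvious defensive fix, assuming only Guava on the classpath (the class and field names are hypothetical, not the actual Cassandra patch): the bootstrap stream registers a {{FutureCallback}}, and an {{onFailure()}} that dereferences state which may still be unset when streaming fails early turns the original "stream closed" error into the NPE above.

{code}
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;
import com.google.common.util.concurrent.SettableFuture;

public class BootstrapFailureSketch
{
    // Hypothetical state that is only initialized once streaming gets far enough.
    static volatile Runnable bootstrapFinisher = null;

    public static void main(String[] args)
    {
        SettableFuture<Void> bootstrapStream = SettableFuture.create();
        Futures.addCallback(bootstrapStream, new FutureCallback<Void>()
        {
            public void onSuccess(Void result)
            {
                System.out.println("bootstrap streaming complete");
            }

            public void onFailure(Throwable t)
            {
                // Guarding the dereference keeps the original streaming error
                // visible instead of masking it with a secondary NullPointerException.
                if (bootstrapFinisher != null)
                    bootstrapFinisher.run();
                System.err.println("Bootstrap streaming failed: " + t.getMessage());
            }
        }, MoreExecutors.directExecutor());

        // Simulate the stream being closed before bootstrap state was set up.
        bootstrapStream.setException(new RuntimeException("stream closed"));
    }
}
{code}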



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-19 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-11731:
-
Assignee: Philip Thompson  (was: Stefania)

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Philip Thompson
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-19 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292727#comment-15292727
 ] 

Stefania commented on CASSANDRA-11731:
--

CI results are good; assigning back to you for further testing, 
[~philipthompson].

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Stefania
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11851) Table alias not supported

2016-05-19 Thread Prajakta Bhosale (JIRA)
Prajakta Bhosale created CASSANDRA-11851:


 Summary: Table alias not supported 
 Key: CASSANDRA-11851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11851
 Project: Cassandra
  Issue Type: Bug
  Components: CQL
 Environment: [cqlsh 4.1.1 | Cassandra 2.0.17 | CQL spec 3.1.1 | Thrift 
protocol 19.39.0]

Reporter: Prajakta Bhosale
Priority: Minor


Table aliases are not supported in CQL. The following error is returned when 
using one:

cqlsh:test> select e.emp_id from emp e;
Bad Request: line 1:25 no viable alternative at input 'e'

The same query works without the table alias, and column aliases work as well.

Version details:
show version
[cqlsh 4.1.1 | Cassandra 2.0.17 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
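
For reference, the grammar accepts an unqualified column reference and a column alias via {{AS}}, but not a table alias. A minimal sketch with the DataStax Java driver (the contact point is an assumption; the keyspace and table names are the ones from this report):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TableAliasSketch
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try
        {
            Session session = cluster.connect("test");

            // Works: no table alias is needed, a SELECT only ever reads one table.
            ResultSet plain = session.execute("SELECT emp_id FROM emp");
            System.out.println(plain.one());

            // Works: column alias via AS.
            ResultSet aliased = session.execute("SELECT emp_id AS employee_id FROM emp");
            for (Row row : aliased)
                System.out.println(row);

            // Fails with "no viable alternative at input 'e'":
            // table aliases are not part of the CQL grammar.
            // session.execute("SELECT e.emp_id FROM emp e");
        }
        finally
        {
            cluster.close();
        }
    }
}
{code}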




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption

2016-05-19 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292649#comment-15292649
 ] 

Yuki Morishita commented on CASSANDRA-11750:


You are right. Here is the 3.0 version.

||branch||testall||dtest||
|[11750-3.0|https://github.com/yukim/cassandra/tree/11750-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-11750-3.0-testall/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/yukim/job/yukim-11750-3.0-dtest/lastCompletedBuild/testReport/]|


> Offline scrub should not abort when it hits corruption
> --
>
> Key: CASSANDRA-11750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Adam Hattrell
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: Tools
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Hit a failure on startup due to corruption of some sstables in system 
> keyspace.  Deleted the listed file and restarted - came down again with 
> another file.
> Figured that I may as well run scrub to clean up all the files.  Got 
> following error:
> {noformat}
> sstablescrub system compaction_history 
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, 
> disk failure policy "stop" 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db
>  
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> [na:1.7.0_79] 
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79] 
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79] 
> Caused by: java.io.EOFException: null 
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) 
> ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> ... 14 common frames omitted 
> {noformat}
> I guess it might be by design - but I'd argue that I should at least have the 
> option to continue and let it do its thing.  I'd prefer that sstablescrub 
> ignored the disk failure policy.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption

2016-05-19 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-11750:
---
Fix Version/s: 3.0.x

> Offline scrub should not abort when it hits corruption
> --
>
> Key: CASSANDRA-11750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Adam Hattrell
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: Tools
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Hit a failure on startup due to corruption of some sstables in system 
> keyspace.  Deleted the listed file and restarted - came down again with 
> another file.
> Figured that I may as well run scrub to clean up all the files.  Got 
> following error:
> {noformat}
> sstablescrub system compaction_history 
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, 
> disk failure policy "stop" 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db
>  
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> [na:1.7.0_79] 
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79] 
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79] 
> Caused by: java.io.EOFException: null 
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) 
> ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> ... 14 common frames omitted 
> {noformat}
> I guess it might be by design - but I'd argue that I should at least have the 
> option to continue and let it do its thing.  I'd prefer that sstablescrub 
> ignored the disk failure policy.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11569) Track message latency across DCs

2016-05-19 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292648#comment-15292648
 ] 

Chris Lohfink commented on CASSANDRA-11569:
---

You just described why averages are a terrible statistic for tracking latencies. 
This metric is an "all time" average, so if there's suddenly a spike in latency 
the average won't change much, since it is averaged with all the previous data. 
See CASSANDRA-11752.
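
A toy calculation (plain Java, not Cassandra code) of the effect being described: after a million 1 ms samples, a burst of a hundred 500 ms samples barely moves an all-time mean.

{code}
public class CumulativeAverageSketch
{
    public static void main(String[] args)
    {
        // 1,000,000 requests at a steady 1 ms each.
        double count = 1_000_000;
        double sumMillis = count * 1.0;

        // A sudden spike: 100 requests at 500 ms each.
        for (int i = 0; i < 100; i++)
        {
            sumMillis += 500.0;
            count++;
        }

        // Prints roughly 1.05 ms, even though recent requests took 500 ms.
        // A decaying histogram/percentile (see CASSANDRA-11752) would show the spike.
        System.out.printf("all-time mean after spike: %.2f ms%n", sumMillis / count);
    }
}
{code}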

> Track message latency across DCs
> 
>
> Key: CASSANDRA-11569
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11569
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
> Attachments: CASSANDRA-11569.patch, CASSANDRA-11569v2.txt, 
> nodeLatency.PNG
>
>
> Since we have the timestamp when a message is created and when it arrives, we 
> can get an approximate time it took relatively easily, which would remove the 
> need for more complex hacks to determine latency between DCs.
> Although this is not going to be very meaningful when NTP is not set up, it is 
> pretty common to have NTP set up, and even with clock drift nothing is really 
> hurt except the metric becoming wacky.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-19 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292613#comment-15292613
 ] 

Stefania commented on CASSANDRA-11731:
--

I've created a dtest patch 
[here|https://github.com/stef1927/cassandra-dtest/commits/11731] and a c* patch 
for trunk [here|https://github.com/stef1927/cassandra/commits/11731]. If the 
tests are fine we will need to backport it to 2.2.

I've started a run of CI:

|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11731-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11731-dtest/]|

[~philipthompson] if the CI results are OK can you start another batch of 
repeated tests for the entire {{TestPushedNotifications}} class? We should also 
consider adding one more test that checks that when a node joins, all other 
nodes send a NEW_NODE and that when one node leaves, all other nodes send 
NODE_LEFT. See comment 
[here|https://github.com/stef1927/cassandra-dtest/commit/bae01dee9bd399981799c8d17ac671af0ca964e2#diff-2e73564535f1538fb660a5df5635f887R97]
 for more details. I've also lowered some timeouts, since they should be 
sufficient now that we have changed when NEW_NODE is sent; I hope they are not 
too low, though.

[~beobal] this patch should also cover CASSANDRA-11038.

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Stefania
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11850) cannot use cql since upgrading python to 2.7.11+

2016-05-19 Thread Andrew Madison (JIRA)
Andrew Madison created CASSANDRA-11850:
--

 Summary: cannot use cql since upgrading python to 2.7.11+
 Key: CASSANDRA-11850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11850
 Project: Cassandra
  Issue Type: Bug
  Components: CQL
 Environment: Development
Reporter: Andrew Madison
 Fix For: 3.5


OS: Debian GNU/Linux stretch/sid 
Kernel: 4.5.0-2-amd64 #1 SMP Debian 4.5.4-1 (2016-05-16) x86_64 GNU/Linux
Python version: 2.7.11+ (default, May  9 2016, 15:54:33)
[GCC 5.3.1 20160429]

cqlsh --version: cqlsh 5.0.1
cassandra -v: 3.5 (also occurs with 3.0.6)

Issue:
when running cqlsh, it returns the following error:

cqlsh -u dbarpt_usr01
Password: *

Connection error: ('Unable to connect to any servers', {'odbasandbox1': 
TypeError('ref() does not take keyword arguments',)})

I cleared PYTHONPATH:

python -c "import json; print dir(json); print json.__version__"
['JSONDecoder', 'JSONEncoder', '__all__', '__author__', '__builtins__', 
'__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', 
'_default_decoder', '_default_encoder', 'decoder', 'dump', 'dumps', 'encoder', 
'load', 'loads', 'scanner']
2.0.9

Java-based clients can connect to Cassandra with no issue; just cqlsh and 
Python clients cannot.

nodetool status also works.

Thank you for your help.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-19 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292477#comment-15292477
 ] 

Stefania commented on CASSANDRA-11731:
--

I'm going to take a look on the server side. I know there is an erroneous 
NEW_NODE when a node restarts (CASSANDRA-11038), but that should be unrelated to 
MOVED_NODE.

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Philip Thompson
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-19 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania reassigned CASSANDRA-11731:


Assignee: Stefania  (was: Philip Thompson)

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Stefania
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-19 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292449#comment-15292449
 ] 

Dikang Gu commented on CASSANDRA-11709:
---

[~jkni] Thanks a lot for looking at this!
1. The jstack was taken from the nodes that did not have gossip disabled, and I 
have attached the full jstack.
2. I will send the logs to your email.
3. The latency increased a few minutes after I re-enabled gossip. It could not 
recover by itself; I fixed it with a rolling restart of the cluster.
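
To illustrate the pattern visible in the jstack quoted below (an illustrative sketch only, not the actual {{TokenMetadata}} code): when every read goes through a synchronized cache getter and gossip churn keeps invalidating that cache, all readers queue on the same monitor while one thread rebuilds the map.

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CachedMapContentionSketch
{
    private final Map<Long, String> tokenToEndpoint = new HashMap<>();
    private volatile Map<Long, String> cachedMap = null;

    // Called from the gossip path whenever ring state changes: drop the cache.
    public synchronized void invalidateCachedRing()
    {
        cachedMap = null;
    }

    // Called on every read request: while the cache is cold, all readers
    // serialize on this monitor, which is what the BLOCKED threads show.
    public synchronized Map<Long, String> cachedOnlyTokenMap()
    {
        if (cachedMap == null)
            cachedMap = Collections.unmodifiableMap(new HashMap<>(tokenToEndpoint));
        return cachedMap;
    }

    public static void main(String[] args)
    {
        CachedMapContentionSketch sketch = new CachedMapContentionSketch();
        sketch.invalidateCachedRing();                            // gossip update
        System.out.println(sketch.cachedOnlyTokenMap().size());   // read path rebuilds the cache
    }
}
{code}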

> Lock contention when large number of dead nodes come back within short time
> ---
>
> Key: CASSANDRA-11709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11709
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Joel Knighton
> Fix For: 2.2.x, 3.x
>
> Attachments: lock.jstack
>
>
> We have a few hundred nodes across 3 data centers, and we are doing a few 
> million writes per second into the cluster. 
> We were trying to simulate a data center failure by disabling gossip on 
> all the nodes in one data center. After ~20 mins, I re-enabled gossip on 
> those nodes, doing 5 nodes in each batch and sleeping 5 seconds between 
> batches.
> After that, I saw the latency of read/write requests increase a lot, and 
> client requests started to time out.
> On the node, I can see there are a huge number of pending tasks in GossipStage. 
> =
> 2016-05-02_23:55:08.99515 WARN  23:55:08 Gossip stage has 36337 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:09.36009 INFO  23:55:09 Node 
> /2401:db00:2020:717a:face:0:41:0 state jump to normal
> 2016-05-02_23:55:09.99057 INFO  23:55:09 Node 
> /2401:db00:2020:717a:face:0:43:0 state jump to normal
> 2016-05-02_23:55:10.09742 WARN  23:55:10 Gossip stage has 36421 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:10.91860 INFO  23:55:10 Node 
> /2401:db00:2020:717a:face:0:45:0 state jump to normal
> 2016-05-02_23:55:11.20100 WARN  23:55:11 Gossip stage has 36558 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:11.57893 INFO  23:55:11 Node 
> /2401:db00:2030:612a:face:0:49:0 state jump to normal
> 2016-05-02_23:55:12.23405 INFO  23:55:12 Node /2401:db00:2020:7189:face:0:7:0 
> state jump to normal
> 
> And when I took a jstack of the node, I found the read/write threads are 
> blocked by a lock:
>  read thread ==
> "Thrift:7994" daemon prio=10 tid=0x7fde91080800 nid=0x5255 waiting for 
> monitor entry [0x7fde6f8a1000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.locator.TokenMetadata.cachedOnlyTokenMap(TokenMetadata.java:546)
> - waiting to lock <0x7fe4faef4398> (a 
> org.apache.cassandra.locator.TokenMetadata)
> at 
> org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:111)
> at 
> org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:3155)
> at 
> org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1526)
> at 
> org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1521)
> at 
> org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:155)
> at 
> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1328)
> at 
> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1270)
> at 
> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1195)
> at 
> org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:118)
> at 
> org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:275)
> at 
> org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:457)
> at 
> org.apache.cassandra.thrift.CassandraServer.getSliceInternal(CassandraServer.java:346)
> at 
> org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:325)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3659)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3643)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
> at 
> 

[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-19 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292423#comment-15292423
 ] 

Jeremiah Jordan commented on CASSANDRA-11709:
-

It falls back so you can do a rolling upgrade from PFS to GPFS.

> Lock contention when large number of dead nodes come back within short time
> ---
>
> Key: CASSANDRA-11709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11709
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Joel Knighton
> Fix For: 2.2.x, 3.x
>
> Attachments: lock.jstack
>
>
> We have a few hundred nodes across 3 data centers, and we are doing a few 
> million writes per second into the cluster. 
> We were trying to simulate a data center failure by disabling gossip on 
> all the nodes in one data center. After ~20 mins, I re-enabled gossip on 
> those nodes, doing 5 nodes in each batch and sleeping 5 seconds between 
> batches.
> After that, I saw the latency of read/write requests increase a lot, and 
> client requests started to time out.
> On the node, I can see there are a huge number of pending tasks in GossipStage. 
> =
> 2016-05-02_23:55:08.99515 WARN  23:55:08 Gossip stage has 36337 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:09.36009 INFO  23:55:09 Node 
> /2401:db00:2020:717a:face:0:41:0 state jump to normal
> 2016-05-02_23:55:09.99057 INFO  23:55:09 Node 
> /2401:db00:2020:717a:face:0:43:0 state jump to normal
> 2016-05-02_23:55:10.09742 WARN  23:55:10 Gossip stage has 36421 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:10.91860 INFO  23:55:10 Node 
> /2401:db00:2020:717a:face:0:45:0 state jump to normal
> 2016-05-02_23:55:11.20100 WARN  23:55:11 Gossip stage has 36558 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:11.57893 INFO  23:55:11 Node 
> /2401:db00:2030:612a:face:0:49:0 state jump to normal
> 2016-05-02_23:55:12.23405 INFO  23:55:12 Node /2401:db00:2020:7189:face:0:7:0 
> state jump to normal
> 
> And when I took a jstack of the node, I found the read/write threads are 
> blocked by a lock:
>  read thread ==
> "Thrift:7994" daemon prio=10 tid=0x7fde91080800 nid=0x5255 waiting for 
> monitor entry [0x7fde6f8a1000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.locator.TokenMetadata.cachedOnlyTokenMap(TokenMetadata.java:546)
> - waiting to lock <0x7fe4faef4398> (a 
> org.apache.cassandra.locator.TokenMetadata)
> at 
> org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:111)
> at 
> org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:3155)
> at 
> org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1526)
> at 
> org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1521)
> at 
> org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:155)
> at 
> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1328)
> at 
> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1270)
> at 
> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1195)
> at 
> org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:118)
> at 
> org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:275)
> at 
> org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:457)
> at 
> org.apache.cassandra.thrift.CassandraServer.getSliceInternal(CassandraServer.java:346)
> at 
> org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:325)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3659)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3643)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> =  writer ===
> "Thrift:7668" daemon prio=10 

[jira] [Updated] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-19 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu updated CASSANDRA-11709:
--
Attachment: lock.jstack

> Lock contention when large number of dead nodes come back within short time
> ---
>
> Key: CASSANDRA-11709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11709
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Joel Knighton
> Fix For: 2.2.x, 3.x
>
> Attachments: lock.jstack
>
>
> We have a few hundred nodes across 3 data centers, and we are doing a few 
> million writes per second into the cluster. 
> We were trying to simulate a data center failure by disabling gossip on 
> all the nodes in one data center. After ~20 mins, I re-enabled gossip on 
> those nodes, doing 5 nodes in each batch and sleeping 5 seconds between 
> batches.
> After that, I saw the latency of read/write requests increase a lot, and 
> client requests started to time out.
> On the node, I can see there are a huge number of pending tasks in GossipStage. 
> =
> 2016-05-02_23:55:08.99515 WARN  23:55:08 Gossip stage has 36337 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:09.36009 INFO  23:55:09 Node 
> /2401:db00:2020:717a:face:0:41:0 state jump to normal
> 2016-05-02_23:55:09.99057 INFO  23:55:09 Node 
> /2401:db00:2020:717a:face:0:43:0 state jump to normal
> 2016-05-02_23:55:10.09742 WARN  23:55:10 Gossip stage has 36421 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:10.91860 INFO  23:55:10 Node 
> /2401:db00:2020:717a:face:0:45:0 state jump to normal
> 2016-05-02_23:55:11.20100 WARN  23:55:11 Gossip stage has 36558 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:11.57893 INFO  23:55:11 Node 
> /2401:db00:2030:612a:face:0:49:0 state jump to normal
> 2016-05-02_23:55:12.23405 INFO  23:55:12 Node /2401:db00:2020:7189:face:0:7:0 
> state jump to normal
> 
> And when I took a jstack of the node, I found the read/write threads are 
> blocked by a lock:
>  read thread ==
> "Thrift:7994" daemon prio=10 tid=0x7fde91080800 nid=0x5255 waiting for 
> monitor entry [0x7fde6f8a1000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.locator.TokenMetadata.cachedOnlyTokenMap(TokenMetadata.java:546)
> - waiting to lock <0x7fe4faef4398> (a 
> org.apache.cassandra.locator.TokenMetadata)
> at 
> org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:111)
> at 
> org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:3155)
> at 
> org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1526)
> at 
> org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1521)
> at 
> org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:155)
> at 
> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1328)
> at 
> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1270)
> at 
> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1195)
> at 
> org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:118)
> at 
> org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:275)
> at 
> org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:457)
> at 
> org.apache.cassandra.thrift.CassandraServer.getSliceInternal(CassandraServer.java:346)
> at 
> org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:325)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3659)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3643)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> =  writer ===
> "Thrift:7668" daemon prio=10 tid=0x7fde90d91000 nid=0x50e9 waiting for 
> monitor entry [0x7fde78bbc000]
>

[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-19 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292415#comment-15292415
 ] 

Dikang Gu commented on CASSANDRA-11709:
---

[~jjordan], yes, it's definitely possible. I'm wondering why GPFS would fall 
back to PFS?

> Lock contention when large number of dead nodes come back within short time
> ---
>
> Key: CASSANDRA-11709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11709
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Joel Knighton
> Fix For: 2.2.x, 3.x
>
>
> We have a few hundred nodes across 3 data centers, and we are doing a few 
> million writes per second into the cluster. 
> We were trying to simulate a data center failure by disabling gossip on 
> all the nodes in one data center. After ~20 mins, I re-enabled gossip on 
> those nodes, doing 5 nodes in each batch and sleeping 5 seconds between 
> batches.
> After that, I saw the latency of read/write requests increase a lot, and 
> client requests started to time out.
> On the node, I can see there are a huge number of pending tasks in GossipStage. 
> =
> 2016-05-02_23:55:08.99515 WARN  23:55:08 Gossip stage has 36337 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:09.36009 INFO  23:55:09 Node 
> /2401:db00:2020:717a:face:0:41:0 state jump to normal
> 2016-05-02_23:55:09.99057 INFO  23:55:09 Node 
> /2401:db00:2020:717a:face:0:43:0 state jump to normal
> 2016-05-02_23:55:10.09742 WARN  23:55:10 Gossip stage has 36421 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:10.91860 INFO  23:55:10 Node 
> /2401:db00:2020:717a:face:0:45:0 state jump to normal
> 2016-05-02_23:55:11.20100 WARN  23:55:11 Gossip stage has 36558 pending 
> tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:11.57893 INFO  23:55:11 Node 
> /2401:db00:2030:612a:face:0:49:0 state jump to normal
> 2016-05-02_23:55:12.23405 INFO  23:55:12 Node /2401:db00:2020:7189:face:0:7:0 
> state jump to normal
> 
> And when I took a jstack of the node, I found the read/write threads are 
> blocked by a lock:
>  read thread ==
> "Thrift:7994" daemon prio=10 tid=0x7fde91080800 nid=0x5255 waiting for 
> monitor entry [0x7fde6f8a1000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.locator.TokenMetadata.cachedOnlyTokenMap(TokenMetadata.java:546)
> - waiting to lock <0x7fe4faef4398> (a 
> org.apache.cassandra.locator.TokenMetadata)
> at 
> org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:111)
> at 
> org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:3155)
> at 
> org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1526)
> at 
> org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1521)
> at 
> org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:155)
> at 
> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1328)
> at 
> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1270)
> at 
> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1195)
> at 
> org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:118)
> at 
> org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:275)
> at 
> org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:457)
> at 
> org.apache.cassandra.thrift.CassandraServer.getSliceInternal(CassandraServer.java:346)
> at 
> org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:325)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3659)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3643)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> =  writer ===
> "Thrift:7668" daemon prio=10 

[jira] [Issue Comment Deleted] (CASSANDRA-11742) Failed bootstrap results in exception when node is restarted

2016-05-19 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-11742:
--
Comment: was deleted

(was: I think this second patch is an improvement - I traced this issue to 
determine exactly why it worked on 2.1. This behavior was introduced by 
[CASSANDRA-8049] which centralized Cassandra startup checks. Prior to this 
change, we inserted cluster name directly after checking the health of the 
system keyspace, so if an sstable for the system keyspace was flushed, we could 
guarantee that some sstable contained cluster name. After [CASSANDRA-8049], we 
insert cluster name with the rest of the local metadata in 
{{SystemKeyspace.finishStartup()}}.

[~beobal] - I couldn't find a reason for the change as to when cluster name is 
inserted other than that it didn't seem like a good idea to mutate anything in 
a startup check. Can you think of any reason we can't just call 
{{SystemKeyspace.persistLocalMetadata}} immediately after snapshotting the 
system keyspace in {{CassandraDaemon}}? The root cause of this problem is that 
we need the data persisted before any truncate/schema logic, since these will 
write to the system keyspace, so we can have flushed sstables with this data 
but no sstable with cluster name, which breaks the logic of the system keyspace 
health check. I ran full unit tests/dtests on a branch that moved 
{{SystemKeyspace.persistLocalMetadata}} to immediately after the snapshot of 
the system keyspace and the results looked good.)

> Failed bootstrap results in exception when node is restarted
> 
>
> Key: CASSANDRA-11742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: 11742-2.txt, 11742.txt
>
>
> Since 2.2, a failed bootstrap results in an 
> {{org.apache.cassandra.exceptions.ConfigurationException: Found system 
> keyspace files, but they couldn't be loaded!}} exception when the node is 
> restarted. This did not happen in 2.1; it just tried to bootstrap again. I 
> know that the workaround is relatively easy, just delete the system keyspace 
> in the data folder on disk and try again, but it's a bit annoying that you 
> have to do that.
> The problem seems to be that the creation of the {{system.local}} table has 
> been moved to just before the bootstrap begins (in 2.1 it was done much 
> earlier), and as a result it is still in the memtable and commitlog if the 
> bootstrap fails. Still, a few values are inserted into the {{system.local}} 
> table at an earlier point in the startup, and they have been flushed from the 
> memtable to an sstable. When the node is restarted, 
> {{SystemKeyspace.checkHealth()}} is executed before the commitlog is replayed 
> and therefore only sees the sstable with an incomplete {{system.local}} table 
> and throws an exception.
> I think we could fix this very easily by force-flushing the system keyspace in 
> the {{StorageServiceShutdownHook}}; I have included a patch that does this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11742) Failed bootstrap results in exception when node is restarted

2016-05-19 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292309#comment-15292309
 ] 

Joel Knighton commented on CASSANDRA-11742:
---

I think this second patch is an improvement - I traced this issue to determine 
exactly why it worked on 2.1. This behavior was introduced by [CASSANDRA-8049] 
which centralized Cassandra startup checks. Prior to this change, we inserted 
cluster name directly after checking the health of the system keyspace, so if 
an sstable for the system keyspace was flushed, we could guarantee that some 
sstable contained cluster name. After [CASSANDRA-8049], we insert cluster name 
with the rest of the local metadata in {{SystemKeyspace.finishStartup}}.

[~beobal] - I couldn't find a reason for the change as to when cluster name is 
inserted other than that it didn't seem like a good idea to mutate anything in 
a startup check. Can you think of any reason we can't just call 
{{SystemKeyspace.persistLocalMetadata}} immediately after snapshotting the 
system keyspace in {{CassandraDaemon}}? The root cause of this problem is that 
we need the data persisted before any truncate/schema logic, since these will 
write to the system keyspace, so we can have flushed sstables with this data 
but no sstable with cluster name, which breaks the logic of the system keyspace 
health check. I ran full unit tests/dtests on a branch that moved 
{{SystemKeyspace.persistLocalMetadata}} to immediately after the snapshot of 
the system keyspace and the results looked good.
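
A rough sketch of the ordering being proposed, with stub methods standing in for the real startup steps (the method names here are placeholders, not the actual {{CassandraDaemon}} code):

{code}
public class StartupOrderSketch
{
    public static void main(String[] args)
    {
        snapshotSystemKeyspace();      // existing step: snapshot the system keyspace
        persistLocalMetadata();        // moved earlier: cluster name reaches an sstable
                                       // before anything else can flush system tables
        runTruncateAndSchemaLogic();   // may now write to the system keyspace safely
    }

    static void snapshotSystemKeyspace()
    {
        System.out.println("snapshot system keyspace");
    }

    static void persistLocalMetadata()
    {
        System.out.println("persist local metadata (cluster name, etc.)");
    }

    static void runTruncateAndSchemaLogic()
    {
        System.out.println("truncate/schema startup logic");
    }
}
{code}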

> Failed bootstrap results in exception when node is restarted
> 
>
> Key: CASSANDRA-11742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: 11742-2.txt, 11742.txt
>
>
> Since 2.2, a failed bootstrap results in an 
> {{org.apache.cassandra.exceptions.ConfigurationException: Found system 
> keyspace files, but they couldn't be loaded!}} exception when the node is 
> restarted. This did not happen in 2.1; it just tried to bootstrap again. I 
> know that the workaround is relatively easy, just delete the system keyspace 
> in the data folder on disk and try again, but it's a bit annoying that you 
> have to do that.
> The problem seems to be that the creation of the {{system.local}} table has 
> been moved to just before the bootstrap begins (in 2.1 it was done much 
> earlier), and as a result it is still in the memtable and commitlog if the 
> bootstrap fails. Still, a few values are inserted into the {{system.local}} 
> table at an earlier point in the startup, and they have been flushed from the 
> memtable to an sstable. When the node is restarted, 
> {{SystemKeyspace.checkHealth()}} is executed before the commitlog is replayed 
> and therefore only sees the sstable with an incomplete {{system.local}} table 
> and throws an exception.
> I think we could fix this very easily by force-flushing the system keyspace in 
> the {{StorageServiceShutdownHook}}; I have included a patch that does this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11742) Failed bootstrap results in exception when node is restarted

2016-05-19 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292308#comment-15292308
 ] 

Joel Knighton commented on CASSANDRA-11742:
---

I think this second patch is an improvement - I traced this issue to determine 
exactly why it worked on 2.1. This behavior was introduced by [CASSANDRA-8049] 
which centralized Cassandra startup checks. Prior to this change, we inserted 
cluster name directly after checking the health of the system keyspace, so if 
an sstable for the system keyspace was flushed, we could guarantee that some 
sstable contained cluster name. After [CASSANDRA-8049], we insert cluster name 
with the rest of the local metadata in {{SystemKeyspace.finishStartup()}}.

[~beobal] - I couldn't find a reason for the change as to when cluster name is 
inserted other than that it didn't seem like a good idea to mutate anything in 
a startup check. Can you think of any reason we can't just call 
{{SystemKeyspace.persistLocalMetadata}} immediately after snapshotting the 
system keyspace in {{CassandraDaemon}}? The root cause of this problem is that 
we need the data persisted before any truncate/schema logic, since these will 
write to the system keyspace, so we can have flushed sstables with this data 
but no sstable with cluster name, which breaks the logic of the system keyspace 
health check. I ran full unit tests/dtests on a branch that moved 
{{SystemKeyspace.persistLocalMetadata}} to immediately after the snapshot of 
the system keyspace and the results looked good.

> Failed bootstrap results in exception when node is restarted
> 
>
> Key: CASSANDRA-11742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
> Attachments: 11742-2.txt, 11742.txt
>
>
> Since 2.2, a failed bootstrap results in an 
> {{org.apache.cassandra.exceptions.ConfigurationException: Found system 
> keyspace files, but they couldn't be loaded!}} exception when the node is 
> restarted. This did not happen in 2.1; it just tried to bootstrap again. I 
> know that the workaround is relatively easy, just delete the system keyspace 
> in the data folder on disk and try again, but it's a bit annoying that you 
> have to do that.
> The problem seems to be that the creation of the {{system.local}} table has 
> been moved to just before the bootstrap begins (in 2.1 it was done much 
> earlier), and as a result it is still in the memtable and commitlog if the 
> bootstrap fails. Still, a few values are inserted into the {{system.local}} 
> table at an earlier point in the startup, and they have been flushed from the 
> memtable to an sstable. When the node is restarted, 
> {{SystemKeyspace.checkHealth()}} is executed before the commitlog is replayed 
> and therefore only sees the sstable with an incomplete {{system.local}} table 
> and throws an exception.
> I think we could fix this very easily by force-flushing the system keyspace in 
> the {{StorageServiceShutdownHook}}; I have included a patch that does this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11719) Add bind variables to trace

2016-05-19 Thread Mahdi Mohammadi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292237#comment-15292237
 ] 

Mahdi Mohammadi commented on CASSANDRA-11719:
-

[~snazy] The test file TraceCqlTest exists only on the trunk branch. Should I 
create my branch off trunk?

> Add bind variables to trace
> ---
>
> Key: CASSANDRA-11719
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11719
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Mahdi Mohammadi
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 11719-2.1.patch
>
>
> {{org.apache.cassandra.transport.messages.ExecuteMessage#execute}} mentions a 
> _TODO_ saying "we don't have [typed] access to CQL bind variables here".
> In fact, we now have typed access to CQL bind variables there, so it is now 
> possible to show the bind variables in the trace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption

2016-05-19 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292221#comment-15292221
 ] 

Jeremiah Jordan edited comment on CASSANDRA-11750 at 5/19/16 9:53 PM:
--

[~yukim] is there a reason for not putting this in 3.0 as well?  Seems strange to 
not merge the change all the way forward and only have it in 2.1/2.2/3.8?


was (Author: jjordan):
[~yukim] is there a reason for not putting this in 3.0 as well?  Seems strange to 
not merge the change all the way forward and have it in 2.1/2.2/3.8?

> Offline scrub should not abort when it hits corruption
> --
>
> Key: CASSANDRA-11750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Adam Hattrell
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: Tools
> Fix For: 2.1.x, 2.2.x
>
>
> Hit a failure on startup due to corruption of some sstables in system 
> keyspace.  Deleted the listed file and restarted - came down again with 
> another file.
> Figured that I may as well run scrub to clean up all the files.  Got 
> following error:
> {noformat}
> sstablescrub system compaction_history 
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, 
> disk failure policy "stop" 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db
>  
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> [na:1.7.0_79] 
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79] 
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79] 
> Caused by: java.io.EOFException: null 
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) 
> ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> ... 14 common frames omitted 
> {noformat}
> I guess it might be by design - but I'd argue that I should at least have the 
> option to continue and let it do its thing.  I'd prefer that sstablescrub 
> ignored the disk failure policy.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption

2016-05-19 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292221#comment-15292221
 ] 

Jeremiah Jordan commented on CASSANDRA-11750:
-

[~yukim] is there a reason for not putting this in 3.0 as well?  Seems strange to 
not merge the change all the way forward and have it in 2.1/2.2/3.8?

> Offline scrub should not abort when it hits corruption
> --
>
> Key: CASSANDRA-11750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Adam Hattrell
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: Tools
> Fix For: 2.1.x, 2.2.x
>
>
> Hit a failure on startup due to corruption of some sstables in system 
> keyspace.  Deleted the listed file and restarted - came down again with 
> another file.
> Figured that I may as well run scrub to clean up all the files.  Got 
> following error:
> {noformat}
> sstablescrub system compaction_history 
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, 
> disk failure policy "stop" 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db
>  
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at 
> org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> [na:1.7.0_79] 
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_79] 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79] 
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79] 
> Caused by: java.io.EOFException: null 
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) 
> ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79] 
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> ... 14 common frames omitted 
> {noformat}
> I guess it might be by design, but I'd argue that I should at least have the 
> option to continue and let it do its thing.  I'd prefer that sstablescrub 
> ignored the disk failure policy.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292145#comment-15292145
 ] 

Paulo Motta commented on CASSANDRA-11845:
-

Unfortunately it's not possible to track down the cause from the logs you 
posted. You'll need to [enable DEBUG 
logging|https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLoggingLevels_r.html]
 on the {{org.apache.cassandra.streaming}} and {{org.apache.cassandra.repair}} 
packages and attach the full debug.log to this ticket (please use the attach-files 
functionality of JIRA instead of pasting logs in the comments).

Please note that to cancel a hung repair you'll probably need to restart the 
involved nodes before starting a new repair (stop-repair functionality 
will be provided by CASSANDRA-3486).
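
For reference, the practical way to do that on each node is either to add package-level 
loggers to {{conf/logback.xml}} (e.g. {{<logger name="org.apache.cassandra.streaming" level="DEBUG"/>}} 
and the same for {{org.apache.cassandra.repair}}) and restart, or to change the level at runtime 
with {{nodetool setlogginglevel <package> DEBUG}}. As a rough, illustrative sketch of what that 
level change amounts to underneath (the class name below is made up; it assumes logback-classic 
on the classpath, which Cassandra ships with):

{code}
// Illustrative sketch only: raising the level of the package loggers that the
// comment above refers to, via logback's API. In practice you would edit
// conf/logback.xml or run nodetool setlogginglevel rather than run this code.
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public class EnableRepairDebugLogging
{
    public static void main(String[] args)
    {
        for (String pkg : new String[] { "org.apache.cassandra.streaming", "org.apache.cassandra.repair" })
        {
            // logback loggers are hierarchical, so setting the package logger
            // raises the level for every class underneath it
            Logger logger = (Logger) LoggerFactory.getLogger(pkg);
            logger.setLevel(Level.DEBUG);
        }
    }
}
{code}

Either way, the extra detail only appears in debug.log for activity after the change, so the 
repair has to be re-run once the levels are raised.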

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, I was able 
> to avoid the socketTimeout errors I was getting earlier 
> (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is that repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And it's 10:46:25 now, almost 5 hours since it got stuck right there.
> Earlier I could see the repair session going on in system.log, but there are no 
> logs coming in right now; all I get is the regular index summary 
> redistribution logs.
> The last repair logs I saw were:
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> It's an incremental repair, and in the "nodetool netstats" output I can see 
> entries like:
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
>  148882/148882 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
>  71876/71876 bytes(100%) received from idx:0/Node-2
> Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 
> bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db

[jira] [Updated] (CASSANDRA-11849) Potential data directory problems due to CFS getDirectories logic

2016-05-19 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-11849:
---
Description: 
CASSANDRA-8671 added the ability to change the data directory based on the 
compaction strategy.  

Since nothing uses this yet we haven't hit any issues but reading the code I 
see potential bugs for things like Transaction log cleanup and CFS 
initialization since these all use the default {{Directories}} location from 
the yaml.

* {{Directories}} is passed into CFS constructor then possibly disregarded.
* Startup checks like scrubDataDirectories are all using default Directories 
locations.
* StandaloneSSTableUtil 

  was:
CASSANDRA-8671 added the ability to change the data directory based on the 
compaction strategy.  

Since nothing uses this yet we haven't hit any issues but reading the code I 
see potential bugs for things like Transaction log cleanup and CFA 
initialization since these all use the default {{Directories}} location from 
the yaml.

* {{Directories}} is passed into CFS constructor then possibly disregarded.
* Startup checks like scrubDataDirectories are all using default Directories 
locations.
* StandaloneSSTableUtil 


> Potential data directory problems due to CFS getDirectories logic
> -
>
> Key: CASSANDRA-11849
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11849
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Blake Eggleston
>
> CASSANDRA-8671 added the ability to change the data directory based on the 
> compaction strategy.  
> Since nothing uses this yet we haven't hit any issues but reading the code I 
> see potential bugs for things like Transaction log cleanup and CFS 
> initialization since these all use the default {{Directories}} location from 
> the yaml.
> * {{Directories}} is passed into CFS constructor then possibly disregarded.
> * Startup checks like scrubDataDirectories are all using default Directories 
> locations.
> * StandaloneSSTableUtil 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11849) Potential data directory problems due to CFS getDirectories logic

2016-05-19 Thread T Jake Luciani (JIRA)
T Jake Luciani created CASSANDRA-11849:
--

 Summary: Potential data directory problems due to CFS 
getDirectories logic
 Key: CASSANDRA-11849
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11849
 Project: Cassandra
  Issue Type: Bug
Reporter: T Jake Luciani
Assignee: Blake Eggleston


CASSANDRA-8671 added the ability to change the data directory based on the 
compaction strategy.  

Since nothing uses this yet we haven't hit any issues but reading the code I 
see potential bugs for things like Transaction log cleanup and CFA 
initialization since these all use the default {{Directories}} location from 
the yaml.

* {{Directories}} is passed into CFS constructor then possibly disregarded.
* Startup checks like scrubDataDirectories are all using default Directories 
locations.
* StandaloneSSTableUtil 
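
To illustrate the general shape of the concern (a generic sketch only; the classes below are 
made up and are not Cassandra's actual {{Directories}} or ColumnFamilyStore code): if one code 
path honours a per-compaction-strategy directories object passed into the constructor while 
another path falls back to a static, yaml-derived default, the two can silently disagree about 
where files live, which is the kind of mismatch that could bite transaction log cleanup or 
startup scrubbing.

{code}
// Generic sketch of the hazard; made-up classes, not Cassandra internals.
import java.io.File;

public class DirectoriesSketch
{
    static class Directories
    {
        final File location;
        Directories(File location) { this.location = location; }
    }

    // Stand-in for the default location derived from the yaml.
    static final Directories DEFAULT = new Directories(new File("/var/lib/cassandra/data"));

    static class ColumnFamilyStoreLike
    {
        private final Directories directories;

        // A custom Directories instance is accepted here...
        ColumnFamilyStoreLike(Directories directories) { this.directories = directories; }

        // ...and honoured on the write path...
        File writePath() { return directories.location; }

        // ...but a startup/cleanup-style helper quietly uses the static default.
        File cleanupPath() { return DEFAULT.location; }
    }

    public static void main(String[] args)
    {
        ColumnFamilyStoreLike cfs =
            new ColumnFamilyStoreLike(new Directories(new File("/mnt/fast/data")));
        System.out.println("writes go to:     " + cfs.writePath());
        System.out.println("cleanup looks in: " + cfs.cleanupPath()); // mismatch = files missed or misplaced
    }
}
{code}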



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11848) replace address can "succeed" without actually streaming anything

2016-05-19 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11848:

Assignee: Paulo Motta

> replace address can "succeed" without actually streaming anything
> -
>
> Key: CASSANDRA-11848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11848
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Jeremiah Jordan
>Assignee: Paulo Motta
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> When you do a replace address and the new node has the same IP as the node it 
> is replacing, then the following check can let the replace be successful even 
> if we think all the other nodes are down: 
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271
> As the FailureDetectorSourceFilter will exclude the other nodes, an empty 
> stream plan gets executed.
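
To make the failure mode easier to see, here is a purely hypothetical sketch (the class, methods, 
and addresses below are invented for illustration and are not Cassandra's actual RangeStreamer 
code): skipping the address being replaced and then filtering the remaining candidates through a 
failure detector that considers them all down leaves an empty source set, and executing that 
empty stream plan looks like a successful replace even though nothing was streamed.

{code}
// Hypothetical illustration of the reported failure mode; this is not Cassandra code.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

public class EmptyStreamPlanSketch
{
    // Pick streaming sources: never the address being replaced, and only peers
    // that the failure detector currently considers alive.
    static Set<String> pickSources(Set<String> ring, String replacedAddress, Predicate<String> isAlive)
    {
        Set<String> sources = new HashSet<>();
        for (String peer : ring)
        {
            if (peer.equals(replacedAddress))
                continue;                 // same-IP replace: skip "ourselves"
            if (isAlive.test(peer))       // failure-detector source filter
                sources.add(peer);
        }
        return sources;
    }

    public static void main(String[] args)
    {
        Set<String> ring = new HashSet<>(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        // Scenario: the replacing node reuses 10.0.0.1 and cannot see its peers yet,
        // so the failure detector treats every other node as down.
        Set<String> sources = pickSources(ring, "10.0.0.1", peer -> false);
        if (sources.isEmpty())
            System.out.println("empty stream plan; the replace would 'succeed' without streaming data");
    }
}
{code}

A stricter guard would fail the replace outright when the computed source set is empty instead 
of letting the empty plan run.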



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11848) replace address can "succeed" without actually streaming anything

2016-05-19 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-11848:

Description: 
When you do a replace address and the new node has the same IP as the node it 
is replacing, then the following check can let the replace be successful even 
if we think all the other nodes are down: 
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271

As the FailureDetectorSourceFilter will exclude the other nodes, an empty 
stream plan gets executed.

  was:When you do a replace address and the new node has the same IP as the 
node it is replacing, then the following check can let the replace be 
successful even if we think all the other nodes are down: 
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271


> replace address can "succeed" without actually streaming anything
> -
>
> Key: CASSANDRA-11848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11848
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Jeremiah Jordan
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> When you do a replace address and the new node has the same IP as the node it 
> is replacing, then the following check can let the replace be successful even 
> if we think all the other nodes are down: 
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271
> As the FailureDetectorSourceFilter will exclude the other nodes, an empty 
> stream plan gets executed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11848) replace address can "succeed" without actually streaming anything

2016-05-19 Thread Jeremiah Jordan (JIRA)
Jeremiah Jordan created CASSANDRA-11848:
---

 Summary: replace address can "succeed" without actually streaming 
anything
 Key: CASSANDRA-11848
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11848
 Project: Cassandra
  Issue Type: Bug
  Components: Streaming and Messaging
Reporter: Jeremiah Jordan
 Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x


When you do a replace address and the new node has the same IP as the node it 
is replacing, then the following check can let the replace be successful even 
if we think all the other nodes are down: 
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/RangeStreamer.java#L271



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939
 ] 

vin01 edited comment on CASSANDRA-11845 at 5/19/16 7:13 PM:


Yeah, it's still stuck at 55%. No new streams are getting created; netstats 
shows the same output again and again. The only thing that changes in its output is:

Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Here is a longer snippet of the netstats output which shows the repair session as 
well; it has been the same for the last 8 or so hours:

Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/Node-3
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
 1598874/1598874 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db
 736365/736365 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db
 326558/326558 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db
 1484827/1484827 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db
 393636/393636 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db
 825459/825459 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db
 3568782/3568782 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db
 271222/271222 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db
 4315497/4315497 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db
 19775/19775 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db
 355293/355293 bytes(100%) received from idx:0/Node-3
Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 
bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 1796825/1796825 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 4549996/4549996 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 1658881/1658881 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 1418335/1418335 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 20064/20064 bytes(100%) sent to idx:0/Node-3
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Large messages  n/a 0779
Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Snippet from system.log using grep -iE "repair|valid|sync" system.log:

INFO  [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /Node-2
and /Node-1 on TABLE_NAME
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair 
#a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,541 RepairSession.java:279 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - 
Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range 
(-4182952858113330342,-4157904914928848809] finished
INFO  [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5df00-1d99-11e6-9d63-b717b380ffdd between /Node-2 a
nd /Node-1 on 

[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939
 ] 

vin01 edited comment on CASSANDRA-11845 at 5/19/16 7:12 PM:


Yeah, it's still stuck at 55%. No new streams are getting created; netstats 
shows the same output again and again. The only thing that changes in its output is:

Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Here is a longer snippet of the netstats output which shows the repair session as 
well; it has been the same for the last 8 or so hours:

Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/Node-3
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
 1598874/1598874 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db
 736365/736365 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db
 326558/326558 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db
 1484827/1484827 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db
 393636/393636 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db
 825459/825459 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db
 3568782/3568782 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db
 271222/271222 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db
 4315497/4315497 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db
 19775/19775 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db
 355293/355293 bytes(100%) received from idx:0/Node-3
Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 
bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 1796825/1796825 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 4549996/4549996 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 1658881/1658881 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 1418335/1418335 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 20064/20064 bytes(100%) sent to idx:0/Node-3
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Large messages  n/a 0779
Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Snippet from system.log using grep -iE "repair|valid|sync" system.log:

INFO  [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /Node-2
and /Node-1 on TABLE_NAME
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair 
#a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,541 RepairSession.java:279 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - 
Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range 
(-4182952858113330342,-4157904914928848809] finished
INFO  [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5df00-1d99-11e6-9d63-b717b380ffdd between /Node-2 a
nd /Node-1 on 

[jira] [Commented] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment

2016-05-19 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291965#comment-15291965
 ] 

Jeff Jirsa commented on CASSANDRA-11847:


It definitely looks a lot like a hardware problem, but even if it weren't, 
Cassandra 2.0 isn't supported anymore, not even for critical fixes. You'd need to 
re-open this if you can replicate the problem on 2.1+.





> Cassandra dies on a specific node in a multi-DC environment
> ---
>
> Key: CASSANDRA-11847
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11847
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Core
> Environment: Cassandra 2.0.11, JDK build 1.7.0_79-b15
>Reporter: Rajesh Babu
> Attachments: java_error19030.log, java_error2912.log, 
> java_error4571.log, java_error7539.log, java_error9552.log
>
>
> We have a customer who runs a 16-node, 2-DC (8 nodes each) environment where 
> the Cassandra pid dies randomly, but only on a specific node.
> Whenever Cassandra dies, the admin has to manually restart Cassandra on that 
> node.
> I tried upgrading their environment from Java 1.7 (patch 60) to Java 1.7 
> (patch 79), but it still seems to be an issue. 
> Is this a known hardware-related bug, or is this issue fixed in later 
> Cassandra versions? 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4542d5a27f, pid=19030, tid=139933154096896
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libjava.so+0xe027f]  _fini+0xbd5f7
> #
> # Core dump written. Default location: /tmp/core or core.19030
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7f453c89f000):  JavaThread "COMMIT-LOG-WRITER" 
> [_thread_in_vm, id=19115, stack(0x7f44b9ed3000,0x7f44b9f14000)]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
> si_addr=0x7f4542d5a27f
> Registers:
> RAX=0x, RBX=0x7f453c564ad0, RCX=0x0001, 
> RDX=0x0020
> RSP=0x7f44b9f125a0, RBP=0x7f44b9f125b0, RSI=0x, 
> RDI=0x0001
> R8 =0x7f453c564ad8, R9 =0x4aab, R10=0x7f453917a52c, 
> R11=0x0006fae57068
> R12=0x7f453c564ad8, R13=0x7f44b9f125d0, R14=0x, 
> R15=0x7f453c89f000
> RIP=0x7f4542d5a27f, EFLAGS=0x00010246, CSGSFS=0x0033, 
> ERR=0x0014
>   TRAPNO=0x000e
> -
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f28e08787a4, pid=2912, tid=139798767699712
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  0x7f28e08787a4
> #
> # Core dump written. Default location: /tmp/core or core.2912
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7f2640008000):  JavaThread "ValidationExecutor:15" 
> daemon [_thread_in_Java, id=7393, 
> stack(0x7f256fdf8000,0x7f256fe39000)]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
> si_addr=0x7f28e08787a4
> Registers:
> RAX=0x, RBX=0x3f8bb878, RCX=0xc77040d6, 
> RDX=0xc770409a
> RSP=0x7f256fe37430, RBP=0x00063b820710, RSI=0x00063b820530, 
> RDI=0x
> R8 =0x3f8bb888, R9 =0x, R10=0x3f8bb888, 
> R11=0x3f8bb878
> R12=0x, R13=0x00063b820530, R14=0x000b, 
> R15=0x7f2640008000
> RIP=0x7f28e08787a4, EFLAGS=0x00010246, CSGSFS=0x0033, 
> ERR=0x0015
>   TRAPNO=0x000e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939
 ] 

vin01 edited comment on CASSANDRA-11845 at 5/19/16 7:12 PM:


Yeah, it's still stuck at 55%. No new streams are getting created; netstats 
shows the same output again and again. The only thing that changes in its output is:

Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Here is a longer snippet of the netstats output which shows the repair session as 
well; it has been the same for the last 8 or so hours:

Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/Node-3
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
 1598874/1598874 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db
 736365/736365 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db
 326558/326558 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db
 1484827/1484827 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db
 393636/393636 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db
 825459/825459 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db
 3568782/3568782 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db
 271222/271222 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db
 4315497/4315497 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db
 19775/19775 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db
 355293/355293 bytes(100%) received from idx:0/Node-3
Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 
bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 1796825/1796825 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 4549996/4549996 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 1658881/1658881 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 1418335/1418335 bytes(100%) sent to idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 20064/20064 bytes(100%) sent to idx:0/Node-3
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Large messages  n/a 0779
Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Snippet from system.log using grep -iE "repair|valid|sync" system.log:

INFO  [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /Node-2
and /Node-1 on TABLE_NAME
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair 
#a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,541 RepairSession.java:279 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - 
Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range 
(-4182952858113330342,-4157904914928848809] finished
INFO  [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5df00-1d99-11e6-9d63-b717b380ffdd between /Node-2 a
nd /Node-1 on 

[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939
 ] 

vin01 edited comment on CASSANDRA-11845 at 5/19/16 7:11 PM:


Yeah, it's still stuck at 55%. No new streams are getting created; netstats 
shows the same output again and again. The only thing that changes in its output is:

Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Here is a longer snippet of the netstats output which shows the repair session as 
well; it has been the same for the last 8 or so hours:

Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/Node-1
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
 1598874/1598874 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db
 736365/736365 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db
 326558/326558 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db
 1484827/1484827 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db
 393636/393636 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db
 825459/825459 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db
 3568782/3568782 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db
 271222/271222 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db
 4315497/4315497 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db
 19775/19775 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db
 355293/355293 bytes(100%) received from idx:0/Node-1
Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 
bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 1796825/1796825 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 4549996/4549996 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 1658881/1658881 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 1418335/1418335 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 20064/20064 bytes(100%) sent to idx:0/Node-1
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Large messages  n/a 0779
Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Snippet from system.log using grep -iE "repair|valid|sync" system.log:

INFO  [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /Node-2
and /192.168.200.151 on TABLE_NAME
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair 
#a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,541 RepairSession.java:279 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - 
Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range 
(-4182952858113330342,-4157904914928848809] finished
INFO  [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5df00-1d99-11e6-9d63-b717b380ffdd between /Node-2 a
nd 

[jira] [Comment Edited] (CASSANDRA-11760) dtest failure in TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test

2016-05-19 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291224#comment-15291224
 ] 

Philip Thompson edited comment on CASSANDRA-11760 at 5/19/16 7:04 PM:
--

I'm re-running the tests that found this, to see if it comes up again. They 
take about 3-4 hours. EDIT: Re-re-running the tests. They ran against the old 
sha, not the one with the fix from 11613.


was (Author: philipthompson):
I'm re-running the tests that found this, to see if it comes up again. They 
take about 3-4 hours.

> dtest failure in 
> TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test
> 
>
> Key: CASSANDRA-11760
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11760
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Tyler Hobbs
>  Labels: dtest
> Fix For: 3.6
>
> Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, 
> node3.log, node3_debug.log
>
>
> example failure:
> http://cassci.datastax.com/view/Parameterized/job/upgrade_tests-all-custom_branch_runs/12/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_2_x_To_next_3_x/user_types_test/
> I've attached the logs. The test upgrades from 2.2.5 to 3.6. The relevant 
> failure stack trace extracted here:
> {code}
> ERROR [MessagingService-Incoming-/127.0.0.1] 2016-05-11 17:08:31,33
> 4 CassandraDaemon.java:185 - Exception in thread Thread[MessagingSe
> rvice-Incoming-/127.0.0.1,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:99)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:109)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) 
> ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:200)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:177)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-19 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291946#comment-15291946
 ] 

Philip Thompson commented on CASSANDRA-11731:
-

So, it still fails: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/105/testReport/
 just less frequently. I'm working off this branch: 
https://github.com/riptano/cassandra-dtest/tree/fix-11731 I think the reduced 
flake rate is just from the longer waiting, but this didn't fix the root issue.

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Philip Thompson
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939
 ] 

vin01 edited comment on CASSANDRA-11845 at 5/19/16 7:02 PM:


Yeah, it's still stuck at 55%. No new streams are getting created; netstats 
shows the same output again and again. The only thing that changes in its output is:

Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Here is a longer snippet of the netstats output which shows the repair session as 
well; it has been the same for the last 8 or so hours:

Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/Node-1
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
 1598874/1598874 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db
 736365/736365 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db
 326558/326558 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db
 1484827/1484827 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db
 393636/393636 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db
 825459/825459 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db
 3568782/3568782 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db
 271222/271222 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db
 4315497/4315497 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db
 19775/19775 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db
 355293/355293 bytes(100%) received from idx:0/Node-1
Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 
bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 1796825/1796825 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 4549996/4549996 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 1658881/1658881 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 1418335/1418335 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 20064/20064 bytes(100%) sent to idx:0/Node-1
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Large messages  n/a 0779
Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Snippet from system.log using grep -iE "repair|valid|sync" system.log:

INFO  [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /192.168.100.138 
and /192.168.200.151 on TABLE_NAME
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair 
#a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,541 RepairSession.java:279 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - 
Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range 
(-4182952858113330342,-4157904914928848809] finished
INFO  [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5df00-1d99-11e6-9d63-b717b380ffdd between 

[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939
 ] 

vin01 edited comment on CASSANDRA-11845 at 5/19/16 6:59 PM:


Yeah, it's still stuck at 55%. No new streams are getting created; netstats 
shows the same output again and again. The only thing that changes in its output is:

Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Here is a longer snippet of the netstats output which shows the repair session as 
well; it has been the same for the last 8 or so hours:

Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/Node-1
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
 1598874/1598874 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db
 736365/736365 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db
 326558/326558 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db
 1484827/1484827 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db
 393636/393636 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db
 825459/825459 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db
 3568782/3568782 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db
 271222/271222 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db
 4315497/4315497 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db
 19775/19775 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db
 355293/355293 bytes(100%) received from idx:0/Node-1
Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 
bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 1796825/1796825 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 4549996/4549996 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 1658881/1658881 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 1418335/1418335 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 20064/20064 bytes(100%) sent to idx:0/Node-1
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Large messages  n/a 0779
Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Snippet from system.log using grep -iE "repair|valid|sync" system.log:

INFO  [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /192.168.100.138 
and /192.168.200.151 on TABLE_NAME
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair 
#a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,541 RepairSession.java:279 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - 
Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range 
(-4182952858113330342,-4157904914928848809] finished
INFO  [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5df00-1d99-11e6-9d63-b717b380ffdd between 

[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291939#comment-15291939
 ] 

vin01 commented on CASSANDRA-11845:
---

Yeah, it's still stuck at 55%. No new streams are getting created; netstats 
shows the same output again and again. The only thing that changes in its output is:

Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Here is a longer snippet of the netstats output which shows the repair session as 
well; it has been the same for the last 8 or so hours:

Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/Node-1
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
 1598874/1598874 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db
 736365/736365 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db
 326558/326558 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db
 1484827/1484827 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db
 393636/393636 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db
 825459/825459 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db
 3568782/3568782 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db
 271222/271222 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db
 4315497/4315497 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db
 19775/19775 bytes(100%) received from idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db
 355293/355293 bytes(100%) received from idx:0/Node-1
Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 
bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 1796825/1796825 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 4549996/4549996 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 1658881/1658881 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 1418335/1418335 bytes(100%) sent to idx:0/Node-1

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 20064/20064 bytes(100%) sent to idx:0/Node-1
Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Large messages  n/a 0779
Small messages  n/a 0   14760878
Gossip messages n/a 0 151698

Snippet for system.log using grep -iE "repair|valid|sync" system.log :-

INFO  [StreamReceiveTask:479] 2016-05-19 05:53:27,539 LocalSyncTask.java:114 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5b7f3-1d99-11e6-9d63-b717b380ffdd between /192.168.100.138 
and /192.168.200.151 on TABLE_NAME
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,540 RepairJob.java:152 - [repair 
#a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,541 RepairSession.java:279 - 
[repair #a0e5b7f3-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:27,542 RepairRunnable.java:232 - 
Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range 
(-4182952858113330342,-4157904914928848809] finished
INFO  [StreamReceiveTask:59] 2016-05-19 05:53:41,124 LocalSyncTask.java:114 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Sync complete using session 
a0e5df00-1d99-11e6-9d63-b717b380ffdd between /192.168.100.138 a
nd /192.168.200.151 on TABLE_NAME
INFO  

[jira] [Commented] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment

2016-05-19 Thread Rajesh Babu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291882#comment-15291882
 ] 

Rajesh Babu commented on CASSANDRA-11847:
-

It is physical hardware (private cloud).

Manufacturer: Quanta Computer Inc
Product Name: QuantaPlex T41S-2U

I initially thought it was a RAM-related issue and swapped the RAM on 
that node with "SAMSUNG 16GB 288-Pin DDR4 SDRAM ECC Registered DDR4 2133 (PC4 
17000) Server Memory Model M393A2G40DB0-CPB", but that didn't help either. 
The server was stable for 3 days or so and then Cassandra died again.

I just wanted to see whether this issue is caused by Cassandra software (maybe 
fixed in later versions, maybe 2.0.17?).


> Cassandra dies on a specific node in a multi-DC environment
> ---
>
> Key: CASSANDRA-11847
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11847
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Core
> Environment: Cassandra 2.0.11, JDK build 1.7.0_79-b15
>Reporter: Rajesh Babu
> Attachments: java_error19030.log, java_error2912.log, 
> java_error4571.log, java_error7539.log, java_error9552.log
>
>
> We have a customer who runs a 16-node, 2-DC (8 nodes each) environment where 
> the Cassandra pid dies randomly, but only on a specific node.
> Whenever Cassandra dies, the admin has to manually restart Cassandra on that 
> node.
> I tried upgrading their environment from Java 1.7 (patch 60) to Java 1.7 
> (patch 79), but it still seems to be an issue. 
> Is this a known hardware-related bug, or is this issue fixed in later 
> Cassandra versions? 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4542d5a27f, pid=19030, tid=139933154096896
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libjava.so+0xe027f]  _fini+0xbd5f7
> #
> # Core dump written. Default location: /tmp/core or core.19030
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7f453c89f000):  JavaThread "COMMIT-LOG-WRITER" 
> [_thread_in_vm, id=19115, stack(0x7f44b9ed3000,0x7f44b9f14000)]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
> si_addr=0x7f4542d5a27f
> Registers:
> RAX=0x, RBX=0x7f453c564ad0, RCX=0x0001, 
> RDX=0x0020
> RSP=0x7f44b9f125a0, RBP=0x7f44b9f125b0, RSI=0x, 
> RDI=0x0001
> R8 =0x7f453c564ad8, R9 =0x4aab, R10=0x7f453917a52c, 
> R11=0x0006fae57068
> R12=0x7f453c564ad8, R13=0x7f44b9f125d0, R14=0x, 
> R15=0x7f453c89f000
> RIP=0x7f4542d5a27f, EFLAGS=0x00010246, CSGSFS=0x0033, 
> ERR=0x0014
>   TRAPNO=0x000e
> -
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f28e08787a4, pid=2912, tid=139798767699712
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  0x7f28e08787a4
> #
> # Core dump written. Default location: /tmp/core or core.2912
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7f2640008000):  JavaThread "ValidationExecutor:15" 
> daemon [_thread_in_Java, id=7393, 
> stack(0x7f256fdf8000,0x7f256fe39000)]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
> si_addr=0x7f28e08787a4
> Registers:
> RAX=0x, RBX=0x3f8bb878, RCX=0xc77040d6, 
> RDX=0xc770409a
> RSP=0x7f256fe37430, RBP=0x00063b820710, RSI=0x00063b820530, 
> RDI=0x
> R8 =0x3f8bb888, R9 =0x, R10=0x3f8bb888, 
> R11=0x3f8bb878
> R12=0x, R13=0x00063b820530, R14=0x000b, 
> R15=0x7f2640008000
> RIP=0x7f28e08787a4, EFLAGS=0x00010246, CSGSFS=0x0033, 
> ERR=0x0015
>   TRAPNO=0x000e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8844) Change Data Capture (CDC)

2016-05-19 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-8844:
---
Status: Patch Available  (was: Open)

Setting back to Patch Available.

There is now an implemented solution for the size tracking problems listed 
above. The branch has been rebased on top of the addition of lower/upper bounds 
to segments, and the tests are mostly complete.

I have 3 failed dtests and 7 failures in testall that I believe are unrelated 
(read: flaky), but I'm going to track down each locally to confirm.

I've fixed CreateTest and CommitLogStressTest since the last CI run. No sense 
in paying for another run until I've confirmed these final 10 tests aren't a 
problem introduced by the branch.

> Change Data Capture (CDC)
> -
>
> Key: CASSANDRA-8844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8844
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Coordination, Local Write-Read Paths
>Reporter: Tupshin Harper
>Assignee: Joshua McKenzie
>Priority: Critical
> Fix For: 3.x
>
>
> "In databases, change data capture (CDC) is a set of software design patterns 
> used to determine (and track) the data that has changed so that action can be 
> taken using the changed data. Also, Change data capture (CDC) is an approach 
> to data integration that is based on the identification, capture and delivery 
> of the changes made to enterprise data sources."
> -Wikipedia
> As Cassandra is increasingly being used as the Source of Record (SoR) for 
> mission critical data in large enterprises, it is increasingly being called 
> upon to act as the central hub of traffic and data flow to other systems. In 
> order to try to address the general need, we (cc [~brianmhess]), propose 
> implementing a simple data logging mechanism to enable per-table CDC patterns.
> h2. The goals:
> # Use CQL as the primary ingestion mechanism, in order to leverage its 
> Consistency Level semantics, and in order to treat it as the single 
> reliable/durable SoR for the data.
> # To provide a mechanism for implementing good and reliable 
> (deliver-at-least-once with possible mechanisms for deliver-exactly-once ) 
> continuous semi-realtime feeds of mutations going into a Cassandra cluster.
> # To eliminate the developmental and operational burden of users so that they 
> don't have to do dual writes to other systems.
> # For users that are currently doing batch export from a Cassandra system, 
> give them the opportunity to make that realtime with a minimum of coding.
> h2. The mechanism:
> We propose a durable logging mechanism that functions similar to a commitlog, 
> with the following nuances:
> - Takes place on every node, not just the coordinator, so RF number of copies 
> are logged.
> - Separate log per table.
> - Per-table configuration. Only tables that are specified as CDC_LOG would do 
> any logging.
> - Per DC. We are trying to keep the complexity to a minimum to make this an 
> easy enhancement, but most likely use cases would prefer to only implement 
> CDC logging in one (or a subset) of the DCs that are being replicated to
> - In the critical path of ConsistencyLevel acknowledgment. Just as with the 
> commitlog, failure to write to the CDC log should fail that node's write. If 
> that means the requested consistency level was not met, then clients *should* 
> experience UnavailableExceptions.
> - Be written in a Row-centric manner such that it is easy for consumers to 
> reconstitute rows atomically.
> - Written in a simple format designed to be consumed *directly* by daemons 
> written in non JVM languages
> h2. Nice-to-haves
> I strongly suspect that the following features will be asked for, but I also 
> believe that they can be deferred for a subsequent release, and to guage 
> actual interest.
> - Multiple logs per table. This would make it easy to have multiple 
> "subscribers" to a single table's changes. A workaround would be to create a 
> forking daemon listener, but that's not a great answer.
> - Log filtering. Being able to apply filters, including UDF-based filters 
> would make Casandra a much more versatile feeder into other systems, and 
> again, reduce complexity that would otherwise need to be built into the 
> daemons.
> h2. Format and Consumption
> - Cassandra would only write to the CDC log, and never delete from it. 
> - Cleaning up consumed logfiles would be the client daemon's responibility
> - Logfile size should probably be configurable.
> - Logfiles should be named with a predictable naming schema, making it 
> triivial to process them in order.
> - Daemons should be able to checkpoint their work, and resume from where they 
> left off. This means they would have to leave some file artifact in the CDC 
> log's directory.
> - A 

[jira] [Commented] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment

2016-05-19 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291694#comment-15291694
 ] 

Jeff Jirsa commented on CASSANDRA-11847:


Your crashes are all over Cassandra (commitlog, mutation, compaction) - the 
most likely cause is bad hardware (bad memory, for example).

Physical hardware / home-grown VM or public cloud? ECC RAM? 



> Cassandra dies on a specific node in a multi-DC environment
> ---
>
> Key: CASSANDRA-11847
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11847
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Core
> Environment: Cassandra 2.0.11, JDK build 1.7.0_79-b15
>Reporter: Rajesh Babu
> Attachments: java_error19030.log, java_error2912.log, 
> java_error4571.log, java_error7539.log, java_error9552.log
>
>
> We've a customer who runs a 16 node 2 DC (8 nodes each) environment where 
> Cassandra pid dies randomly but on a specific node.
> Whenever Cassandra dies, admin has to manually restart Cassandra only on that 
> node.
> I tried upgrading their environment from java 1.7 (patch 60) to java 1.7 
> (patch 79) but it still seems to be an issue. 
> Is this a known hardware related bug or should is this issue fixed in later 
> Cassandra versions? 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4542d5a27f, pid=19030, tid=139933154096896
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libjava.so+0xe027f]  _fini+0xbd5f7
> #
> # Core dump written. Default location: /tmp/core or core.19030
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7f453c89f000):  JavaThread "COMMIT-LOG-WRITER" 
> [_thread_in_vm, id=19115, stack(0x7f44b9ed3000,0x7f44b9f14000)]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
> si_addr=0x7f4542d5a27f
> Registers:
> RAX=0x, RBX=0x7f453c564ad0, RCX=0x0001, 
> RDX=0x0020
> RSP=0x7f44b9f125a0, RBP=0x7f44b9f125b0, RSI=0x, 
> RDI=0x0001
> R8 =0x7f453c564ad8, R9 =0x4aab, R10=0x7f453917a52c, 
> R11=0x0006fae57068
> R12=0x7f453c564ad8, R13=0x7f44b9f125d0, R14=0x, 
> R15=0x7f453c89f000
> RIP=0x7f4542d5a27f, EFLAGS=0x00010246, CSGSFS=0x0033, 
> ERR=0x0014
>   TRAPNO=0x000e
> -
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f28e08787a4, pid=2912, tid=139798767699712
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  0x7f28e08787a4
> #
> # Core dump written. Default location: /tmp/core or core.2912
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7f2640008000):  JavaThread "ValidationExecutor:15" 
> daemon [_thread_in_Java, id=7393, 
> stack(0x7f256fdf8000,0x7f256fe39000)]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
> si_addr=0x7f28e08787a4
> Registers:
> RAX=0x, RBX=0x3f8bb878, RCX=0xc77040d6, 
> RDX=0xc770409a
> RSP=0x7f256fe37430, RBP=0x00063b820710, RSI=0x00063b820530, 
> RDI=0x
> R8 =0x3f8bb888, R9 =0x, R10=0x3f8bb888, 
> R11=0x3f8bb878
> R12=0x, R13=0x00063b820530, R14=0x000b, 
> R15=0x7f2640008000
> RIP=0x7f28e08787a4, EFLAGS=0x00010246, CSGSFS=0x0033, 
> ERR=0x0015
>   TRAPNO=0x000e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment

2016-05-19 Thread Rajesh Babu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291687#comment-15291687
 ] 

Rajesh Babu commented on CASSANDRA-11847:
-

The Cassandra system log shows the following before the Cassandra process dies:

 INFO [CompactionExecutor:49] 2016-05-10 14:06:37,074 CompactionTask.java (line 
115) Compacting 
[SSTableReader(path='/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-266-Data.db'),
 
SSTableReader(path='/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-267-Data.db'),
 
SSTableReader(path='/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-268-Data.db'),
 
SSTableReader(path='/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-265-Data.db')]
 INFO [CompactionExecutor:49] 2016-05-10 14:06:37,191 CompactionTask.java (line 
287) Compacted 4 sstables to 
[/var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-269,].
  742,551 bytes to 256,142 (~34% of original) in 116ms = 2.105828MB/s.  7,348 
total partitions merged to 2,845.  Partition merge counts were {1:7348, }
 INFO [StorageServiceShutdownHook] 2016-05-10 14:11:16,693 ThriftServer.java 
(line 141) Stop listening to thrift clients
 INFO [StorageServiceShutdownHook] 2016-05-10 14:11:16,749 Server.java (line 
182) Stop listening for CQL clients
 INFO [StorageServiceShutdownHook] 2016-05-10 14:11:16,749 Gossiper.java (line 
1307) Announcing shutdown
 INFO [main] 2016-05-10 14:24:30,997 CassandraDaemon.java (line 135) Logging 
initialized
 INFO [main] 2016-05-10 14:24:31,028 YamlConfigurationLoader.java (line 80) 
Loading settings from 
file:/opt/cloudian-packages/apache-cassandra-2.0.11/conf/cassandra.yaml


> Cassandra dies on a specific node in a multi-DC environment
> ---
>
> Key: CASSANDRA-11847
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11847
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Core
> Environment: Cassandra 2.0.11, JDK build 1.7.0_79-b15
>Reporter: Rajesh Babu
> Attachments: java_error19030.log, java_error2912.log, 
> java_error4571.log, java_error7539.log, java_error9552.log
>
>
> We've a customer who runs a 16 node 2 DC (8 nodes each) environment where 
> Cassandra pid dies randomly but on a specific node.
> Whenever Cassandra dies, admin has to manually restart Cassandra only on that 
> node.
> I tried upgrading their environment from java 1.7 (patch 60) to java 1.7 
> (patch 79) but it still seems to be an issue. 
> Is this a known hardware related bug or should is this issue fixed in later 
> Cassandra versions? 
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4542d5a27f, pid=19030, tid=139933154096896
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libjava.so+0xe027f]  _fini+0xbd5f7
> #
> # Core dump written. Default location: /tmp/core or core.19030
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7f453c89f000):  JavaThread "COMMIT-LOG-WRITER" 
> [_thread_in_vm, id=19115, stack(0x7f44b9ed3000,0x7f44b9f14000)]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
> si_addr=0x7f4542d5a27f
> Registers:
> RAX=0x, RBX=0x7f453c564ad0, RCX=0x0001, 
> RDX=0x0020
> RSP=0x7f44b9f125a0, RBP=0x7f44b9f125b0, RSI=0x, 
> RDI=0x0001
> R8 =0x7f453c564ad8, R9 =0x4aab, R10=0x7f453917a52c, 
> R11=0x0006fae57068
> R12=0x7f453c564ad8, R13=0x7f44b9f125d0, R14=0x, 
> R15=0x7f453c89f000
> RIP=0x7f4542d5a27f, EFLAGS=0x00010246, CSGSFS=0x0033, 
> ERR=0x0014
>   TRAPNO=0x000e
> -
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f28e08787a4, pid=2912, tid=139798767699712
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  0x7f28e08787a4
> #
> # Core dump written. Default location: /tmp/core or core.2912
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread 

[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291686#comment-15291686
 ] 

Paulo Motta commented on CASSANDRA-11845:
-

[~vin01] so {{nodetool netstats}} no longer shows ongoing stream sessions? Is 
the repair still hanging at 55%, or has it progressed?

If so, you'll probably need to attach your system.log for further 
investigation, since it's not possible to tell from the data you've provided so 
far at which stage the repair is hanging. You may want to use grep to filter 
the log with {{grep -i 'repair\|valid\|sync' logs/system.log}}

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, i was able 
> to avoid the socketTimeout errors i was getting earlier 
> (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And its 10:46:25 Now, almost 5 hours since it has been stuck right there.
> Earlier i could see repair session going on in system.log but there are no 
> logs coming in right now, all i get in logs is regular index summary 
> redistribution logs.
> Last logs for repair i saw in logs :-
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> Its an incremental repair, and in "nodetool netstats" output i can see logs 
> like :-
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
>  148882/148882 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
>  71876/71876 bytes(100%) received from idx:0/Node-2
> Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 
> bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
>  161895/161895 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
>  399865/399865 bytes(100%) sent to 

[jira] [Created] (CASSANDRA-11847) Cassandra dies on a specific node in a multi-DC environment

2016-05-19 Thread Rajesh Babu (JIRA)
Rajesh Babu created CASSANDRA-11847:
---

 Summary: Cassandra dies on a specific node in a multi-DC 
environment
 Key: CASSANDRA-11847
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11847
 Project: Cassandra
  Issue Type: Bug
  Components: Compaction, Core
 Environment: Cassandra 2.0.11, JDK build 1.7.0_79-b15
Reporter: Rajesh Babu
 Attachments: java_error19030.log, java_error2912.log, 
java_error4571.log, java_error7539.log, java_error9552.log

We've a customer who runs a 16 node 2 DC (8 nodes each) environment where 
Cassandra pid dies randomly but on a specific node.

Whenever Cassandra dies, admin has to manually restart Cassandra only on that 
node.

I tried upgrading their environment from java 1.7 (patch 60) to java 1.7 (patch 
79) but it still seems to be an issue. 


Is this a known hardware related bug or should is this issue fixed in later 
Cassandra versions? 

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f4542d5a27f, pid=19030, tid=139933154096896
#
# JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# C  [libjava.so+0xe027f]  _fini+0xbd5f7
#
# Core dump written. Default location: /tmp/core or core.19030
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---  T H R E A D  ---

Current thread (0x7f453c89f000):  JavaThread "COMMIT-LOG-WRITER" 
[_thread_in_vm, id=19115, stack(0x7f44b9ed3000,0x7f44b9f14000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
si_addr=0x7f4542d5a27f

Registers:
RAX=0x, RBX=0x7f453c564ad0, RCX=0x0001, 
RDX=0x0020
RSP=0x7f44b9f125a0, RBP=0x7f44b9f125b0, RSI=0x, 
RDI=0x0001
R8 =0x7f453c564ad8, R9 =0x4aab, R10=0x7f453917a52c, 
R11=0x0006fae57068
R12=0x7f453c564ad8, R13=0x7f44b9f125d0, R14=0x, 
R15=0x7f453c89f000
RIP=0x7f4542d5a27f, EFLAGS=0x00010246, CSGSFS=0x0033, 
ERR=0x0014
  TRAPNO=0x000e


-

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f28e08787a4, pid=2912, tid=139798767699712
#
# JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# C  0x7f28e08787a4
#
# Core dump written. Default location: /tmp/core or core.2912
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---  T H R E A D  ---

Current thread (0x7f2640008000):  JavaThread "ValidationExecutor:15" daemon 
[_thread_in_Java, id=7393, stack(0x7f256fdf8000,0x7f256fe39000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=2 (SEGV_ACCERR), 
si_addr=0x7f28e08787a4

Registers:
RAX=0x, RBX=0x3f8bb878, RCX=0xc77040d6, 
RDX=0xc770409a
RSP=0x7f256fe37430, RBP=0x00063b820710, RSI=0x00063b820530, 
RDI=0x
R8 =0x3f8bb888, R9 =0x, R10=0x3f8bb888, 
R11=0x3f8bb878
R12=0x, R13=0x00063b820530, R14=0x000b, 
R15=0x7f2640008000
RIP=0x7f28e08787a4, EFLAGS=0x00010246, CSGSFS=0x0033, 
ERR=0x0015
  TRAPNO=0x000e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11690) dtest faile in test_rf_collapse_gossiping_property_file_snitch_multi_dc

2016-05-19 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291633#comment-15291633
 ] 

Russ Hatch commented on CASSANDRA-11690:


This looks to be the same as CASSANDRA-11686.

> dtest faile in test_rf_collapse_gossiping_property_file_snitch_multi_dc
> ---
>
> Key: CASSANDRA-11690
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11690
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Russ Hatch
>  Labels: dtest
>
> looks like a possible resource constraint issue:
> {noformat}
> [Errno 12] Cannot allocate memory
> {noformat}
> more than one failure in recent history.
> http://cassci.datastax.com/job/trunk_dtest/1173/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_collapse_gossiping_property_file_snitch_multi_dc/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11690) dtest faile in test_rf_collapse_gossiping_property_file_snitch_multi_dc

2016-05-19 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch resolved CASSANDRA-11690.

Resolution: Duplicate

> dtest faile in test_rf_collapse_gossiping_property_file_snitch_multi_dc
> ---
>
> Key: CASSANDRA-11690
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11690
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Russ Hatch
>  Labels: dtest
>
> looks like a possible resource constraint issue:
> {noformat}
> [Errno 12] Cannot allocate memory
> {noformat}
> more than one failure in recent history.
> http://cassci.datastax.com/job/trunk_dtest/1173/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_collapse_gossiping_property_file_snitch_multi_dc/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11690) dtest faile in test_rf_collapse_gossiping_property_file_snitch_multi_dc

2016-05-19 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch reassigned CASSANDRA-11690:
--

Assignee: Russ Hatch  (was: DS Test Eng)

> dtest faile in test_rf_collapse_gossiping_property_file_snitch_multi_dc
> ---
>
> Key: CASSANDRA-11690
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11690
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Russ Hatch
>  Labels: dtest
>
> looks like a possible resource constraint issue:
> {noformat}
> [Errno 12] Cannot allocate memory
> {noformat}
> more than one failure in recent history.
> http://cassci.datastax.com/job/trunk_dtest/1173/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_collapse_gossiping_property_file_snitch_multi_dc/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291564#comment-15291564
 ] 

vin01 edited comment on CASSANDRA-11845 at 5/19/16 5:24 PM:


[-]$ /mydir/apache-cassandra-2.2.4/bin/nodetool compactionstats
pending tasks: 0

(the output of compactionstats is the same on all 3 nodes)

It's still stuck at the same point.

nodetool netstats output summary :-

Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Large messages                  n/a         0            779
Small messages                  n/a         0       14758741
Gossip messages                 n/a         0         135056


was (Author: vin01):
[-]$ /mydir/apache-cassandra-2.2.4/bin/nodetool compactionstats
pending tasks: 0

Its still stuck at same point.

nodetool netstats output summary :-

Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Large messages  n/a 0779
Small messages  n/a 0   14758741
Gossip messages n/a 0 135056

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, i was able 
> to avoid the socketTimeout errors i was getting earlier 
> (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And its 10:46:25 Now, almost 5 hours since it has been stuck right there.
> Earlier i could see repair session going on in system.log but there are no 
> logs coming in right now, all i get in logs is regular index summary 
> redistribution logs.
> Last logs for repair i saw in logs :-
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> Its an incremental repair, and in "nodetool netstats" output i can see logs 
> like :-
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> 

[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Olivier Michallat (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291578#comment-15291578
 ] 

Olivier Michallat commented on CASSANDRA-10786:
---

+1, it's an elegant way to avoid an extra roundtrip when the schema changes. 
And since it does affect the format of protocol messages, there's no risk of 
"forgetting" to cover it when implementing protocol v5, like [~adutra] 
suggested above.
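
As a purely illustrative sketch of the idea under discussion - fold a digest of 
the result set metadata into the prepared statement id, so that a schema change 
yields a new id and clients holding the old one get UNPREPARED - here is a 
minimal, self-contained example. The class and method are hypothetical, not 
Cassandra's actual implementation, and the column-spec strings stand in for 
whatever metadata encoding ends up being chosen.

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

final class StatementIdSketch
{
    /**
     * Derives a prepared-statement id from the query string plus a description
     * of the result set columns. Adding or dropping a column changes the id,
     * so clients using the old id receive UNPREPARED and refresh their metadata.
     */
    static byte[] statementId(String query, List<String> resultColumnSpecs)
    {
        try
        {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(query.getBytes(StandardCharsets.UTF_8));
            for (String spec : resultColumnSpecs)   // e.g. "ks.tbl.a:int"
                md5.update(spec.getBytes(StandardCharsets.UTF_8));
            return md5.digest();
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new AssertionError(e);            // MD5 is always available in the JDK
        }
    }
}
{code}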

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291564#comment-15291564
 ] 

vin01 commented on CASSANDRA-11845:
---

[-]$ /mydir/apache-cassandra-2.2.4/bin/nodetool compactionstats
pending tasks: 0

It's still stuck at the same point.

nodetool netstats output summary :-

Read Repair Statistics:
Attempted: 1142
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Large messages                  n/a         0            779
Small messages                  n/a         0       14758741
Gossip messages                 n/a         0         135056

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, i was able 
> to avoid the socketTimeout errors i was getting earlier 
> (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And its 10:46:25 Now, almost 5 hours since it has been stuck right there.
> Earlier i could see repair session going on in system.log but there are no 
> logs coming in right now, all i get in logs is regular index summary 
> redistribution logs.
> Last logs for repair i saw in logs :-
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> Its an incremental repair, and in "nodetool netstats" output i can see logs 
> like :-
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
>  148882/148882 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
>  71876/71876 bytes(100%) received from idx:0/Node-2
> Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 
> bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
>  161895/161895 bytes(100%) sent to idx:0/Node-2
> 
> 

[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291536#comment-15291536
 ] 

Paulo Motta commented on CASSANDRA-11845:
-

[~vin01] can you check the output of {{nodetool compactionstats}} on the 
receiving node and see whether there are secondary indexes being rebuilt?

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, i was able 
> to avoid the socketTimeout errors i was getting earlier 
> (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And its 10:46:25 Now, almost 5 hours since it has been stuck right there.
> Earlier i could see repair session going on in system.log but there are no 
> logs coming in right now, all i get in logs is regular index summary 
> redistribution logs.
> Last logs for repair i saw in logs :-
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> Its an incremental repair, and in "nodetool netstats" output i can see logs 
> like :-
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
>  148882/148882 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
>  71876/71876 bytes(100%) received from idx:0/Node-2
> Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 
> bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
>  161895/161895 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
>  399865/399865 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
>  149066/149066 bytes(100%) sent to idx:0/Node-2
> 
> 

[jira] [Commented] (CASSANDRA-11846) Invalid QueryBuilder.insert is not invalidated which causes OOM

2016-05-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291485#comment-15291485
 ] 

Tyler Hobbs commented on CASSANDRA-11846:
-

This may be related to CASSANDRA-8779.

> Invalid QueryBuilder.insert is not invalidated which causes OOM
> ---
>
> Key: CASSANDRA-11846
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11846
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: cassandra-2.1.14
>Reporter: ZhaoYang
>Priority: Minor
> Fix For: 2.1.15
>
>
> create table test(  key text primary key, value list );
> When using QueryBuilder.Insert() to bind column `value` with a blob, 
> Cassandra didn't consider it to be an invalid query and then lead to OOM and 
> crashed.
> the same plain query(String) can be invalidated by Cassandra and C* responds 
> InvalidQuery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11687) dtest failure in rebuild_test.TestRebuild.simple_rebuild_test

2016-05-19 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291459#comment-15291459
 ] 

Russ Hatch commented on CASSANDRA-11687:


1 test did fail out of 100 trials, so this test does need some repair.

> dtest failure in rebuild_test.TestRebuild.simple_rebuild_test
> -
>
> Key: CASSANDRA-11687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11687
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Russ Hatch
>  Labels: dtest
>
> single failure on most recent run (3.0 no-vnode)
> {noformat}
> concurrent rebuild should not be allowed, but one rebuild command should have 
> succeeded.
> {noformat}
> http://cassci.datastax.com/job/cassandra-3.0_novnode_dtest/217/testReport/rebuild_test/TestRebuild/simple_rebuild_test
> Failed on CassCI build cassandra-3.0_novnode_dtest #217



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291457#comment-15291457
 ] 

vin01 commented on CASSANDRA-11845:
---

Because of '-XX:+PerfDisableSharedMem' it's not possible to use jstack or any 
similar tools, I guess.
Also, debug logging is not enabled, so there is nothing in debug.log; I don't 
think the log level can be changed at runtime.

And yes, there are secondary indexes on that table.

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, i was able 
> to avoid the socketTimeout errors i was getting earlier 
> (https://issues.apAache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And its 10:46:25 Now, almost 5 hours since it has been stuck right there.
> Earlier i could see repair session going on in system.log but there are no 
> logs coming in right now, all i get in logs is regular index summary 
> redistribution logs.
> Last logs for repair i saw in logs :-
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> Its an incremental repair, and in "nodetool netstats" output i can see logs 
> like :-
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
>  148882/148882 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
>  71876/71876 bytes(100%) received from idx:0/Node-2
> Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 
> bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
>  161895/161895 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
>  399865/399865 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
>  149066/149066 bytes(100%) sent to 

[jira] [Commented] (CASSANDRA-11686) dtest failure in replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc

2016-05-19 Thread Russ Hatch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291447#comment-15291447
 ] 

Russ Hatch commented on CASSANDRA-11686:


There were no failures when running a multiplex job on larger instances, so the 
problem does look like ccm node crowding, as I mentioned. I'll get these tests 
moved to the large test job.

> dtest failure in 
> replication_test.SnitchConfigurationUpdateTest.test_rf_expand_gossiping_property_file_snitch_multi_dc
> --
>
> Key: CASSANDRA-11686
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11686
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Russ Hatch
>  Labels: dtest
>
> intermittent failure. this test also fails on windows but looks to be for 
> another reason (CASSANDRA-11439)
> http://cassci.datastax.com/job/cassandra-3.0_dtest/682/testReport/replication_test/SnitchConfigurationUpdateTest/test_rf_expand_gossiping_property_file_snitch_multi_dc/
> {noformat}
> Nodetool command '/home/automaton/cassandra/bin/nodetool -h localhost -p 7400 
> getendpoints testing rf_test dummy' failed; exit status: 1; stderr: nodetool: 
> Failed to connect to 'localhost:7400' - ConnectException: 'Connection 
> refused'.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11846) Invalid QueryBuilder.insert is not invalidated which causes OOM

2016-05-19 Thread ZhaoYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-11846:
-
Description: 
create table test(  key text primary key, value list );

When using QueryBuilder.Insert() to bind column `value` to a blob, Cassandra 
did not consider it an invalid query, which then led to an OOM and a crash.

The same plain query (String) is invalidated by Cassandra, and C* responds with 
InvalidQuery.

  was:
create table test{
  key text primary key,
  value list
};

When using QueryBuilder.Insert() to bind column `value` with a blob, Cassandra 
didn't consider it to be an invalid query and then lead to OOM and crashed.

the same plain query(String) can be invalidated by Cassandra and C* responds 
InvalidQuery.


> Invalid QueryBuilder.insert is not invalidated which causes OOM
> ---
>
> Key: CASSANDRA-11846
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11846
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: cassandra-2.1.14
>Reporter: ZhaoYang
>Priority: Minor
> Fix For: 2.1.15
>
>
> create table test(  key text primary key, value list );
> When using QueryBuilder.Insert() to bind column `value` with a blob, 
> Cassandra didn't consider it to be an invalid query and then lead to OOM and 
> crashed.
> the same plain query(String) can be invalidated by Cassandra and C* responds 
> InvalidQuery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11846) Invalid QueryBuilder.insert is not invalidated which causes OOM

2016-05-19 Thread ZhaoYang (JIRA)
ZhaoYang created CASSANDRA-11846:


 Summary: Invalid QueryBuilder.insert is not invalidated which 
causes OOM
 Key: CASSANDRA-11846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11846
 Project: Cassandra
  Issue Type: Bug
  Components: CQL
 Environment: cassandra-2.1.14
Reporter: ZhaoYang
Priority: Minor
 Fix For: 2.1.15


create table test{
  key text primary key,
  value list
};

When using QueryBuilder.Insert() to bind column `value` with a blob, Cassandra 
didn't consider it to be an invalid query and then lead to OOM and crashed.

the same plain query(String) can be invalidated by Cassandra and C* responds 
InvalidQuery.
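
For reference, here is a minimal sketch of the kind of driver-side statement the 
report describes, assuming the DataStax Java driver 2.1 QueryBuilder and 
assuming the list column is, say, list<text>; the contact point, keyspace name 
and payload are illustrative only, and this is not a verified reproduction of 
the OOM:

{code}
import java.nio.ByteBuffer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.Insert;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class QueryBuilderInsertSketch
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try
        {
            Session session = cluster.connect("test_ks");

            // Bind a raw blob to the list-typed column `value`. The equivalent
            // plain CQL string is rejected as invalid, but per the report the
            // QueryBuilder-built statement is not.
            ByteBuffer blob = ByteBuffer.wrap(new byte[1024]);
            Insert insert = QueryBuilder.insertInto("test")
                                        .value("key", "k1")
                                        .value("value", blob);
            session.execute(insert);
        }
        finally
        {
            cluster.close();
        }
    }
}
{code}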



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11521) Implement streaming for bulk read requests

2016-05-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291428#comment-15291428
 ] 

Benedict commented on CASSANDRA-11521:
--

I would also like to voice my support for a separate path.  The two needs are 
really quite distinct, and while optimising the normal read path is definitely 
something we should be exploring in general, complicating it with 
harder-to-reason-about system behaviour on the normal path (wrt memory usage, 
reclaim, abort detection etc) _and_ implementation details (leading to bugs 
around those things, for more critical use cases), while still being unlikely 
to yield the same performance, suggests it isn't the best approach for this 
goal. 

However I would caveat that the idea of evaluating the entire query to an 
off-heap memory region is not what I would have in mind - there's a sliding 
scale starting from a small buffer (or pair of buffers) kept just ahead of the 
client, refilled from a persistent server-side cursor that just avoids 
repeating work to seek into files.  The ideal would be as close to this as 
possible, with a potential time-bound on the lifespan of the cursor, after 
which it can be reinitialised to permit cleanup of sstables.  A configurable 
time limit on isolation could be provided as an option to define this period.

However these streams can be arbitrarily large, so certainly we don't want to 
evaluate the entire query to permit releasing the sstables.

Note that the OpOrder should not be used by these queries - actual references 
should be taken so that long lifespans have no impact.

The code that takes these references really needs to be fixed, also, so that 
the races to update the data tracker don't cause temporary "infinite" loops - 
like we see for range queries today.
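
Purely as an illustration of the buffering pattern sketched above (hypothetical 
types, not Cassandra code): keep one small buffer being drained while a second, 
pre-filled buffer waits, and refill the drained one from a long-lived cursor, 
so only a bounded slice of the result is ever materialised at once.

{code}
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

/** Hypothetical double-buffered reader over a persistent cursor. */
final class DoubleBufferedCursor<T>
{
    private final Iterator<T> cursor;   // stands in for a long-lived server-side cursor
    private final int bufferSize;
    private Queue<T> current = new ArrayDeque<>();
    private Queue<T> next = new ArrayDeque<>();

    DoubleBufferedCursor(Iterator<T> cursor, int bufferSize)
    {
        this.cursor = cursor;
        this.bufferSize = bufferSize;
        refill(current);
        refill(next);
    }

    /** Returns the next item, or null once the cursor is exhausted. */
    T poll()
    {
        if (current.isEmpty())
        {
            Queue<T> drained = current; // swap: serve from the pre-filled buffer,
            current = next;             // then top up the drained one for later
            next = drained;
            refill(next);
        }
        return current.poll();
    }

    private void refill(Queue<T> buffer)
    {
        for (int i = 0; i < bufferSize && cursor.hasNext(); i++)
            buffer.add(cursor.next());
    }
}
{code}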

> Implement streaming for bulk read requests
> --
>
> Key: CASSANDRA-11521
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11521
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 3.x
>
>
> Allow clients to stream data from a C* host, bypassing the coordination layer 
> and eliminating the need to query individual pages one by one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11489) DynamicCompositeType failures during 2.1 to 3.0 upgrade.

2016-05-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291396#comment-15291396
 ] 

Tyler Hobbs commented on CASSANDRA-11489:
-

No, unfortunately nothing obvious comes to mind.  The code for serializing 
range tombstones in the "legacy" format is fairly complex, so there's 
definitely a possibility that there's a bug there.  I would put a bunch of 
debug statements in {{LegacyLayout.fromUnfilteredRowIterator()}} to check what 
kinds of deletions are present in 3.0 and what the 3.0 node _thinks_ it's 
serializing.
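
For example, something along these lines (a hedged sketch only - the helper 
class is made up, and the exact objects available inside 
{{fromUnfilteredRowIterator()}} may differ):

{code}
import org.apache.cassandra.db.rows.UnfilteredRowIterator;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical debugging helper: log what the 3.0 node thinks it is about to
// serialize for a partition, before the legacy conversion runs.
public final class LegacyDeletionDebug
{
    private static final Logger logger = LoggerFactory.getLogger(LegacyDeletionDebug.class);

    public static void logPartition(UnfilteredRowIterator iterator)
    {
        logger.debug("Legacy-serializing partition {} with partition-level deletion {}",
                     iterator.partitionKey(), iterator.partitionLevelDeletion());
    }
}
{code}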

> DynamicCompositeType failures during 2.1 to 3.0 upgrade.
> 
>
> Key: CASSANDRA-11489
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11489
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.x
>
>
> When upgrading from 2.1.13 to 3.0.4+some (hash 
> 70eab633f289eb1e4fbe47b3e17ff3203337f233) we are seeing the following 
> exceptions on 2.1 nodes after other nodes have been upgraded. With tables 
> using DynamicCompositeType in use.  The workload runs fine once everything is 
> upgraded.
> {code}
> ERROR [MessagingService-Incoming-/10.200.182.2] 2016-04-03 21:49:10,531  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[MessagingService-Incoming-/10.200.182.2,5,main]
> java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input 
> length = 1
>   at 
> org.apache.cassandra.db.marshal.DynamicCompositeType.getAndAppendComparator(DynamicCompositeType.java:181)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:200)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.cql3.ColumnIdentifier.<init>(ColumnIdentifier.java:54) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.composites.SimpleSparseCellNameType.fromByteBuffer(SimpleSparseCellNameType.java:83)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:398)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.RangeTombstoneList$Serializer.deserialize(RangeTombstoneList.java:843)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.DeletionInfo$Serializer.deserialize(DeletionInfo.java:407)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:105)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:89)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at org.apache.cassandra.db.Row$RowSerializer.deserialize(Row.java:73) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:116)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:88)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>   at java.nio.charset.CoderResult.throwException(CoderResult.java:281) 
> ~[na:1.8.0_40]
>   at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:816) 
> ~[na:1.8.0_40]
>   at 
> org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:152) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:109) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.marshal.DynamicCompositeType.getAndAppendComparator(DynamicCompositeType.java:169)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   ... 16 common frames omitted
> {code}



--
This message was sent by Atlassian JIRA

[jira] [Commented] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291390#comment-15291390
 ] 

Paulo Motta commented on CASSANDRA-11845:
-

Can you post debug.log from c0c8af20-1d9c-11e6-9d63-b717b380ffdd and 
e3055fb0-1d9d-11e6-9d63-b717b380ffdd stream sessions? Do you have secondary 
indexes on these tables?

Also it would be nice if you could provide a thread dump of the process with 
{{jstack <pid> >> dump.log}}.

> Hanging repair in cassandra 2.2.4
> -
>
> Key: CASSANDRA-11845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Centos 6
>Reporter: vin01
>Priority: Minor
>
> So after increasing the streaming_timeout_in_ms value to 3 hours, I was able 
> to avoid the socketTimeout errors I was getting earlier 
> (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue 
> is that the repair just stays stuck.
> current status :-
> [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
> for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
> [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
> for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
> [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
> for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
> [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
> for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
> [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
> for range (6499366179019889198,6523760493740195344] finished (progress: 55%)
> And it's 10:46:25 now, almost 5 hours since it has been stuck right there.
> Earlier I could see the repair session progressing in system.log, but there are 
> no logs coming in right now; all I get in the logs is the regular index summary 
> redistribution messages.
> Last logs for the repair I saw :-
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
> #a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
> [repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
> INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
> Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
> (6499366179019889198,6523760493740195344] finished
> It's an incremental repair, and in "nodetool netstats" output I can see 
> entries like :-
> Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
> /Node-2
> Receiving 8 files, 1093461 bytes total. Already received 8 files, 
> 1093461 bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
>  399475/399475 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
>  53809/53809 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
>  89955/89955 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
>  168790/168790 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
>  107785/107785 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
>  52889/52889 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
>  148882/148882 bytes(100%) received from idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
>  71876/71876 bytes(100%) received from idx:0/Node-2
> Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 
> bytes total
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
>  161895/161895 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
>  399865/399865 bytes(100%) sent to idx:0/Node-2
> 
> /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
>  149066/149066 

[jira] [Updated] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vin01 updated CASSANDRA-11845:
--
Description: 
So after increasing the streaming_timeout_in_ms value to 3 hours, I was able to 
avoid the socketTimeout errors I was getting earlier 
(https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue is 
that the repair just stays stuck.

current status :-

[2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
[2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
[2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
[2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
[2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
for range (6499366179019889198,6523760493740195344] finished (progress: 55%)


And it's 10:46:25 now, almost 5 hours since it has been stuck right there.

Earlier I could see the repair session progressing in system.log, but there are 
no logs coming in right now; all I get in the logs is the regular index summary 
redistribution messages.


Last logs for the repair I saw :-

INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
#a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
(6499366179019889198,6523760493740195344] finished

It's an incremental repair, and in "nodetool netstats" output I can see entries 
like :-



Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
/Node-2
Receiving 8 files, 1093461 bytes total. Already received 8 files, 
1093461 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
 399475/399475 bytes(100%) received from idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
 53809/53809 bytes(100%) received from idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
 89955/89955 bytes(100%) received from idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
 168790/168790 bytes(100%) received from idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
 107785/107785 bytes(100%) received from idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
 52889/52889 bytes(100%) received from idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
 148882/148882 bytes(100%) received from idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
 71876/71876 bytes(100%) received from idx:0/Node-2
Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes 
total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 161895/161895 bytes(100%) sent to idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 399865/399865 bytes(100%) sent to idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 149066/149066 bytes(100%) sent to idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 126000/126000 bytes(100%) sent to idx:0/Node-2

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 26495/26495 bytes(100%) sent to idx:0/Node-2
Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/Node-3
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
 1598874/1598874 bytes(100%) received from idx:0/Node-3

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db
 736365/736365 bytes(100%) received 

[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291371#comment-15291371
 ] 

Tyler Hobbs commented on CASSANDRA-10786:
-

I like the idea of using a separate hash/ID for the statement and the result 
set metadata if we want to fix the "prepare storm" problem at the same time.

Overall, it seems like it would work like this:
* In response to a PREPARE message, the server returns a statement ID and a 
result set metadata ID.
* When performing an EXECUTE, the driver sends both IDs.
* If the prepared statement ID isn't found, the server responds with an 
"unprepared" error, and the driver needs to reprepare as usual.
* If the statement ID is found, but the metadata ID doesn't match, the server 
executes the query and responds with a special results message.  This message 
contains the correct result set metadata and its ID, the prepared statement ID, 
and a flag to indicate that it's doing this.
* When the driver receives this special response, it replaces its internal 
result set metadata with the new one from the response.

In the scenario Robert describes above (some nodes have seen a schema change, 
others haven't), this would avoid repreparation of statements.  The driver 
might end up swapping its internal result set metadata for the statement 
several times, but that's relatively inexpensive.
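
A rough sketch of the client-side handling this implies (all of the types and 
names below are hypothetical, purely to illustrate the flow - this is not 
actual driver or protocol code):

{code}
// Hypothetical sketch of the client-side flow for the two-id scheme above.
final class TwoIdExecuteFlow
{
    enum ResponseKind { ROWS, ROWS_WITH_NEW_METADATA, UNPREPARED }

    static final class Response
    {
        ResponseKind kind;
        byte[] newResultMetadataId;   // only set for ROWS_WITH_NEW_METADATA
        Object newResultMetadata;     // ditto
    }

    static final class CachedStatement
    {
        byte[] statementId;
        byte[] resultMetadataId;
        Object resultMetadata;
    }

    void handle(Response response, CachedStatement cached, Runnable reprepare)
    {
        switch (response.kind)
        {
            case UNPREPARED:
                // Statement id unknown on this node: re-prepare as today.
                reprepare.run();
                break;
            case ROWS_WITH_NEW_METADATA:
                // Statement id matched but the metadata id did not: the server
                // already executed the query and shipped fresh metadata, so the
                // client just swaps in the new metadata and its id.
                cached.resultMetadata = response.newResultMetadata;
                cached.resultMetadataId = response.newResultMetadataId;
                break;
            case ROWS:
                // Both ids matched: nothing to update.
                break;
        }
    }
}
{code}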

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291370#comment-15291370
 ] 

Tyler Hobbs commented on CASSANDRA-10786:
-

I like the idea of using a separate hash/ID for the statement and the result 
set metadata if we want to fix the "prepare storm" problem at the same time.

Overall, it seems like it would work like this:
* In response to a PREPARE message, the server returns a statement ID and a 
result set metadata ID.
* When performing an EXECUTE, the driver sends both IDs.
* If the prepared statement ID isn't found, the server responds with an 
"unprepared" error, and the driver needs to reprepare as usual.
* If the statement ID is found, but the metadata ID doesn't match, the server 
executes the query and responds with a special results message.  This message 
contains the correct result set metadata and its ID, the prepared statement ID, 
and a flag to indicate that it's doing this.
* When the driver receives this special response, it replaces its internal 
result set metadata with the new one from the response.

In the scenario Robert describes above (some nodes have seen a schema change, 
others haven't), this would avoid repreparation of statements.  The driver 
might end up swapping its internal result set metadata for the statement 
several times, but that's relatively inexpensive.

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-10786:

Comment: was deleted

(was: I like the idea of using a separate hash/ID for the statement and the 
result set metadata if we want to fix the "prepare storm" problem at the same 
time.

Overall, it seems like it would work like this:
* In response to a PREPARE message, the server returns a statement ID and a 
result set metadata ID.
* When performing an EXECUTE, the driver sends both IDs.
* If the prepared statement ID isn't found, the server responds with an 
"unprepared" error, and the driver needs to reprepare as usual.
* If the statement ID is found, but the metadata ID doesn't match, the server 
executes the query and responds with a special results message.  This message 
contains the correct result set metadata and its ID, the prepared statement ID, 
and a flag to indicate that it's doing this.
* When the driver receives this special response, it replaces its internal 
result set metadata with the new one from the response.

In the scenario Robert describes above (some nodes have seen a schema change, 
others haven't), this would avoid repreparation of statements.  The driver 
might end up swapping its internal result set metadata for the statement 
several times, but that's relatively inexpensive.)

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vin01 updated CASSANDRA-11845:
--
Description: 
So after increasing the streaming_timeout_in_ms value to 3 hours, I was able to 
avoid the socketTimeout errors I was getting earlier 
(https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue is 
that the repair just stays stuck.

current status :-

[2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
[2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
[2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
[2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
[2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
for range (6499366179019889198,6523760493740195344] finished (progress: 55%)


And it's 10:46:25 now, almost 5 hours since it has been stuck right there.

Earlier I could see the repair session progressing in system.log, but there are 
no logs coming in right now; all I get in the logs is the regular index summary 
redistribution messages.


Last logs for the repair I saw :-

INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
#a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
(6499366179019889198,6523760493740195344] finished

It's an incremental repair, and in "nodetool netstats" output I can see entries 
like :-



Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
/192.168.100.138
Receiving 8 files, 1093461 bytes total. Already received 8 files, 
1093461 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
 399475/399475 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
 53809/53809 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
 89955/89955 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
 168790/168790 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
 107785/107785 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
 52889/52889 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
 148882/148882 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
 71876/71876 bytes(100%) received from idx:0/192.168.100.138
Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes 
total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 161895/161895 bytes(100%) sent to idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 399865/399865 bytes(100%) sent to idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 149066/149066 bytes(100%) sent to idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 126000/126000 bytes(100%) sent to idx:0/192.168.100.138

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 26495/26495 bytes(100%) sent to idx:0/192.168.100.138
Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/192.168.100.147
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total

/data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db
 1598874/1598874 bytes(100%) received from idx:0/192.168.100.147
   

[jira] [Created] (CASSANDRA-11845) Hanging repair in cassandra 2.2.4

2016-05-19 Thread vin01 (JIRA)
vin01 created CASSANDRA-11845:
-

 Summary: Hanging repair in cassandra 2.2.4
 Key: CASSANDRA-11845
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11845
 Project: Cassandra
  Issue Type: Bug
  Components: Streaming and Messaging
 Environment: Centos 6
Reporter: vin01
Priority: Minor


So after increasing the streaming_timeout_in_ms value to 3 hours, I was able to 
avoid the socketTimeout errors I was getting earlier 
(https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the issue is 
that the repair just stays stuck.

current status :-

[2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd 
for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
[2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd 
for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
[2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd 
for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
[2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd 
for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
[2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd 
for range (6499366179019889198,6523760493740195344] finished (progress: 55%)


And it's 10:46:25 now, almost 5 hours since it has been stuck right there.

Earlier I could see the repair session progressing in system.log, but there are 
no logs coming in right now; all I get in the logs is the regular index summary 
redistribution messages.


Last logs for the repair I saw :-

INFO  [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - [repair 
#a0e5df00-1d99-11e6-9d63-b717b380ffdd] TABLE_NAME is fully synced
INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - 
[repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd] Session completed successfully
INFO  [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - 
Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range 
(6499366179019889198,6523760493740195344] finished

It's an incremental repair, and in "nodetool netstats" output I can see entries 
like :-



Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
/192.168.100.138
Receiving 8 files, 1093461 bytes total. Already received 8 files, 
1093461 bytes total

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db
 399475/399475 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db
 53809/53809 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db
 89955/89955 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db
 168790/168790 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db
 107785/107785 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db
 52889/52889 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db
 148882/148882 bytes(100%) received from idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db
 71876/71876 bytes(100%) received from idx:0/192.168.100.138
Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes 
total

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db
 161895/161895 bytes(100%) sent to idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db
 399865/399865 bytes(100%) sent to idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db
 149066/149066 bytes(100%) sent to idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db
 126000/126000 bytes(100%) sent to idx:0/192.168.100.138

/data/cassandra/data/eviveportal/memberinfo-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db
 26495/26495 bytes(100%) sent to idx:0/192.168.100.138
Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
/192.168.100.147
Receiving 11 files, 13896288 bytes total. Already received 11 files, 
13896288 bytes total
   

[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291273#comment-15291273
 ] 

Robert Stupp commented on CASSANDRA-10786:
--

Well, leaving the {{id}} (which is the {{MD5Digest}} for the pstmt) as is 
allows backwards compatibility.
The purpose of a _fingerprint_ is to provide a hash over 
{{ResultSet.ResultMetadata}} - something like a _prepared statement version_.

Imagine that a (reasonable) amount of time can elapse until all cluster nodes 
have processed the schema change. Nodes can be down for whatever reason and get 
the schema change late. Some nodes can be unreachable for other nodes but still 
be available for clients. (Network partitions occur when you don't need them.)
Additionally, a client probably talks to all nodes "simultaneously" and 
therefore gets different results from nodes that have processed the schema 
change and those that have not. Different results mean that some nodes will say 
"I don't know that pstmt ID - please re-prepare" while others respond as 
expected.
We should not make such situations (schema disagreement) worse than they 
already are by causing a _prepare storm_.

For example, say you have an application that runs 100,000 queries per second 
for a prepared statement.
At time=0, an {{ALTER TABLE foo ADD bar text}} is run. The schema migration 
takes for example 500ms (just a random number) until all nodes have "switched" 
their schema. This means that 50,000 queries may hit a node that has the new 
schema and re-prepare but hit another node during the next request that does 
not have the new schema.

Also, the information a driver gets via the _control connection_ is not "just 
in time" - unlucky driver instances may get the schema change notification via 
the control connections quite late.

For the same reasons, I'm not a fan of changing the way we compute the pstmt 
{{id}} as we please between versions (either C* releases or protocol versions). 
I agree that we should probably not specify the algorithm for computing such 
IDs in the native protocol specification - but we should keep the algorithm 
used to compute these IDs consistent.


> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11799) dtest failure in cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_syntax_error

2016-05-19 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291271#comment-15291271
 ] 

Michael Shuler commented on CASSANDRA-11799:


I'll check these out in the same environments we test on - they are UTF8 locale 
boxes and I think there are other UTF8 tests, but I'll see what I can come up 
with.

> dtest failure in cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_syntax_error
> 
>
> Key: CASSANDRA-11799
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11799
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Tyler Hobbs
>  Labels: cqlsh, dtest
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.0_dtest/703/testReport/cqlsh_tests.cqlsh_tests/TestCqlsh/test_unicode_syntax_error
> Failed on CassCI build cassandra-3.0_dtest #703
> Also failing is 
> cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_invalid_request_error
> The relevant failure is
> {code}
> 'ascii' codec can't encode character u'\xe4' in position 12: ordinal not in 
> range(128)
> {code}
> These are failing on 2.2, 3.0 and trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11731) dtest failure in pushed_notifications_test.TestPushedNotifications.move_single_node_test

2016-05-19 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291267#comment-15291267
 ] 

Philip Thompson commented on CASSANDRA-11731:
-

Testing that change here: 
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/105/

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.move_single_node_test
> 
>
> Key: CASSANDRA-11731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11731
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Assignee: Philip Thompson
>  Labels: dtest
>
> one recent failure (no vnode job)
> {noformat}
> 'MOVED_NODE' != u'NEW_NODE'
> {noformat}
> http://cassci.datastax.com/job/trunk_novnode_dtest/366/testReport/pushed_notifications_test/TestPushedNotifications/move_single_node_test
> Failed on CassCI build trunk_novnode_dtest #366



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11743) Race condition in CommitLog.recover can prevent startup

2016-05-19 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-11743:
---
Reviewer: Branimir Lambov

> Race condition in CommitLog.recover can prevent startup
> ---
>
> Key: CASSANDRA-11743
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11743
> Project: Cassandra
>  Issue Type: Bug
>  Components: Lifecycle
>Reporter: Benjamin Lerer
>Assignee: Benjamin Lerer
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> In {{CommitLog::recover}} the list of segments to recover from is determined 
> by removing the files managed by the {{CommitLogSegmentManager}} from the 
> list of files present in the commit log directory. Unfortunately, due to the 
> way segment creation is done, there is a time window in which a segment 
> file has been created but has not yet been added to the list of segments 
> managed by the {{CommitLogSegmentManager}}. If the filtering occurs during 
> that time window, the commit log might try to recover from that new segment 
> and crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11743) Race condition in CommitLog.recover can prevent startup

2016-05-19 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291265#comment-15291265
 ] 

Benjamin Lerer commented on CASSANDRA-11743:


||Branch||utests||dtests||
|[2.2|https://github.com/blerer/cassandra/tree/11743-2.2]|[2.2|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-2.2-testall/]|[2.2|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-2.2-dtest/]|
|[3.0|https://github.com/blerer/cassandra/tree/11743-3.0]|[3.0|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-3.0-testall/]|[3.0|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-3.0-dtest/]|
|[3.7|https://github.com/blerer/cassandra/tree/11743-3.7]|[3.7|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-3.7-testall/]|[3.7|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11743-3.7-dtest/]|

The new patch does as suggested.
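
For reference, the race described in the ticket boils down to a check-then-act 
pattern like the following simplified sketch (illustrative only, not the actual 
CommitLog/CommitLogSegmentManager code):

{code}
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Generic illustration of the check-then-act race: recovery candidates are
// computed as "files on disk minus files the segment manager owns". If another
// thread creates a new segment file after listFiles() runs but registers it
// with the manager only later, the new segment is wrongly treated as a
// recovery candidate.
final class SegmentListingRace
{
    // Files the segment manager already knows about.
    private final Set<File> managedSegments = new HashSet<>();

    Set<File> segmentsToRecover(File commitLogDirectory)
    {
        Set<File> candidates = new HashSet<>();
        File[] onDisk = commitLogDirectory.listFiles();
        if (onDisk != null)
            candidates.addAll(Arrays.asList(onDisk));
        candidates.removeAll(managedSegments);   // racy: manager may lag behind disk
        return candidates;
    }
}
{code}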

> Race condition in CommitLog.recover can prevent startup
> ---
>
> Key: CASSANDRA-11743
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11743
> Project: Cassandra
>  Issue Type: Bug
>  Components: Lifecycle
>Reporter: Benjamin Lerer
>Assignee: Benjamin Lerer
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> In {{CommitLog::recover}} the list of segments to recover from is determined 
> by removing the files managed by the {{CommitLogSegmentManager}} from the 
> list of files present in the commit log directory. Unfortunately, due to the 
> way segment creation is done, there is a time window in which a segment 
> file has been created but has not yet been added to the list of segments 
> managed by the {{CommitLogSegmentManager}}. If the filtering occurs during 
> that time window, the commit log might try to recover from that new segment 
> and crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11743) Race condition in CommitLog.recover can prevent startup

2016-05-19 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-11743:
---
Fix Version/s: (was: 2.1.x)
   Status: Patch Available  (was: Open)

> Race condition in CommitLog.recover can prevent startup
> ---
>
> Key: CASSANDRA-11743
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11743
> Project: Cassandra
>  Issue Type: Bug
>  Components: Lifecycle
>Reporter: Benjamin Lerer
>Assignee: Benjamin Lerer
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> In {{CommitLog::recover}} the list of segments to recover from is determined 
> by removing the files managed by the {{CommitLogSegmentManager}} from the 
> list of files present in the commit log directory. Unfortunately, due to the 
> way segment creation is done, there is a time window in which a segment 
> file has been created but has not yet been added to the list of segments 
> managed by the {{CommitLogSegmentManager}}. If the filtering occurs during 
> that time window, the commit log might try to recover from that new segment 
> and crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11647) Don't use static dataDirectories field in Directories instances

2016-05-19 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-11647:
--
   Resolution: Fixed
Fix Version/s: (was: 3.x)
   3.7
   Status: Resolved  (was: Patch Available)

Committed as 
[f294750f535f2a73c71eba589dcaf19074f91bbf|https://github.com/apache/cassandra/commit/f294750f535f2a73c71eba589dcaf19074f91bbf]
 to 3.7 and merged into trunk, thanks.

> Don't use static dataDirectories field in Directories instances
> ---
>
> Key: CASSANDRA-11647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11647
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
> Fix For: 3.7
>
>
> Some of the changes to Directories by CASSANDRA-6696 use the static 
> {{dataDirectories}} field, instead of the instance field {{paths}}. This 
> complicates things for external code creating their own Directories instances.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11837) dtest failure in topology_test.TestTopology.simple_decommission_test

2016-05-19 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson resolved CASSANDRA-11837.
-
Resolution: Fixed

https://github.com/riptano/cassandra-dtest/pull/979

> dtest failure in topology_test.TestTopology.simple_decommission_test
> 
>
> Key: CASSANDRA-11837
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11837
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Philip Thompson
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1223/testReport/topology_test/TestTopology/simple_decommission_test
> Failed on CassCI build trunk_dtest #1223
> The problem is that node3 detected node2 as down before the stop call was 
> made, so the wait_other_notice check fails. The fix here is almost certainly 
> as simple as just changing that line to {{node2.stop()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[3/3] cassandra git commit: Merge branch 'cassandra-3.7' into trunk

2016-05-19 Thread aleksey
Merge branch 'cassandra-3.7' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/74f41c9c
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/74f41c9c
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/74f41c9c

Branch: refs/heads/trunk
Commit: 74f41c9cc61959748aa2cb6b42186ecfd8587796
Parents: da9bb03 f294750
Author: Aleksey Yeschenko 
Authored: Thu May 19 16:09:23 2016 +0100
Committer: Aleksey Yeschenko 
Committed: Thu May 19 16:09:23 2016 +0100

--
 CHANGES.txt   | 1 +
 src/java/org/apache/cassandra/db/Directories.java | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/74f41c9c/CHANGES.txt
--
diff --cc CHANGES.txt
index 854ae53,4c50980..4a5dbf4
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,13 -1,5 +1,14 @@@
 +3.8
 + * Support older ant versions (CASSANDRA-11807)
 + * Estimate compressed on disk size when deciding if sstable size limit 
reached (CASSANDRA-11623)
 + * cassandra-stress profiles should support case sensitive schemas 
(CASSANDRA-11546)
 + * Remove DatabaseDescriptor dependency from FileUtils (CASSANDRA-11578)
 + * Faster streaming (CASSANDRA-9766)
 + * Add prepared query parameter to trace for "Execute CQL3 prepared query" 
session (CASSANDRA-11425)
 +
 +
  3.7
+  * Don't use static dataDirectories field in Directories instances 
(CASSANDRA-11647)
  Merged from 3.0:
   * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
   * Allow compaction strategies to disable early open (CASSANDRA-11754)



[1/3] cassandra git commit: Don't use static dataDirectories field in Directories instances

2016-05-19 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.7 b1cf0fe6b -> f294750f5
  refs/heads/trunk da9bb0306 -> 74f41c9cc


Don't use static dataDirectories field in Directories instances

patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-11647


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f294750f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f294750f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f294750f

Branch: refs/heads/cassandra-3.7
Commit: f294750f535f2a73c71eba589dcaf19074f91bbf
Parents: b1cf0fe
Author: Blake Eggleston 
Authored: Mon Apr 25 13:06:30 2016 -0700
Committer: Aleksey Yeschenko 
Committed: Thu May 19 16:09:00 2016 +0100

--
 CHANGES.txt   | 1 +
 src/java/org/apache/cassandra/db/Directories.java | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/f294750f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index f96c31a..4c50980 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.7
+ * Don't use static dataDirectories field in Directories instances 
(CASSANDRA-11647)
 Merged from 3.0:
  * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
  * Allow compaction strategies to disable early open (CASSANDRA-11754)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f294750f/src/java/org/apache/cassandra/db/Directories.java
--
diff --git a/src/java/org/apache/cassandra/db/Directories.java 
b/src/java/org/apache/cassandra/db/Directories.java
index 3898180..7876959 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -296,7 +296,7 @@ public class Directories
 {
 if (directory != null)
 {
-for (DataDirectory dataDirectory : dataDirectories)
+for (DataDirectory dataDirectory : paths)
 {
 if 
(directory.getAbsolutePath().startsWith(dataDirectory.location.getAbsolutePath()))
 return dataDirectory;
@@ -464,7 +464,7 @@ public class Directories
 public DataDirectory[] getWriteableLocations()
 {
 List<DataDirectory> nonBlacklistedDirs = new ArrayList<>();
-for (DataDirectory dir : dataDirectories)
+for (DataDirectory dir : paths)
 {
 if (!BlacklistedDirectories.isUnwritable(dir.location))
 nonBlacklistedDirs.add(dir);



[2/3] cassandra git commit: Don't use static dataDirectories field in Directories instances

2016-05-19 Thread aleksey
Don't use static dataDirectories field in Directories instances

patch by Blake Eggleston; reviewed by Aleksey Yeschenko for CASSANDRA-11647


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f294750f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f294750f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f294750f

Branch: refs/heads/trunk
Commit: f294750f535f2a73c71eba589dcaf19074f91bbf
Parents: b1cf0fe
Author: Blake Eggleston 
Authored: Mon Apr 25 13:06:30 2016 -0700
Committer: Aleksey Yeschenko 
Committed: Thu May 19 16:09:00 2016 +0100

--
 CHANGES.txt   | 1 +
 src/java/org/apache/cassandra/db/Directories.java | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/f294750f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index f96c31a..4c50980 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.7
+ * Don't use static dataDirectories field in Directories instances 
(CASSANDRA-11647)
 Merged from 3.0:
  * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
  * Allow compaction strategies to disable early open (CASSANDRA-11754)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f294750f/src/java/org/apache/cassandra/db/Directories.java
--
diff --git a/src/java/org/apache/cassandra/db/Directories.java 
b/src/java/org/apache/cassandra/db/Directories.java
index 3898180..7876959 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -296,7 +296,7 @@ public class Directories
 {
 if (directory != null)
 {
-for (DataDirectory dataDirectory : dataDirectories)
+for (DataDirectory dataDirectory : paths)
 {
 if 
(directory.getAbsolutePath().startsWith(dataDirectory.location.getAbsolutePath()))
 return dataDirectory;
@@ -464,7 +464,7 @@ public class Directories
 public DataDirectory[] getWriteableLocations()
 {
 List nonBlacklistedDirs = new ArrayList<>();
-for (DataDirectory dir : dataDirectories)
+for (DataDirectory dir : paths)
 {
 if (!BlacklistedDirectories.isUnwritable(dir.location))
 nonBlacklistedDirs.add(dir);



[jira] [Commented] (CASSANDRA-11799) dtest failure in cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_syntax_error

2016-05-19 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291242#comment-15291242
 ] 

Philip Thompson commented on CASSANDRA-11799:
-

We think we use utf8 everywhere, Michael is looking into it.

> dtest failure in cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_syntax_error
> 
>
> Key: CASSANDRA-11799
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11799
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Tyler Hobbs
>  Labels: cqlsh, dtest
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.0_dtest/703/testReport/cqlsh_tests.cqlsh_tests/TestCqlsh/test_unicode_syntax_error
> Failed on CassCI build cassandra-3.0_dtest #703
> Also failing is 
> cqlsh_tests.cqlsh_tests.TestCqlsh.test_unicode_invalid_request_error
> The relevant failure is
> {code}
> 'ascii' codec can't encode character u'\xe4' in position 12: ordinal not in 
> range(128)
> {code}
> These are failing on 2.2, 3.0 and trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11705) clearSnapshots using Directories.dataDirectories instead of CFS.initialDirectories

2016-05-19 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-11705:
--
   Resolution: Fixed
Fix Version/s: (was: 3.0.x)
   (was: 3.x)
   3.0.7
   3.7
   Status: Resolved  (was: Patch Available)

> clearSnapshots using Directories.dataDirectories instead of 
> CFS.initialDirectories
> --
>
> Key: CASSANDRA-11705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11705
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Minor
> Fix For: 3.7, 3.0.7
>
>
> An oversight in CASSANDRA-10518 prevents snapshots created in data 
> directories defined outside of cassandra.yaml from being cleared by 
> {{Keyspace.clearSnapshots}}. {{ColumnFamilyStore.initialDirectories}} should 
> be used when finding snapshots to clear, not {{Directories.dataDirectories}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11705) clearSnapshots using Directories.dataDirectories instead of CFS.initialDirectories

2016-05-19 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291234#comment-15291234
 ] 

Aleksey Yeschenko commented on CASSANDRA-11705:
---

The new version LGTM. Committed to 3.0 as 
[6663c5ff898ff502fc3c69b9f36328c1d9f517e8|https://github.com/apache/cassandra/commit/6663c5ff898ff502fc3c69b9f36328c1d9f517e8]
 and merged with 3.7 and trunk. Thanks.
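
For anyone following along, here is a minimal standalone sketch of the committed 
pattern: when looking for keyspace snapshot directories, walk every initial data 
directory (including ones registered outside cassandra.yaml), not just the 
yaml-defined ones. It uses plain {{java.io.File}} with made-up paths instead of 
the real {{Directories}}/{{ColumnFamilyStore}} classes, so treat it as an 
illustration only.

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ClearSnapshotSketch
{
    // Mirrors the idea of the new getKSChildDirectories(ksName, directories) overload:
    // collect every table directory under the keyspace directory of each data root.
    static List<File> ksChildDirectories(String ksName, File[] dataRoots)
    {
        List<File> result = new ArrayList<>();
        for (File root : dataRoots)
        {
            File ksDir = new File(root, ksName);
            File[] cfDirs = ksDir.listFiles();
            if (cfDirs == null)
                continue; // keyspace has no data in this directory
            for (File cfDir : cfDirs)
                if (cfDir.isDirectory())
                    result.add(cfDir);
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Stand-in for CFS.initialDirectories: the yaml directories plus any
        // registered programmatically (both paths here are hypothetical).
        File[] initialDirs = { new File("/var/lib/cassandra/data"),
                               new File("/mnt/extra/cassandra/data") };
        for (File cfDir : ksChildDirectories("my_keyspace", initialDirs))
            System.out.println("would clear snapshots under: " + cfDir);
    }
}
{code}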

> clearSnapshots using Directories.dataDirectories instead of 
> CFS.initialDirectories
> --
>
> Key: CASSANDRA-11705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11705
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Minor
> Fix For: 3.0.x, 3.x
>
>
> An oversight in CASSANDRA-10518 prevents snapshots created in data 
> directories defined outside of cassandra.yaml from being cleared by 
> {{Keyspace.clearSnapshots}}. {{ColumnFamilyStore.initialDirectories}} should 
> be used when finding snapshots to clear, not {{Directories.dataDirectories}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11795) cassandra-stress legacy mode fails - time to remove it?

2016-05-19 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291230#comment-15291230
 ] 

T Jake Luciani commented on CASSANDRA-11795:


Yes I totally agree... 

> cassandra-stress legacy mode fails - time to remove it?
> ---
>
> Key: CASSANDRA-11795
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11795
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Michael Shuler
>Assignee: T Jake Luciani
>Priority: Minor
>  Labels: stress
> Fix For: 3.x
>
>
> {noformat}
> (trunk)mshuler@hana:~/git/cassandra$ cassandra-stress legacy -o INSERT
> Running in legacy support mode. Translating command to: 
> stress write n=100 -col n=fixed(5) size=fixed(34) data=repeat(1) -rate 
> threads=50 -log interval=10 -mode thrift
> Invalid parameter data=repeat(1)
> Usage:  cassandra-stress  [options]
> Help usage: cassandra-stress help 
> ---Commands---
> read : Multiple concurrent reads - the cluster must first be 
> populated by a write test
> write: Multiple concurrent writes against the cluster
> <...>
> {noformat}
> I tried legacy mode as a one-off, since someone provided a 2.0 stress option 
> command line to duplicate. Is it time to remove legacy, perhaps?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[6/6] cassandra git commit: Merge branch 'cassandra-3.7' into trunk

2016-05-19 Thread aleksey
Merge branch 'cassandra-3.7' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/da9bb030
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/da9bb030
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/da9bb030

Branch: refs/heads/trunk
Commit: da9bb03067e8f11c933e1a04dccf13a1f5a131c7
Parents: beb6464 b1cf0fe
Author: Aleksey Yeschenko 
Authored: Thu May 19 15:58:01 2016 +0100
Committer: Aleksey Yeschenko 
Committed: Thu May 19 15:58:01 2016 +0100

--
 CHANGES.txt |  1 +
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java |  6 ++
 src/java/org/apache/cassandra/db/Directories.java   | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java  |  2 +-
 4 files changed, 16 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/da9bb030/CHANGES.txt
--
diff --cc CHANGES.txt
index adadefd,f96c31a..854ae53
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,14 -1,6 +1,15 @@@
 +3.8
 + * Support older ant versions (CASSANDRA-11807)
 + * Estimate compressed on disk size when deciding if sstable size limit 
reached (CASSANDRA-11623)
 + * cassandra-stress profiles should support case sensitive schemas 
(CASSANDRA-11546)
 + * Remove DatabaseDescriptor dependency from FileUtils (CASSANDRA-11578)
 + * Faster streaming (CASSANDRA-9766)
 + * Add prepared query parameter to trace for "Execute CQL3 prepared query" 
session (CASSANDRA-11425)
 +
 +
  3.7
  Merged from 3.0:
+  * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
   * Allow compaction strategies to disable early open (CASSANDRA-11754)
   * Refactor Materialized View code (CASSANDRA-11475)
   * Update Java Driver (CASSANDRA-11615)



[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.7

2016-05-19 Thread aleksey
Merge branch 'cassandra-3.0' into cassandra-3.7


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b1cf0fe6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b1cf0fe6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b1cf0fe6

Branch: refs/heads/trunk
Commit: b1cf0fe6bbd3c2cf75cd6b9586a9bd1e9e632e8b
Parents: 326a263 6663c5f
Author: Aleksey Yeschenko 
Authored: Thu May 19 15:57:49 2016 +0100
Committer: Aleksey Yeschenko 
Committed: Thu May 19 15:57:49 2016 +0100

--
 CHANGES.txt |  2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java |  6 ++
 src/java/org/apache/cassandra/db/Directories.java   | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java  |  2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b1cf0fe6/CHANGES.txt
--
diff --cc CHANGES.txt
index d029c7b,27398db..f96c31a
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,85 -1,14 +1,87 @@@
 -3.0.7
 +3.7
 +Merged from 3.0:
+  * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
   * Allow compaction strategies to disable early open (CASSANDRA-11754)
   * Refactor Materialized View code (CASSANDRA-11475)
   * Update Java Driver (CASSANDRA-11615)
  Merged from 2.2:
   * Add seconds to cqlsh tracing session duration (CASSANDRA-11753)
 + * Fix commit log replay after out-of-order flush completion (CASSANDRA-9669)
   * Prohibit Reversed Counter type as part of the PK (CASSANDRA-9395)
 + * cqlsh: correctly handle non-ascii chars in error messages (CASSANDRA-11626)
  
+ 
 -3.0.6
 +3.6
 + * Correctly migrate schema for frozen UDTs during 2.x -> 3.x upgrades
 +   (does not affect any released versions) (CASSANDRA-11613)
 + * Allow server startup if JMX is configured directly (CASSANDRA-11725)
 + * Prevent direct memory OOM on buffer pool allocations (CASSANDRA-11710)
 + * Enhanced Compaction Logging (CASSANDRA-10805)
 + * Make prepared statement cache size configurable (CASSANDRA-11555)
 + * Integrated JMX authentication and authorization (CASSANDRA-10091)
 + * Add units to stress ouput (CASSANDRA-11352)
 + * Fix PER PARTITION LIMIT for single and multi partitions queries 
(CASSANDRA-11603)
 + * Add uncompressed chunk cache for RandomAccessReader (CASSANDRA-5863)
 + * Clarify ClusteringPrefix hierarchy (CASSANDRA-11213)
 + * Always perform collision check before joining ring (CASSANDRA-10134)
 + * SSTableWriter output discrepancy (CASSANDRA-11646)
 + * Fix potential timeout in NativeTransportService.testConcurrentDestroys 
(CASSANDRA-10756)
 + * Support large partitions on the 3.0 sstable format (CASSANDRA-11206)
 + * Add support to rebuild from specific range (CASSANDRA-10406)
 + * Optimize the overlapping lookup by calculating all the
 +   bounds in advance (CASSANDRA-11571)
 + * Support json/yaml output in noetool tablestats (CASSANDRA-5977)
 + * (stress) Add datacenter option to -node options (CASSANDRA-11591)
 + * Fix handling of empty slices (CASSANDRA-11513)
 + * Make number of cores used by cqlsh COPY visible to testing code 
(CASSANDRA-11437)
 + * Allow filtering on clustering columns for queries without secondary 
indexes (CASSANDRA-11310)
 + * Refactor Restriction hierarchy (CASSANDRA-11354)
 + * Eliminate allocations in R/W path (CASSANDRA-11421)
 + * Update Netty to 4.0.36 (CASSANDRA-11567)
 + * Fix PER PARTITION LIMIT for queries requiring post-query ordering 
(CASSANDRA-11556)
 + * Allow instantiation of UDTs and tuples in UDFs (CASSANDRA-10818)
 + * Support UDT in CQLSSTableWriter (CASSANDRA-10624)
 + * Support for non-frozen user-defined types, updating
 +   individual fields of user-defined types (CASSANDRA-7423)
 + * Make LZ4 compression level configurable (CASSANDRA-11051)
 + * Allow per-partition LIMIT clause in CQL (CASSANDRA-7017)
 + * Make custom filtering more extensible with UserExpression (CASSANDRA-11295)
 + * Improve field-checking and error reporting in cassandra.yaml 
(CASSANDRA-10649)
 + * Print CAS stats in nodetool proxyhistograms (CASSANDRA-11507)
 + * More user friendly error when providing an invalid token to nodetool 
(CASSANDRA-9348)
 + * Add static column support to SASI index (CASSANDRA-11183)
 + * Support EQ/PREFIX queries in SASI CONTAINS mode without tokenization 
(CASSANDRA-11434)
 + * Support LIKE operator in prepared statements (CASSANDRA-11456)
 + * Add a command to see if a Materialized View has finished building 
(CASSANDRA-9967)
 + * Log endpoint and port associated with streaming operation (CASSANDRA-8777)
 + * Print sensible units for all log messages (CASSANDRA-9692)
 + * Upgrade Netty to version 4.0.34 (CASSANDRA-11096)
 

[2/6] cassandra git commit: Use CFS.initialDirectories when clearing snapshots

2016-05-19 Thread aleksey
Use CFS.initialDirectories when clearing snapshots

patch by Blake Eggleston; reviewed by Aleksey Yeschenko for
CASSANDRA-11705


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6663c5ff
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6663c5ff
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6663c5ff

Branch: refs/heads/cassandra-3.7
Commit: 6663c5ff898ff502fc3c69b9f36328c1d9f517e8
Parents: 5a5d0a1
Author: Blake Eggleston 
Authored: Tue May 3 09:00:57 2016 -0700
Committer: Aleksey Yeschenko 
Committed: Thu May 19 15:54:19 2016 +0100

--
 CHANGES.txt |  2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java |  6 ++
 src/java/org/apache/cassandra/db/Directories.java   | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java  |  2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index b3e7d5e..27398db 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.7
+ * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
  * Allow compaction strategies to disable early open (CASSANDRA-11754)
  * Refactor Materialized View code (CASSANDRA-11475)
  * Update Java Driver (CASSANDRA-11615)
@@ -6,6 +7,7 @@ Merged from 2.2:
  * Add seconds to cqlsh tracing session duration (CASSANDRA-11753)
  * Prohibit Reversed Counter type as part of the PK (CASSANDRA-9395)
 
+
 3.0.6
  * Disallow creating view with a static column (CASSANDRA-11602)
  * Reduce the amount of object allocations caused by the getFunctions methods 
(CASSANDRA-11593)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index a6d5c17..f340b0a 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -116,6 +116,12 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 initialDirectories = replacementArray;
 }
 
+public static Directories.DataDirectory[] getInitialDirectories()
+{
+Directories.DataDirectory[] src = initialDirectories;
+return Arrays.copyOf(src, src.length);
+}
+
 private static final Logger logger = 
LoggerFactory.getLogger(ColumnFamilyStore.class);
 
 private static final ExecutorService flushExecutor = new 
JMXEnabledThreadPoolExecutor(DatabaseDescriptor.getFlushWriters(),

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/Directories.java
--
diff --git a/src/java/org/apache/cassandra/db/Directories.java 
b/src/java/org/apache/cassandra/db/Directories.java
index e00c8b9..f7bb390 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -903,11 +903,17 @@ public class Directories
 return visitor.getAllocatedSize();
 }
 
-// Recursively finds all the sub directories in the KS directory.
 public static List<File> getKSChildDirectories(String ksName)
 {
+return getKSChildDirectories(ksName, dataDirectories);
+
+}
+
+// Recursively finds all the sub directories in the KS directory.
+public static List<File> getKSChildDirectories(String ksName, 
DataDirectory[] directories)
+{
 List<File> result = new ArrayList<>();
-for (DataDirectory dataDirectory : dataDirectories)
+for (DataDirectory dataDirectory : directories)
 {
 File ksDir = new File(dataDirectory.location, ksName);
 File[] cfDirs = ksDir.listFiles();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/Keyspace.java
--
diff --git a/src/java/org/apache/cassandra/db/Keyspace.java 
b/src/java/org/apache/cassandra/db/Keyspace.java
index 273946e..5865364 100644
--- a/src/java/org/apache/cassandra/db/Keyspace.java
+++ b/src/java/org/apache/cassandra/db/Keyspace.java
@@ -276,7 +276,7 @@ public class Keyspace
  */
 public static void clearSnapshot(String snapshotName, String keyspace)
 {
-List<File> snapshotDirs = Directories.getKSChildDirectories(keyspace);
+List<File> snapshotDirs = Directories.getKSChildDirectories(keyspace, 

[1/6] cassandra git commit: Use CFS.initialDirectories when clearing snapshots

2016-05-19 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 5a5d0a1eb -> 6663c5ff8
  refs/heads/cassandra-3.7 326a263f4 -> b1cf0fe6b
  refs/heads/trunk beb6464c0 -> da9bb0306


Use CFS.initialDirectories when clearing snapshots

patch by Blake Eggleston; reviewed by Aleksey Yeschenko for
CASSANDRA-11705


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6663c5ff
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6663c5ff
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6663c5ff

Branch: refs/heads/cassandra-3.0
Commit: 6663c5ff898ff502fc3c69b9f36328c1d9f517e8
Parents: 5a5d0a1
Author: Blake Eggleston 
Authored: Tue May 3 09:00:57 2016 -0700
Committer: Aleksey Yeschenko 
Committed: Thu May 19 15:54:19 2016 +0100

--
 CHANGES.txt |  2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java |  6 ++
 src/java/org/apache/cassandra/db/Directories.java   | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java  |  2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index b3e7d5e..27398db 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.7
+ * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
  * Allow compaction strategies to disable early open (CASSANDRA-11754)
  * Refactor Materialized View code (CASSANDRA-11475)
  * Update Java Driver (CASSANDRA-11615)
@@ -6,6 +7,7 @@ Merged from 2.2:
  * Add seconds to cqlsh tracing session duration (CASSANDRA-11753)
  * Prohibit Reversed Counter type as part of the PK (CASSANDRA-9395)
 
+
 3.0.6
  * Disallow creating view with a static column (CASSANDRA-11602)
  * Reduce the amount of object allocations caused by the getFunctions methods 
(CASSANDRA-11593)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index a6d5c17..f340b0a 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -116,6 +116,12 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 initialDirectories = replacementArray;
 }
 
+public static Directories.DataDirectory[] getInitialDirectories()
+{
+Directories.DataDirectory[] src = initialDirectories;
+return Arrays.copyOf(src, src.length);
+}
+
 private static final Logger logger = 
LoggerFactory.getLogger(ColumnFamilyStore.class);
 
 private static final ExecutorService flushExecutor = new 
JMXEnabledThreadPoolExecutor(DatabaseDescriptor.getFlushWriters(),

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/Directories.java
--
diff --git a/src/java/org/apache/cassandra/db/Directories.java 
b/src/java/org/apache/cassandra/db/Directories.java
index e00c8b9..f7bb390 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -903,11 +903,17 @@ public class Directories
 return visitor.getAllocatedSize();
 }
 
-// Recursively finds all the sub directories in the KS directory.
 public static List<File> getKSChildDirectories(String ksName)
 {
+return getKSChildDirectories(ksName, dataDirectories);
+
+}
+
+// Recursively finds all the sub directories in the KS directory.
+public static List<File> getKSChildDirectories(String ksName, 
DataDirectory[] directories)
+{
 List<File> result = new ArrayList<>();
-for (DataDirectory dataDirectory : dataDirectories)
+for (DataDirectory dataDirectory : directories)
 {
 File ksDir = new File(dataDirectory.location, ksName);
 File[] cfDirs = ksDir.listFiles();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/Keyspace.java
--
diff --git a/src/java/org/apache/cassandra/db/Keyspace.java 
b/src/java/org/apache/cassandra/db/Keyspace.java
index 273946e..5865364 100644
--- a/src/java/org/apache/cassandra/db/Keyspace.java
+++ b/src/java/org/apache/cassandra/db/Keyspace.java
@@ -276,7 +276,7 @@ public class Keyspace
  */
 public static void clearSnapshot(String snapshotName, String 

[3/6] cassandra git commit: Use CFS.initialDirectories when clearing snapshots

2016-05-19 Thread aleksey
Use CFS.initialDirectories when clearing snapshots

patch by Blake Eggleston; reviewed by Aleksey Yeschenko for
CASSANDRA-11705


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6663c5ff
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6663c5ff
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6663c5ff

Branch: refs/heads/trunk
Commit: 6663c5ff898ff502fc3c69b9f36328c1d9f517e8
Parents: 5a5d0a1
Author: Blake Eggleston 
Authored: Tue May 3 09:00:57 2016 -0700
Committer: Aleksey Yeschenko 
Committed: Thu May 19 15:54:19 2016 +0100

--
 CHANGES.txt |  2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java |  6 ++
 src/java/org/apache/cassandra/db/Directories.java   | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java  |  2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index b3e7d5e..27398db 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.7
+ * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
  * Allow compaction strategies to disable early open (CASSANDRA-11754)
  * Refactor Materialized View code (CASSANDRA-11475)
  * Update Java Driver (CASSANDRA-11615)
@@ -6,6 +7,7 @@ Merged from 2.2:
  * Add seconds to cqlsh tracing session duration (CASSANDRA-11753)
  * Prohibit Reversed Counter type as part of the PK (CASSANDRA-9395)
 
+
 3.0.6
  * Disallow creating view with a static column (CASSANDRA-11602)
  * Reduce the amount of object allocations caused by the getFunctions methods 
(CASSANDRA-11593)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index a6d5c17..f340b0a 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -116,6 +116,12 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 initialDirectories = replacementArray;
 }
 
+public static Directories.DataDirectory[] getInitialDirectories()
+{
+Directories.DataDirectory[] src = initialDirectories;
+return Arrays.copyOf(src, src.length);
+}
+
 private static final Logger logger = 
LoggerFactory.getLogger(ColumnFamilyStore.class);
 
 private static final ExecutorService flushExecutor = new 
JMXEnabledThreadPoolExecutor(DatabaseDescriptor.getFlushWriters(),

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/Directories.java
--
diff --git a/src/java/org/apache/cassandra/db/Directories.java 
b/src/java/org/apache/cassandra/db/Directories.java
index e00c8b9..f7bb390 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -903,11 +903,17 @@ public class Directories
 return visitor.getAllocatedSize();
 }
 
-// Recursively finds all the sub directories in the KS directory.
 public static List<File> getKSChildDirectories(String ksName)
 {
+return getKSChildDirectories(ksName, dataDirectories);
+
+}
+
+// Recursively finds all the sub directories in the KS directory.
+public static List<File> getKSChildDirectories(String ksName, 
DataDirectory[] directories)
+{
 List<File> result = new ArrayList<>();
-for (DataDirectory dataDirectory : dataDirectories)
+for (DataDirectory dataDirectory : directories)
 {
 File ksDir = new File(dataDirectory.location, ksName);
 File[] cfDirs = ksDir.listFiles();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6663c5ff/src/java/org/apache/cassandra/db/Keyspace.java
--
diff --git a/src/java/org/apache/cassandra/db/Keyspace.java 
b/src/java/org/apache/cassandra/db/Keyspace.java
index 273946e..5865364 100644
--- a/src/java/org/apache/cassandra/db/Keyspace.java
+++ b/src/java/org/apache/cassandra/db/Keyspace.java
@@ -276,7 +276,7 @@ public class Keyspace
  */
 public static void clearSnapshot(String snapshotName, String keyspace)
 {
-List<File> snapshotDirs = Directories.getKSChildDirectories(keyspace);
+List<File> snapshotDirs = Directories.getKSChildDirectories(keyspace, 

[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.7

2016-05-19 Thread aleksey
Merge branch 'cassandra-3.0' into cassandra-3.7


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b1cf0fe6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b1cf0fe6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b1cf0fe6

Branch: refs/heads/cassandra-3.7
Commit: b1cf0fe6bbd3c2cf75cd6b9586a9bd1e9e632e8b
Parents: 326a263 6663c5f
Author: Aleksey Yeschenko 
Authored: Thu May 19 15:57:49 2016 +0100
Committer: Aleksey Yeschenko 
Committed: Thu May 19 15:57:49 2016 +0100

--
 CHANGES.txt |  2 ++
 src/java/org/apache/cassandra/db/ColumnFamilyStore.java |  6 ++
 src/java/org/apache/cassandra/db/Directories.java   | 10 --
 src/java/org/apache/cassandra/db/Keyspace.java  |  2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b1cf0fe6/CHANGES.txt
--
diff --cc CHANGES.txt
index d029c7b,27398db..f96c31a
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,85 -1,14 +1,87 @@@
 -3.0.7
 +3.7
 +Merged from 3.0:
+  * Use CFS.initialDirectories when clearing snapshots (CASSANDRA-11705)
   * Allow compaction strategies to disable early open (CASSANDRA-11754)
   * Refactor Materialized View code (CASSANDRA-11475)
   * Update Java Driver (CASSANDRA-11615)
  Merged from 2.2:
   * Add seconds to cqlsh tracing session duration (CASSANDRA-11753)
 + * Fix commit log replay after out-of-order flush completion (CASSANDRA-9669)
   * Prohibit Reversed Counter type as part of the PK (CASSANDRA-9395)
 + * cqlsh: correctly handle non-ascii chars in error messages (CASSANDRA-11626)
  
+ 
 -3.0.6
 +3.6
 + * Correctly migrate schema for frozen UDTs during 2.x -> 3.x upgrades
 +   (does not affect any released versions) (CASSANDRA-11613)
 + * Allow server startup if JMX is configured directly (CASSANDRA-11725)
 + * Prevent direct memory OOM on buffer pool allocations (CASSANDRA-11710)
 + * Enhanced Compaction Logging (CASSANDRA-10805)
 + * Make prepared statement cache size configurable (CASSANDRA-11555)
 + * Integrated JMX authentication and authorization (CASSANDRA-10091)
 + * Add units to stress ouput (CASSANDRA-11352)
 + * Fix PER PARTITION LIMIT for single and multi partitions queries 
(CASSANDRA-11603)
 + * Add uncompressed chunk cache for RandomAccessReader (CASSANDRA-5863)
 + * Clarify ClusteringPrefix hierarchy (CASSANDRA-11213)
 + * Always perform collision check before joining ring (CASSANDRA-10134)
 + * SSTableWriter output discrepancy (CASSANDRA-11646)
 + * Fix potential timeout in NativeTransportService.testConcurrentDestroys 
(CASSANDRA-10756)
 + * Support large partitions on the 3.0 sstable format (CASSANDRA-11206)
 + * Add support to rebuild from specific range (CASSANDRA-10406)
 + * Optimize the overlapping lookup by calculating all the
 +   bounds in advance (CASSANDRA-11571)
 + * Support json/yaml output in noetool tablestats (CASSANDRA-5977)
 + * (stress) Add datacenter option to -node options (CASSANDRA-11591)
 + * Fix handling of empty slices (CASSANDRA-11513)
 + * Make number of cores used by cqlsh COPY visible to testing code 
(CASSANDRA-11437)
 + * Allow filtering on clustering columns for queries without secondary 
indexes (CASSANDRA-11310)
 + * Refactor Restriction hierarchy (CASSANDRA-11354)
 + * Eliminate allocations in R/W path (CASSANDRA-11421)
 + * Update Netty to 4.0.36 (CASSANDRA-11567)
 + * Fix PER PARTITION LIMIT for queries requiring post-query ordering 
(CASSANDRA-11556)
 + * Allow instantiation of UDTs and tuples in UDFs (CASSANDRA-10818)
 + * Support UDT in CQLSSTableWriter (CASSANDRA-10624)
 + * Support for non-frozen user-defined types, updating
 +   individual fields of user-defined types (CASSANDRA-7423)
 + * Make LZ4 compression level configurable (CASSANDRA-11051)
 + * Allow per-partition LIMIT clause in CQL (CASSANDRA-7017)
 + * Make custom filtering more extensible with UserExpression (CASSANDRA-11295)
 + * Improve field-checking and error reporting in cassandra.yaml 
(CASSANDRA-10649)
 + * Print CAS stats in nodetool proxyhistograms (CASSANDRA-11507)
 + * More user friendly error when providing an invalid token to nodetool 
(CASSANDRA-9348)
 + * Add static column support to SASI index (CASSANDRA-11183)
 + * Support EQ/PREFIX queries in SASI CONTAINS mode without tokenization 
(CASSANDRA-11434)
 + * Support LIKE operator in prepared statements (CASSANDRA-11456)
 + * Add a command to see if a Materialized View has finished building 
(CASSANDRA-9967)
 + * Log endpoint and port associated with streaming operation (CASSANDRA-8777)
 + * Print sensible units for all log messages (CASSANDRA-9692)
 + * Upgrade Netty to version 4.0.34 

[jira] [Commented] (CASSANDRA-11760) dtest failure in TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test

2016-05-19 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291224#comment-15291224
 ] 

Philip Thompson commented on CASSANDRA-11760:
-

I'm re-running the tests that found this, to see if it comes up again. They 
take about 3-4 hours.

> dtest failure in 
> TestCQLNodes3RF3_Upgrade_current_2_2_x_To_next_3_x.more_user_types_test
> 
>
> Key: CASSANDRA-11760
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11760
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Tyler Hobbs
>  Labels: dtest
> Fix For: 3.6
>
> Attachments: node1.log, node1_debug.log, node2.log, node2_debug.log, 
> node3.log, node3_debug.log
>
>
> example failure:
> http://cassci.datastax.com/view/Parameterized/job/upgrade_tests-all-custom_branch_runs/12/testReport/upgrade_tests.cql_tests/TestCQLNodes2RF1_Upgrade_current_2_2_x_To_next_3_x/user_types_test/
> I've attached the logs. The test upgrades from 2.2.5 to 3.6. The relevant 
> failure stack trace extracted here:
> {code}
> ERROR [MessagingService-Incoming-/127.0.0.1] 2016-05-11 17:08:31,334 
> CassandraDaemon.java:185 - Exception in thread 
> Thread[MessagingService-Incoming-/127.0.0.1,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:99)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:117)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.composites.AbstractCellNameType$5.deserialize(AbstractCellNameType.java:109)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:106)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:101)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:109)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:272)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) 
> ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:200)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:177)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91)
>  ~[apache-cassandra-2.2.6.jar:2.2.6]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11844) Create compaction-stress

2016-05-19 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-11844:
---
Description: 
A tool like cassandra-stress that works with stress yaml but:

  * writes directly to a specified dir using CQLSSTableWriter. 
  * lets you run just compaction on that directory and generates a report on 
compaction throughput.

  was:
A tool like cassandra-stress that works with stress yaml but:

 1. writes directly to a specified dir using CQLSSTableWriter. 
  2  lets you run just compaction on that directory and generates a report on 
compaction throughput.


> Create compaction-stress
> 
>
> Key: CASSANDRA-11844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11844
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Compaction
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
>
> A tool like cassandra-stress that works with stress yaml but:
>   * writes directly to a specified dir using CQLSSTableWriter. 
>   * lets you run just compaction on that directory and generates a report on 
> compaction throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11844) Create compaction-stress

2016-05-19 Thread T Jake Luciani (JIRA)
T Jake Luciani created CASSANDRA-11844:
--

 Summary: Create compaction-stress
 Key: CASSANDRA-11844
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11844
 Project: Cassandra
  Issue Type: Sub-task
Reporter: T Jake Luciani


A tool like cassandra-stress that works with stress yaml but:

 1. writes directly to a specified dir using CQLSSTableWriter. 
  2  lets you run just compaction on that directory and generates a report on 
compaction throughput.
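
As a rough illustration of the first point, writing sstables straight into a 
target directory with {{CQLSSTableWriter}} looks roughly like the sketch below. 
The schema, insert statement, row values, and output path are placeholders I 
made up, not anything the eventual tool is committed to:

{code}
import java.io.File;
import java.net.InetAddress;
import java.util.Date;

import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class CompactionStressWriteSketch
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder schema/statement; a real tool would derive these from the stress yaml.
        String schema = "CREATE TABLE stresscql.typestest (name text, choice boolean, "
                      + "date timestamp, address inet, dbl double, lval bigint, "
                      + "PRIMARY KEY (name, choice))";
        String insert = "INSERT INTO stresscql.typestest "
                      + "(name, choice, date, address, dbl, lval) VALUES (?, ?, ?, ?, ?, ?)";

        File dir = new File("/tmp/compaction-stress/stresscql/typestest");
        dir.mkdirs();

        // Write sstables directly into the directory, bypassing the normal write path,
        // so a separate "compact only" run can measure pure compaction throughput.
        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                                                  .inDirectory(dir)
                                                  .forTable(schema)
                                                  .using(insert)
                                                  .build();
        try
        {
            for (int i = 0; i < 1000; i++)
                writer.addRow("name-" + i, i % 2 == 0, new Date(),
                              InetAddress.getLoopbackAddress(), (double) i, (long) i);
        }
        finally
        {
            writer.close();
        }
    }
}
{code}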



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11844) Create compaction-stress

2016-05-19 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani reassigned CASSANDRA-11844:
--

Assignee: T Jake Luciani

> Create compaction-stress
> 
>
> Key: CASSANDRA-11844
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11844
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Compaction
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
>
> A tool like cassandra-stress that works with stress yaml but:
>  1. writes directly to a specified dir using CQLSSTableWriter. 
>   2  lets you run just compaction on that directory and generates a report on 
> compaction throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11349) MerkleTree mismatch when multiple range tombstones exists for the same partition and interval

2016-05-19 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291211#comment-15291211
 ] 

Stefan Podkowinski commented on CASSANDRA-11349:


I've been debugging the most recently mentioned error case using the following 
cql/ccm statements and a local two-node cluster.

{code}
create keyspace ks WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 2};
use ks;
CREATE TABLE IF NOT EXISTS table1 ( c1 text, c2 text, c3 text, c4 float,
 PRIMARY KEY (c1, c2, c3)
) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 
'false'};
DELETE FROM table1 USING TIMESTAMP 1463656272791 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'c';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272792 WHERE c1 = 'a' AND c2 = 'b';
ccm node1 flush
DELETE FROM table1 USING TIMESTAMP 1463656272793 WHERE c1 = 'a' AND c2 = 'b' 
AND c3 = 'd';
ccm node1 flush
{code}

Timestamps have been added for easier tracking of the specific tombstone in the 
debugger.

ColumnIndex.Builder.buildForCompaction() will add tombstones to the tracker in 
the following order:

*Node1*

{{1463656272792: c1 = 'a' AND c2 = 'b'}}
First RT, added to unwritten + opened tombstones

{{1463656272791: c1 = 'a' AND c2 = 'b' AND c3 = 'c'}}
Overshadowed by the RT added before it while also being older. It will not be 
added and is simply ignored.

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}}
Overshadowed by the first and only RT added to opened so far, but newer, and 
will thus be added to unwritten + opened.

We end up with 2 unwritten tombstones (..92+..93) passed to the serializer for 
message digest.


*Node2*

{{1463656272792: c1 = 'a' AND c2 = 'b'}} (EOC.START)
First RT, added to unwritten + opened tombstones

{{1463656272793: c1 = 'a' AND c2 = 'b' AND c3 = 'd'}} (EOC.END)
Comparison of the EOC flag (Tracker:251) of the previously added RT will cause 
it to be removed from the opened list (Tracker:258). Afterwards the current RT 
will be added to unwritten + opened.

{{1463656272792: c1 = 'a' AND c2 = 'b'}} ({color:red}again!{color})
It gets compared with the previously added RT, which supersedes the current one 
and thus stays in the list. The current RT will again be added to the 
unwritten + opened lists.

We end up with 3 unwritten RTs, including 1463656272792 twice.

I still haven't been able to pinpoint exactly why the reducer is called twice 
with the same TS, but since [~blambov] explicitly mentioned that possibility, I 
guess it's intended behavior (but why? :)).
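
To make the consequence concrete, here is a tiny self-contained illustration 
(plain JDK {{MessageDigest}} standing in for Cassandra's actual 
Validator/MerkleTree code, with made-up tombstone strings): digesting the same 
range tombstone once on one node and twice on the other produces different 
hashes, hence different Merkle tree leaves and "out of sync" ranges.

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class DigestMismatchSketch
{
    // Stand-in for the per-partition digest that feeds a Merkle tree leaf.
    static byte[] digest(List<String> serializedTombstones) throws Exception
    {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (String rt : serializedTombstones)
            md.update(rt.getBytes(StandardCharsets.UTF_8));
        return md.digest();
    }

    public static void main(String[] args) throws Exception
    {
        // Node 1 serializes two unwritten RTs (...792 and ...793),
        // node 2 serializes three, with ...792 included twice, as described above.
        List<String> node1 = Arrays.asList("RT@1463656272792", "RT@1463656272793");
        List<String> node2 = Arrays.asList("RT@1463656272792", "RT@1463656272793",
                                           "RT@1463656272792");

        // Prints false: the digests differ, so repair sees the partition as out of sync.
        System.out.println(Arrays.equals(digest(node1), digest(node2)));
    }
}
{code}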

> MerkleTree mismatch when multiple range tombstones exists for the same 
> partition and interval
> -
>
> Key: CASSANDRA-11349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Fabien Rousseau
>Assignee: Stefan Podkowinski
>  Labels: repair
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 11349-2.1-v2.patch, 11349-2.1-v3.patch, 11349-2.1.patch
>
>
> We observed that repair, for some of our clusters, streamed a lot of data and 
> many partitions were "out of sync".
> Moreover, the read repair mismatch ratio is around 3% on those clusters, 
> which is really high.
> After investigation, it appears that, if two range tombstones exists for a 
> partition for the same range/interval, they're both included in the merkle 
> tree computation.
> But, if for some reason, on another node, the two range tombstones were 
> already compacted into a single range tombstone, this will result in a merkle 
> tree difference.
> Currently, this is clearly bad because MerkleTree differences are dependent 
> on compactions (and if a partition is deleted and created multiple times, the 
> only way to ensure that repair "works correctly"/"don't overstream data" is 
> to major compact before each repair... which is not really feasible).
> Below is a list of steps allowing to easily reproduce this case:
> {noformat}
> ccm create test -v 2.1.13 -n 2 -s
> ccm node1 cqlsh
> CREATE KEYSPACE test_rt WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> USE test_rt;
> CREATE TABLE IF NOT EXISTS table1 (
> c1 text,
> c2 text,
> c3 float,
> c4 float,
> PRIMARY KEY ((c1), c2)
> );
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 2);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> # now flush only one of the two nodes
> ccm node1 flush 
> ccm node1 cqlsh
> USE test_rt;
> INSERT INTO table1 (c1, c2, c3, c4) VALUES ( 'a', 'b', 1, 3);
> DELETE FROM table1 WHERE c1 = 'a' AND c2 = 'b';
> ctrl ^d
> ccm node1 repair
> # now grep the log and observe that there was some inconstencies detected 
> between nodes (while it shouldn't have detected any)
> ccm node1 showlog | grep "out of sync"
> {noformat}
> Consequences 

[jira] [Resolved] (CASSANDRA-11678) cassandra crush when enable hints_compression

2016-05-19 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko resolved CASSANDRA-11678.
---
Resolution: Cannot Reproduce

Couldn't reproduce this one, sorry. Feel free to reopen if you can provide a 
hints file that reliably triggers the issue. Thank you.

> cassandra crush when enable hints_compression
> -
>
> Key: CASSANDRA-11678
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11678
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, Local Write-Read Paths
> Environment: Centos 7
>Reporter: Weijian Lin
>Assignee: Blake Eggleston
>Priority: Critical
>
> When I enable hints_compression and set the compression class to
> LZ4Compressor, Cassandra (v3.0.5, v3.5.0) will crash. Is that a bug, or is
> some configuration wrong?
> *Exception in V 3.5.0 *
> {code}
> ERROR [HintsDispatcher:2] 2016-04-26 15:02:56,970
> HintsDispatchExecutor.java:225 - Failed to dispatch hints file
> abc4dda2-b551-427e-bb0b-e383d4a392e1-1461654138963-1.hints: file is
> corrupted ({})
> org.apache.cassandra.io.FSReadError: java.io.EOFException
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:284)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:254)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:119) 
> ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:91) 
> ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:259)
>  [apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242)
>  [apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220)
>  [apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199)
>  [apache-cassandra-3.5.0.jar:3.5.0]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_65]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_65]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_65]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_65]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
> Caused by: java.io.EOFException: null
> at 
> org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:146)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.io.util.RebufferingInputStream.readInt(RebufferingInputStream.java:188)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:297)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:280)
>  ~[apache-cassandra-3.5.0.jar:3.5.0]
> ... 15 common frames omitted
> {code}
> *Exception in V 3.0.5 *
> {code}
> ERROR [HintsDispatcher:2] 2016-04-26 15:54:46,294
> HintsDispatchExecutor.java:225 - Failed to dispatch hints file
> 8603be13-6878-4de3-8bc3-a7a7146b0376-1461657251205-1.hints: file is
> corrupted ({})
> org.apache.cassandra.io.FSReadError: java.io.EOFException
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:282)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:252)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
> at 
> org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137)
>  

[jira] [Commented] (CASSANDRA-11489) DynamicCompositeType failures during 2.1 to 3.0 upgrade.

2016-05-19 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291188#comment-15291188
 ] 

Aleksey Yeschenko commented on CASSANDRA-11489:
---

[~thobbs] I initially assumed that the problem here was with DCT, but 
apparently a 2.1 node, when decoding a read response from a 3.0 node, is trying 
to deserialise some range tombstones that just cannot be there (this trace is 
from reading a CFS table, and those use only cell level tombstones and whole 
partition deletions, exclusively).

Since you've written the 3.0-2.1 upgrade compat code, does anything obvious 
come to mind re: how this got here?

> DynamicCompositeType failures during 2.1 to 3.0 upgrade.
> 
>
> Key: CASSANDRA-11489
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11489
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.x
>
>
> When upgrading from 2.1.13 to 3.0.4+some (hash 
> 70eab633f289eb1e4fbe47b3e17ff3203337f233) we are seeing the following 
> exceptions on 2.1 nodes after other nodes have been upgraded. With tables 
> using DynamicCompositeType in use.  The workload runs fine once everything is 
> upgraded.
> {code}
> ERROR [MessagingService-Incoming-/10.200.182.2] 2016-04-03 21:49:10,531  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[MessagingService-Incoming-/10.200.182.2,5,main]
> java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input 
> length = 1
>   at 
> org.apache.cassandra.db.marshal.DynamicCompositeType.getAndAppendComparator(DynamicCompositeType.java:181)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:200)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.cql3.ColumnIdentifier.<init>(ColumnIdentifier.java:54) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.composites.SimpleSparseCellNameType.fromByteBuffer(SimpleSparseCellNameType.java:83)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:398)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.RangeTombstoneList$Serializer.deserialize(RangeTombstoneList.java:843)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.DeletionInfo$Serializer.deserialize(DeletionInfo.java:407)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:105)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:89)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at org.apache.cassandra.db.Row$RowSerializer.deserialize(Row.java:73) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:116)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:88)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>   at java.nio.charset.CoderResult.throwException(CoderResult.java:281) 
> ~[na:1.8.0_40]
>   at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:816) 
> ~[na:1.8.0_40]
>   at 
> org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:152) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:109) 
> ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   at 
> org.apache.cassandra.db.marshal.DynamicCompositeType.getAndAppendComparator(DynamicCompositeType.java:169)
>  ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
>   ... 16 common frames 

[jira] [Commented] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation

2016-05-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291175#comment-15291175
 ] 

Robert Stupp commented on CASSANDRA-11738:
--

Just thinking that any measured latency is basically aged out by the time it's 
computed. And something like a "15 minute load" (as the other extreme) cannot 
reflect recent spikes. Also, a measured latency can be influenced by a badly 
timed GC (e.g. G1 running with a 500ms goal that sometimes has "valid" STW 
phases of up to 300/400ms).
Maybe I don't see the point, but I think all nodes (assuming they have the same 
hardware and the cluster is balanced) should have (nearly) equal response 
times. Compactions and GCs can kick in at any time anyway.

Just as an idea: a node can request a _ping-response_ from a node it sends a 
request to (could be requested by setting a flag in the verbs' payload).
For example, node "A" sends a request to node "B". The request contains the 
timestamp at node "A". "B" sends a _ping-response_ including the request 
timestamp back to "A" as soon as it deserializes the request. "A" can now 
decide whether to use the calculated latency ({{currentTime() - 
requestTimestamp}}). It could, for example, ignore that number, which is 
legitimate when "A" itself has just hit a longer GC (say, >100ms or so). "A" 
could also decide that "B" is "slow" because it didn't get the _ping-response_ 
within a certain time. Too complicated?
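
A very rough sketch of that ping-response idea, to make the decision logic 
concrete. Every name here (the thresholds, {{LatencyTracker}}, the pause 
detector) is invented for illustration; none of it is existing Cassandra code:

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PingResponseSketch
{
    static final long IGNORE_IF_LOCAL_PAUSE_MS = 100; // "A" just had a long GC: distrust the sample
    static final long MARK_SLOW_AFTER_MS = 400;       // no timely ping-response: consider "B" slow

    // Would be updated by a local GC/pause detector; a plain placeholder here.
    static final AtomicLong lastLocalPauseMs = new AtomicLong(0);

    interface LatencyTracker
    {
        void addSample(long millis);
        void markSlow();
    }

    /** Called on node "A" when "B"'s ping-response (echoing A's request timestamp) arrives. */
    static void onPingResponse(long requestTimestampMs, LatencyTracker tracker)
    {
        long latency = System.currentTimeMillis() - requestTimestampMs;

        if (lastLocalPauseMs.get() >= IGNORE_IF_LOCAL_PAUSE_MS)
            return;                     // our own pause inflated the number: drop the sample
        else if (latency >= MARK_SLOW_AFTER_MS)
            tracker.markSlow();         // "B" did not answer within the allowed window
        else
            tracker.addSample(latency); // usable sample for the snitch's latency score
    }

    public static void main(String[] args) throws Exception
    {
        long sentAt = System.currentTimeMillis();
        TimeUnit.MILLISECONDS.sleep(20); // pretend the round trip took ~20ms
        onPingResponse(sentAt, new LatencyTracker()
        {
            public void addSample(long millis) { System.out.println("sample: " + millis + "ms"); }
            public void markSlow()             { System.out.println("peer marked slow"); }
        });
    }
}
{code}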

> Re-think the use of Severity in the DynamicEndpointSnitch calculation
> -
>
> Key: CASSANDRA-11738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeremiah Jordan
> Fix For: 3.x
>
>
> CASSANDRA-11737 was opened to allow completely disabling the use of severity 
> in the DynamicEndpointSnitch calculation, but that is a pretty big hammer.  
> There is probably something we can do to better use the score.
> The issue seems to be that severity is given equal weight with latency in the 
> current code, also that severity is only based on disk io.  If you have a 
> node that is CPU bound on something (say catching up on LCS compactions 
> because of bootstrap/repair/replace) the IO wait can be low, but the latency 
> to the node is high.
> Some ideas I had are:
> 1. Allowing a yaml parameter to tune how much impact the severity score has 
> in the calculation.
> 2. Taking CPU load into account as well as IO Wait (this would probably help 
> in the cases I have seen things go sideways)
> 3. Move the -D from CASSANDRA-11737 to being a yaml level setting
> 4. Go back to just relying on Latency and get rid of severity all together.  
> Now that we have rapid read protection, maybe just using latency is enough, 
> as it can help where the predictive nature of IO wait would have been useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-11843) Improve test coverage for conditional deletes

2016-05-19 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov resolved CASSANDRA-11843.
-
Resolution: Invalid

Will be solved in the scope of original issue.

> Improve test coverage for conditional deletes
> -
>
> Key: CASSANDRA-11843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11843
> Project: Cassandra
>  Issue Type: Test
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> Follow-up ticket for #9842 to cover conditional deletes for non-existing 
> columns or columns containing null. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291148#comment-15291148
 ] 

Alex Petrov commented on CASSANDRA-10786:
-

I mostly cannot see how splitting the "long hash" into {{id}} and 
{{fingerprint}} improves anything. We still use the "short" version internally, 
for the reasons stated above. The "long" hash is the only thing we communicate 
to the client. It also bears no semantic meaning; we may change it as we 
please, as long as we respect the protocol. Also, the client would still have 
to come back with both {{id}} and {{fingerprint}} when executing the prepared 
statement. So I'm not sure how {{fingerprint}} is useful without the {{id}}.
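
For illustration, the fingerprint under discussion is just a digest over the 
result set metadata. A standalone sketch (MD5 over hypothetical "name:type" 
pairs, not the server's actual hashing code) shows how it changes when a column 
is added while the statement id could stay stable:

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class ResultSetFingerprintSketch
{
    // Digest the result set metadata so any change to the returned columns
    // also changes the fingerprint the client must present.
    static String fingerprint(List<String> columns) throws Exception
    {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (String column : columns)
            md.update(column.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest())
            hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception
    {
        List<String> before = Arrays.asList("b:text", "c:int");
        List<String> after  = Arrays.asList("a:text", "b:text", "c:int"); // column a added

        // Two different values: a client holding the old fingerprint can be told
        // (or can detect) that its cached metadata is stale.
        System.out.println(fingerprint(before));
        System.out.println(fingerprint(after));
    }
}
{code}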

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11824) If repair fails no way to run repair again

2016-05-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291124#comment-15291124
 ] 

Marcus Eriksson commented on CASSANDRA-11824:
-

pushed and new builds triggered

> If repair fails no way to run repair again
> --
>
> Key: CASSANDRA-11824
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11824
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Marcus Eriksson
>  Labels: fallout
> Fix For: 3.0.x
>
>
> I have a test that disables gossip and runs repair at the same time. 
> {quote}
> WARN  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> StorageService.java:384 - Stopping gossip by operator request
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> Gossiper.java:1463 - Announcing shutdown
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,776 
> StorageService.java:1999 - Node /172.31.31.1 state jump to shutdown
> INFO  [HANDSHAKE-/172.31.17.32] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.17.32
> INFO  [HANDSHAKE-/172.31.24.76] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.24.76
> INFO  [Thread-25] 2016-05-17 16:57:21,925 RepairRunnable.java:125 - Starting 
> repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-26] 2016-05-17 16:57:21,953 RepairRunnable.java:125 - Starting 
> repair command #2, repairing keyspace stresscql with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-27] 2016-05-17 16:57:21,967 RepairRunnable.java:125 - Starting 
> repair command #3, repairing keyspace system_traces with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 2)
> {quote}
> This ends up failing:
> {quote}
> 16:54:44.844 INFO  serverGroup-node-1-574 - STDOUT: [2016-05-17 16:57:21,933] 
> Starting repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> [2016-05-17 16:57:21,943] Did not get positive replies from all endpoints. 
> List of failed endpoint(s): [172.31.24.76, 172.31.17.32]
> [2016-05-17 16:57:21,945] null
> {quote}
> Subsequent calls to repair with all nodes up still fail:
> {quote}
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 
> CompactionManager.java:1193 - Cannot start multiple repair sessions over the 
> same sstables
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 Validator.java:261 - 
> Failed creating a merkle tree for [repair 
> #66425f10-1c61-11e6-83b2-0b1fff7a067d on keyspace1/standard1, 
> {quote}
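The symptom suggests the validation marks sstables as being repaired but does not
release them when the session fails. Purely as an illustration of the
release-on-failure pattern (hypothetical class, not the actual Cassandra fix):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public final class RepairSessionGuard
{
    // Tracks which sstables belong to an active validation so that a second
    // repair over the same files is rejected.
    private final Set<String> activeSSTables = ConcurrentHashMap.newKeySet();

    public void runValidation(Set<String> sstables, Runnable validation)
    {
        List<String> acquired = new ArrayList<>();
        try
        {
            for (String sstable : sstables)
            {
                if (!activeSSTables.add(sstable))
                    throw new IllegalStateException(
                        "Cannot start multiple repair sessions over the same sstables");
                acquired.add(sstable);
            }
            validation.run();
        }
        finally
        {
            // Release the markers even when validation fails (e.g. because gossip
            // was stopped mid-repair), otherwise every later repair is refused.
            activeSSTables.removeAll(acquired);
        }
    }
}
{code}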



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291106#comment-15291106
 ] 

Robert Stupp edited comment on CASSANDRA-10786 at 5/19/16 1:48 PM:
---

Oh, right. We invalidate a pstmt when one of its dependencies changes - so I 
was overcomplicating it.

Another possible way to solve the opt-in/long-hash problem would be to just add 
another identifier: a hash over the result set metadata. The current ID would 
stay as it is, and we would add a _fingerprint_ to the _Prepared_ response and 
the _Execute_ request.

For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain:
{code}
- <id> is [short bytes] representing the prepared query ID.
- <result_metadata_id> is [short bytes] representing the metadata hash.
- <metadata> is composed of:
{code}
And the body for _4.1.6 Execute_ would be 
{{<id><result_metadata_id><query_parameters>}}.

To handle the situation when that result-set-metadata-fingerprint does not 
match, there are two options IMO.
# The coordinator could reply with a new error code (near to 0x2500, 
Unprepared) telling the client that the result set metadata no longer matches 
and the statement needs to be prepared again.
# We just send out the result set metadata with the _Rows_ response in case the 
metadata has changed / does not match the fingerprint.

The second option would also work around a race condition that could arise with 
a new error code during schema changes: some nodes may already be using the new 
result set metadata while others still use the old one. It would also save one 
roundtrip. It probably makes the client code a bit more complex, but I think 
it's worth paying that price in order to prevent this race condition (and a 
_prepare storm_).
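A self-contained sketch of option 2, with purely illustrative names (this is not
the actual server code): when the fingerprint sent with EXECUTE no longer matches
the current result set metadata, the Rows response carries the new metadata and
fingerprint so the client can refresh its local copy.
{code}
import java.util.Arrays;
import java.util.List;

public final class RowsResponseSketch
{
    final List<String> columns;   // only set when the client's fingerprint was stale
    final byte[] newFingerprint;  // ditto
    final List<List<Object>> rows;

    private RowsResponseSketch(List<String> columns, byte[] newFingerprint, List<List<Object>> rows)
    {
        this.columns = columns;
        this.newFingerprint = newFingerprint;
        this.rows = rows;
    }

    static RowsResponseSketch build(byte[] clientFingerprint,
                                    byte[] currentFingerprint,
                                    List<String> currentColumns,
                                    List<List<Object>> rows)
    {
        boolean stale = !Arrays.equals(clientFingerprint, currentFingerprint);
        return stale
             ? new RowsResponseSketch(currentColumns, currentFingerprint, rows) // client refreshes its metadata
             : new RowsResponseSketch(null, null, rows);                        // NO_METADATA-style response, as today
    }
}
{code}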


was (Author: snazy):
Oh, right. We invalidate a pstmt when one of its dependencies changes - so I 
was overcomplicating it.

Another possible way to solve the opt-in/long-hash problem would be to just add 
another identifier: a hash over the result set metadata. The current ID would 
stay as it is, and we would add a _fingerprint_ to the _Prepared_ response and 
the _Execute_ request.

For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain:
{code}
- <id> is [short bytes] representing the prepared query ID.
- <result_metadata_id> is [short bytes] representing the metadata hash.
- <metadata> is composed of:
{code}
And the body for _4.1.6 Execute_ would be 
{{<id><result_metadata_id><query_parameters>}}.

To handle the situation when that result-set-metadata-fingerprint does not 
match, there are two options IMO.
# The coordinator could reply with a new error code (near to 0x2500, 
Unprepared) telling the client that the result set metadata no longer matches 
and the statement needs to be prepared again.
# We just send out the result set metadata with the _Rows_ response in case it 
has changed.

The second option would also work around a race condition that could arise with 
a new error code during schema changes: some nodes may already be using the new 
result set metadata while others still use the old one. It would also save one 
roundtrip. It probably makes the client code a bit more complex, but I think 
it's worth paying that price in order to prevent this race condition (and a 
_prepare storm_).

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.

[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291106#comment-15291106
 ] 

Robert Stupp commented on CASSANDRA-10786:
--

Oh, right. We invalidate a pstmt when one of its dependencies changes - so I 
was overcomplicating it.

Another possible way to solve the opt-in/long-hash problem would be to just add 
another identifier: a hash over the result set metadata. The current ID would 
stay as it is, and we would add a _fingerprint_ to the _Prepared_ response and 
the _Execute_ request.

For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain:
{code}
- <id> is [short bytes] representing the prepared query ID.
- <result_metadata_id> is [short bytes] representing the metadata hash.
- <metadata> is composed of:
{code}
And the body for _4.1.6 Execute_ would be 
{{<id><result_metadata_id><query_parameters>}}.

To handle the situation when that result-set-metadata-fingerprint does not 
match, there are two options IMO.
# The coordinator could reply with a new error code (near to 0x2500, 
Unprepared) telling the client that the result set metadata no longer matches 
and the statement needs to be prepared again.
# We just send out the result set metadata with the _Rows_ response in case it 
has changed.

The second option would also work around a race condition that could arise with 
a new error code during schema changes: some nodes may already be using the new 
result set metadata while others still use the old one. It would also save one 
roundtrip. It probably makes the client code a bit more complex, but I think 
it's worth paying that price in order to prevent this race condition (and a 
_prepare storm_).

> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id

2016-05-19 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291098#comment-15291098
 ] 

Alex Petrov commented on CASSANDRA-10786:
-

[~adutra] has a very good point about SHA changes and driver implementers. I'm 
not sure every driver would deadlock in the same way - it might depend on the 
implementation - although the Python driver seems to have the [same 
behaviour|https://github.com/ifesdjeen/cassandra-dtest/tree/10786-trunk]; I just 
checked.

Of the suggestions so far, I like the {{OPTIONS}}/{{SUPPORTED}} idea best.
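
For reference, the OPTIONS/SUPPORTED handshake returns a string multimap of
server capabilities, so a client could gate the new behaviour on an advertised
key. The key name below is purely hypothetical, used only for illustration:
{code}
import java.util.List;
import java.util.Map;

public final class FeatureCheckSketch
{
    // "RESULT_METADATA_FINGERPRINT" is an illustrative key, not a real option;
    // the SUPPORTED response body is a [string multimap] of server options.
    static boolean supportsMetadataFingerprint(Map<String, List<String>> supportedOptions)
    {
        return supportedOptions.containsKey("RESULT_METADATA_FINGERPRINT");
    }
}
{code}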



> Include hash of result set metadata in prepared statement id
> 
>
> Key: CASSANDRA-10786
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Minor
>  Labels: client-impacting, protocolv5
> Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a 
> prepared statement when the table is altered, to force clients to update 
> their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. 
> The first client to execute the query after the cache was invalidated will 
> receive an UNPREPARED response, re-prepare, and update its local metadata. 
> But other clients might miss it entirely (the MD5 hasn't changed), and they 
> will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, 
> clientA and clientB both have a cache of the metadata (columns b and c) 
> locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, 
> re-prepares on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been 
> populated again, the query succeeds. But clientB still has not updated its 
> metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set 
> metadata in the md5. This way the md5 would change at step 3, and any client 
> using the old md5 would get an UNPREPARED, regardless of whether another 
> client already reprepared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

