[Cassandra Wiki] Update of "DebianPackaging" by MichaelShuler

2016-09-22 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "DebianPackaging" page has been changed by MichaelShuler:
https://wiki.apache.org/cassandra/DebianPackaging?action=diff&rev1=36&rev2=37

Comment:
Update allthethings

  == Official Package To Install On Debian(tm) (not a product of Debian(tm)) ==
  
+ Add the following lines to `/etc/apt/sources.list`:
  {{{
- deb http://www.apache.org/dist/cassandra/debian 21x main
+ deb http://www.apache.org/dist/cassandra/debian 30x main
- deb-src http://www.apache.org/dist/cassandra/debian 21x main
+ deb-src http://www.apache.org/dist/cassandra/debian 30x main
  }}}
  
- You will want to replace `21x` by the series you want to use: `20x` for the 
2.0.x series, `12x` for the 1.2.x series, etc... You will not automatically get 
major version updates unless you change the series, but that is ''a feature''.
+ You will want to replace `30x` by the series you want to use: `22x` for the 
2.2.x series, `21x` for the 2.1.x series, etc... You will not automatically get 
major version updates unless you change the series, but that is ''a feature''.
  
+ === Adding Repository Keys ===
  
- If you run ''apt-get update'' now, you will see an error similar to this:
+ If you run `apt-get update` and see an error similar to this:
  {{{
  GPG error: http://www.apache.org unstable Release: The following signatures 
couldn't be verified because the public key is not available: NO_PUBKEY 
F758CE318D77295D
  }}}
  
- This simply means you need to add the PUBLIC_KEY. You do that like this:
+ This simply means you need to add a PUBLIC_KEY for the Apache Cassandra deb 
releases. The Apache Cassandra committers' public keys are available at 
[[https://www.apache.org/dist/cassandra/KEYS]].
  
+ To add the [[https://www.apache.org/dist/cassandra/KEYS|KEYS]] file to apt in 
one command (this may be repeated on existing installs if you see an error like 
the one above; any new Release Manager keys will be fetched):
  {{{
+ curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
- gpg --keyserver pgp.mit.edu --recv-keys F758CE318D77295D
- gpg --export --armor F758CE318D77295D | sudo apt-key add -
  }}}
  
- Starting with the 0.7.5 debian package, you will also need to add public key 
2B5C1B00 using the same commands as above:
+ `sudo apt-key list` should show the following keys added, along with the base 
OS keys:
+ {{{
+ pub   1024D/F2833C93 2004-01-18
+ uid  Eric Evans 
+ uid  Eric Evans 
+ uid  Eric Evans 
+ sub   2048g/98CB5BA4 2004-01-18
  
- {{{
- gpg --keyserver pgp.mit.edu --recv-keys 2B5C1B00
- gpg --export --armor 2B5C1B00 | sudo apt-key add -
+ pub   4096R/8D77295D 2009-07-12
+ uid  Eric Evans 
+ uid  Eric Evans 
+ uid  Eric Evans 
+ uid  Eric Evans 
+ uid  Eric Evans 
+ sub   4096R/C47D63C0 2009-07-12
+ 
+ pub   2048R/2B5C1B00 2011-04-13
+ uid  Sylvain Lebresne (pcmanus) 
+ sub   2048R/9CB2AA80 2011-04-13
+ 
+ pub   4096R/0353B12C 2014-09-05
+ uid  T Jake Luciani 
+ sub   4096R/D35F8215 2014-09-05
+ 
+ pub   4096R/FE4B2BDA 2009-07-15
+ uid  Michael Shuler 
+ uid  Michael Shuler 
+ sub   4096R/25A883ED 2009-07-15
  }}}
  
- You will also need to add public key 0353B12C using the same commands as 
above:
  
+ '''Alternative Key Fetching'''
+ 
+ If you wish to manually add an individual committer's key, `apt-key` can 
fetch from a keyserver, as follows, using the long key ID:
+ (Michael Shuler's key was added to 
[[https://www.apache.org/dist/cassandra/KEYS|KEYS]] on 2016-09-23, so releases 
after this date may be signed with FE4B2BDA)
+ 
+ Eric Evans:
  {{{
- gpg --keyserver pgp.mit.edu --recv-keys 0353B12C
- gpg --export --armor 0353B12C | sudo apt-key add -
+ sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key 0xF8358FA2F2833C93
+ sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key 0xF758CE318D77295D
  }}}
  
- (The list of Apache contributors public keys is available at 
[[https://www.apache.org/dist/cassandra/KEYS]]). 
+ Sylvain Lebresne:
+ {{{
+ sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key 0x4BD736A82B5C1B00
+ }}}
  
- Then you may install Cassandra by doing:
+ Jake Luciani:
+ {{{
+ sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key 0x749D6EEC0353B12C
+ }}}
+ 
+ Michael Shuler:
+ {{{
+ sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key 0xA278B781FE4B2BDA
+ }}}
+ 
+ === Install Apache Cassandra ===
  
  {{{
  sudo apt-get update
  sudo apt-get install cassandra
+ sudo apt-get install cassandra-tools  # optional utilities
  }}}

[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables

2016-09-22 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515506#comment-15515506
 ] 

Alex Petrov commented on CASSANDRA-7631:


There's been some work done in that regard in [CASSANDRA-11844].
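
For illustration, a minimal sketch of what writing directly to sstables via the 
existing CQLSSTableWriter API could look like (the keyspace, table, statement, 
and output directory are invented for the example; the token-ownership check 
the ticket proposes is only indicated by a comment):
{code}
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class DirectSSTableWriteSketch
{
    public static void main(String[] args) throws Exception
    {
        // Invented schema and statement for the example.
        String schema = "CREATE TABLE stress.users (id text PRIMARY KEY, val text)";
        String insert = "INSERT INTO stress.users (id, val) VALUES (?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                                                  .inDirectory("/tmp/stress-sstables")
                                                  .forTable(schema)
                                                  .using(insert)
                                                  .build();
        for (int i = 0; i < 1000; i++)
        {
            // Per the proposal, rows whose keys are not owned by the local
            // node would be skipped here.
            writer.addRow("user" + i, "val" + i);
        }
        writer.close(); // flushes the remaining buffered rows to an sstable
    }
}
{code}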

> Allow Stress to write directly to SSTables
> --
>
> Key: CASSANDRA-7631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Russell Spitzer
>Assignee: Russell Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it 
> takes to initially load data. For machines with a large amount of ram this 
> becomes especially onerous because a very large amount of data needs to be 
> placed on the machine before page-cache can be circumvented. 
> To remedy this I suggest we add a top level flag to Cassandra-Stress which 
> would cause the tool to write directly to sstables rather than actually 
> performing CQL inserts. Internally this would use CQLSSTableWriter to write 
> directly to sstables while skipping any keys which are not owned by the node 
> stress is running on. The same stress command run on each node in the cluster 
> would then write unique sstables only containing data which that node is 
> responsible for. Following this no further network IO would be required to 
> distribute data as it would all already be correctly in place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12681) Reject empty options and invalid DC names in replication configuration while creating or altering a keyspace.

2016-09-22 Thread Nachiket Patil (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515468#comment-15515468
 ] 

Nachiket Patil commented on CASSANDRA-12681:


Agree. I think it is time to revisit some restrictions around keyspace 
creation. For backward compatibility's sake, I haven't removed the code in 
`AbstractReplicationStrategy` where `ConfigurationException` is ignored in 
method `createReplicationStrategy()`, while `ConfigurationException` is thrown 
in `validateReplicationStrategy()`. This way, Cassandra will still start even 
if keyspaces with an incorrect replication setting are already present, but it 
will not allow creating new ones or altering existing ones to a wrong setting.
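
To make the split concrete, a standalone sketch of the behaviour described 
above (a simplified illustration, not the actual Cassandra source; only the 
method names come from the comment):
{code}
import java.util.Collections;
import java.util.Map;

public class ReplicationValidationSketch
{
    static class ConfigurationException extends Exception
    {
        ConfigurationException(String m) { super(m); }
    }

    static void validateOptions(Map<String, String> dcOptions) throws ConfigurationException
    {
        if (dcOptions.isEmpty())
            throw new ConfigurationException("NetworkTopologyStrategy requires at least one DC option");
    }

    // Startup path: the exception is swallowed so nodes with pre-existing,
    // invalid settings still boot.
    static void createReplicationStrategy(Map<String, String> dcOptions)
    {
        try { validateOptions(dcOptions); }
        catch (ConfigurationException e) { System.err.println("WARN ignoring: " + e.getMessage()); }
    }

    // CREATE/ALTER KEYSPACE path: the exception propagates and the DDL fails.
    static void validateReplicationStrategy(Map<String, String> dcOptions) throws ConfigurationException
    {
        validateOptions(dcOptions);
    }

    public static void main(String[] args) throws Exception
    {
        createReplicationStrategy(Collections.emptyMap());   // starts with a warning
        validateReplicationStrategy(Collections.emptyMap()); // throws, rejecting the statement
    }
}
{code}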

> Reject empty options and invalid DC names in replication configuration while 
> creating or altering a keyspace.
> -
>
> Key: CASSANDRA-12681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12681
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Nachiket Patil
>Assignee: Nachiket Patil
>Priority: Minor
> Attachments: trunkpatch.diff, v3.0patch.diff
>
>
> Add some restrictions around create / alter keyspace with 
> NetworkTopologyStrategy:
> 1. Do not accept an empty replication configuration (no DC options after the 
> class). Cassandra checks that SimpleStrategy has the replication_factor 
> option, but does not check that at least one DC is present in the options for 
> NetworkTopologyStrategy.
> 2. Cassandra accepts any random string as a DC name replication option for 
> NetworkTopologyStrategy while creating or altering keyspaces. Add a 
> restriction that the options specified are valid datacenter names. Using an 
> incorrect value or a simple typo in the DC name can cause an outage in a 
> production environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12571) cqlsh lost the ability to have a request wait indefinitely

2016-09-22 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-12571:
-
Labels: cqlsh lhf  (was: )

> cqlsh lost the ability to have a request wait indefinitely
> --
>
> Key: CASSANDRA-12571
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12571
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 3.7
>Reporter: Nate Sanders
>Assignee: Stefania
>Priority: Minor
>  Labels: cqlsh, lhf
>
> In commit c7f0032912798b5e53b64d8391e3e3d7e4121165, when client_timeout 
> became request_timeout, the logic was changed so that you can no longer use a 
> timeout of None, despite the docs saying that you can:
> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshUsingCqlshrc.html#cqlshUsingCqlshrc__request-timeout



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12571) cqlsh lost the ability to have a request wait indefinitely

2016-09-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515273#comment-15515273
 ] 

Stefania commented on CASSANDRA-12571:
--

Thanks for your input [~pauloricardomg].

Another reason to only update the documentation is that request_timeout should 
be consistent with connect_timeout, in that either both can be set to None or 
neither should be. However, if many people are depending on None for the 
request timeout, then it is a simple enough patch; the only annoying thing is 
that the patch would have to be committed starting from 2.1, since this was 
broken in 2.1.13.
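
For reference, the cqlshrc setting under discussion, in the format the linked 
DataStax page describes (setting it to None is exactly what this ticket says 
no longer works):
{noformat}
[connection]
request_timeout = None
{noformat}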

What's your opinion [~Nate75Sanders]?

> cqlsh lost the ability to have a request wait indefinitely
> --
>
> Key: CASSANDRA-12571
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12571
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 3.7
>Reporter: Nate Sanders
>Assignee: Stefania
>Priority: Minor
>
> In commit c7f0032912798b5e53b64d8391e3e3d7e4121165, when client_timeout 
> became request_timeout, the logic was changed so that you can no longer use a 
> timeout of None, despite the docs saying that you can:
> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshUsingCqlshrc.html#cqlshUsingCqlshrc__request-timeout



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12239) Add mshuler's key FE4B2BDA to dist/cassandra/KEYS

2016-09-22 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515255#comment-15515255
 ] 

Michael Shuler commented on CASSANDRA-12239:


Committed!
https://lists.apache.org/thread.html/9cb51e95b64de1730e23ae0ee72afab2bc8327b5846f94ea74107554@%3Ccommits.cassandra.apache.org%3E

I will update the download verification and deb install instructions.

> Add mshuler's key FE4B2BDA to dist/cassandra/KEYS
> -
>
> Key: CASSANDRA-12239
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12239
> Project: Cassandra
>  Issue Type: Task
>  Components: Packaging
>Reporter: Michael Shuler
>Assignee: Michael Shuler
>Priority: Blocker
> Fix For: 3.8
>
> Attachments: KEYS+mshuler.diff.txt
>
>
> I've started working on packaging with the 3.8 release and signed the staging 
> artifacts with FE4B2BDA. This key will need to be added for the debian 
> repository signature to function correctly, if it's released as-is, or 
> perhaps [~tjake] will need to re-sign the release. Users will also need to 
> fetch this new key and add it to {{apt-key}}.
> {{KEYS}} patch attached.
> Assigned to myself, but I am not sure exactly where {{KEYS}} lives - in svn 
> somewhere or a direct upload? :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


svn commit: r15503 - /release/cassandra/KEYS

2016-09-22 Thread mshuler
Author: mshuler
Date: Fri Sep 23 03:19:04 2016
New Revision: 15503

Log:
Add mshuler's key FE4B2BDA to dist/cassandra/KEYS

Closes: CASSANDRA-12239

Modified:
release/cassandra/KEYS

Modified: release/cassandra/KEYS
==
--- release/cassandra/KEYS (original)
+++ release/cassandra/KEYS Fri Sep 23 03:19:04 2016
@@ -3490,3 +3490,377 @@ PkjrM2CQoOMxeLWjXJSwjWeLGPalw2/to9NFClzn
 AUeLlJzSeRLTKhcOugK7UcsQD2FHnMBJz50bxis9X7pjmnc/tWpjAGJfaWdjDIo=
 =yiQ4
 -----END PGP PUBLIC KEY BLOCK-----
+pub   4096R/FE4B2BDA 2009-07-15
+uid  Michael Shuler 
+sig 3   FE4B2BDA 2010-07-13  Michael Shuler 
+sig  DD49F17B 2009-07-15  Michael Shuler 
+sig  F2833C93 2009-07-26  Eric Evans 
+sig  8D77295D 2009-07-26  Eric Evans 
+sig  DE61E2E5 2009-07-29  Hector Oron (zumbi) 
+sig  575D0A76 2009-07-30  Martín Ferrari 
+sig  C1DB921F 2009-08-03  Gunnar Eyal Wolf Iszaevich 
+sig  00F3CFE4 2009-08-07  gregor herrmann 

+sig  8649AA06 2009-08-07  gregor herrmann 

+sig  C1DB921F 2009-07-29  Gunnar Eyal Wolf Iszaevich 
+sig  8D77295D 2009-08-24  Eric Evans 
+sig  1880283C 2009-12-05  Anibal Monsalve Salazar 
+sig  947897D8 2009-12-05  Anibal Monsalve Salazar 
+sig 3   FE4B2BDA 2009-07-15  Michael Shuler 
+sig  C1A00121 2010-08-06  Jonas Smedegaard 
+sig  B6250985 2010-08-06  Andrew Lee (李健秋) 
+sig  4006EB3C 2010-08-08  Christopher R. Knadle 

+sig  6A9FDD74 2010-08-08  Christopher Knadle (prime) 

+sig  27572C47 2010-08-09  Aaron M. Ucko 
+sig  F14A64A2 2010-08-09  Aaron M. Ucko 
+sig  3096372C 2010-08-16  Michael Fladerer 

+sig  EE61F443 2010-08-16  Michael Fladerer 

+sig  E397832F 2010-10-10  Luca Capello 
+sig  AC583520 2010-10-24  Holger Levsen 
+sig  069AAA1C 2010-10-24  Holger Levsen 
+sig  54FC8640 2010-11-04  dann frazier 
+sig  0353B12C 2015-02-26  T Jake Luciani 
+uid  Michael Shuler 
+sig 3   FE4B2BDA 2010-07-13  Michael Shuler 
+sig  B6250985 2010-08-06  Andrew Lee (李健秋) 
+sig  C1A00121 2010-08-06  Jonas Smedegaard 
+sig  C1DB921F 2010-08-11  Gunnar Eyal Wolf Iszaevich 
+sig  3096372C 2010-08-16  Michael Fladerer 

+sig  EE61F443 2010-08-16  Michael Fladerer 

+sig  E397832F 2010-10-10  Luca Capello 
+sig  54FC8640 2010-11-04  dann frazier 
+sig  0353B12C 2015-02-26  T Jake Luciani 
+sub   4096R/25A883ED 2009-07-15
+sig  FE4B2BDA 2009-07-15  Michael Shuler 
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+Version: GnuPG v1
+
+mQINBEpeUMgBEACovNA8+89rJXW8n787hLnU0Fz47277sGOrOR6rDpUlaKSDCwvF
+JlrkhMXmDMMF6VJpNSTBt+WUEk4cZCwJanj61Przux6c60MY2EwPOG/0i0V1UERF
+2kmiFWorlDjQfM9MIWxhyY5UY4qvwfVGjIGpTLmmSBEESocfHscNt80iyq/xWEev
+VTPht6vtBamOXVa9GeczHgWpooQbYC1kdaDJoWnMCyGs2Xz0BTAMP8u8ymGZVJ0g
+srkQxhL2QZpO+3PpipjM708l5YhfUUUmcV7wz2i62wjojSk5frtYzImmbC3z9QIQ
+WRCz9rs5hNqqczSvaHCCsrv/DtCdeesEOxblfuclEoqeULwxbLtU8bEa0wIVLnv3
+s8OEhvb6jzxE7JBWIsJgjXE9RLwUZ46HS1eGNTLHXbeOADtGd62sHwjp26M/XIIY
+w4G2P62D3SdcEkbWGHx9FrX4ssCoVP4l+4HOFfQQVi631tMJMLOduldJUkxo2xF9
+gmNfZSnmftsIjdNaWCYUWCV8sS5FVsiFpvW030a4tWKZNbJ/ySlHFBhu3tn8yDni
+yCcIVYkESzFxASDDiK6az4bSDC9AupDqq5Mcgf94DCwvPIbS171ksuToPMmRuak+
+dGQwmC4PPkUlwyMg18MFOQuLUe9HEdWJADUG2HXX/RQXdYtJzwQOd6HSIQARAQAB
+tCJNaWNoYWVsIFNodWxlciA8bXNodWxlckBnbWFpbC5jb20+iQI3BBMBCAAhBQJM
+PJw7AhsDBQsJCAcDBRUKCQgLBRYCAwEAAh4BAheAAAoJEKJ4t4H+SyvaTaMP/3dd
+oYs52p0KP87tMw7nOWsLSZzTfUSQ61L2Bfgn+RV+briLu57TMFMc9sHxXGgJKwH+
+k/JQmX+fR411GGQcszjqSukbK15O7/j5DWwuQ7ELt+fNyfm/vcK1r1Uo5we5pSRh
+P7eaUU+Ufie5jVHKhQS6mo3jl89a3agSqToFji4EKLr5rZWyzyJfhvAcaDRRuoBA
+brDCT5P+liufhUH06jmxznEUKPpGDuIq2d7HwmAlzWNW8HlSr+RAb1eCJML/m3Ey
+GH1bElRVZ0lZDxwaQdO2YUoYzhY1gwohKezdIpXeUfRTaNgqARTUji+UgVCtqxOR
+XR/+rVpVgktTO8ZSwFqhpexegZa8sm6iQvJ00ZnOef047qDq7jz5PODmRtnSxtMs
+uV+BNx2XYRDBYNZJV05gtlsqfSYpuU/A9fYwje5sOK5/Moq8d7s5cd/o4c/+UNcc
+m0jM7Cz62yKxawb7Y/dkWOHeVEbCgJMkb17m42AZv0KAPrtUaXX0/odmLW0xMthc

[jira] [Updated] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-22 Thread Alwyn Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alwyn Davis updated CASSANDRA-11138:

Attachment: 11138-trunk.patch

I realised that the clusteringDescendantAverages can still be limited to less 
than the max limit for each clustering component.  Instead, when inserting 
multi-row partitions, this change just uses the max value for each clustering 
component.

I couldn't see any way to create unit tests for this change, but would be happy 
to if anyone has some pointers.

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
> Attachments: 11138-trunk.patch
>
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> (10 × 30 × 4, at the maximum of each cluster distribution) to be created per 
> partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J 01:31:24+
>  N!\x0eUA7^r7d\x06J 14:22:43+
>  N!\x0eUA7^r7d\x06J 14:54:29+
>  N!\x0eUA7^r7d\x06J 13:31:54+
>  N!\x0eUA7^r7d\x06J

[jira] [Updated] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-22 Thread Alwyn Davis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alwyn Davis updated CASSANDRA-11138:

Reproduced In: 2.2.4, 3.x  (was: 2.2.4)
   Status: Patch Available  (was: Open)

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
> Attachments: 11138-trunk.patch
>
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> (10 × 30 × 4, at the maximum of each cluster distribution) to be created per 
> partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J 01:31:24+
>  N!\x0eUA7^r7d\x06J 14:22:43+
>  N!\x0eUA7^r7d\x06J 14:54:29+
>  N!\x0eUA7^r7d\x06J 13:31:54+
>  N!\x0eUA7^r7d\x06J 06:38:40+
>  N!\x0eUA7^r7d\x06J 21:16:47+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2014-11-23 
> 17:05:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |  

[jira] [Commented] (CASSANDRA-12681) Reject empty options and invalid DC names in replication configuration while creating or altering a keyspace.

2016-09-22 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515172#comment-15515172
 ] 

Jeff Jirsa commented on CASSANDRA-12681:


This has been mentioned in the past (typos in KS replication settings can cause 
queries to fail, in particular), so it's clearly a pain point for multiple 
people, but we should note that this undoes the backwards-compatibility 
changes of CASSANDRA-4795 from 2013. At the time, it was noted that allowing 
unknown options was a horrible thing to do, but it was also recognized that 
people were using them for various reasons. 

Maybe it IS time to reconsider CASSANDRA-4795 - it's been 3 years, hopefully 
people have stopped abusing the schema.


> Reject empty options and invalid DC names in replication configuration while 
> creating or altering a keyspace.
> -
>
> Key: CASSANDRA-12681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12681
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Nachiket Patil
>Assignee: Nachiket Patil
>Priority: Minor
> Attachments: trunkpatch.diff, v3.0patch.diff
>
>
> Add some restrictions around create / alter keyspace with 
> NetworkTopologyStrategy:
> 1. Do not accept an empty replication configuration (no DC options after the 
> class). Cassandra checks that SimpleStrategy has the replication_factor 
> option, but does not check that at least one DC is present in the options for 
> NetworkTopologyStrategy.
> 2. Cassandra accepts any random string as a DC name replication option for 
> NetworkTopologyStrategy while creating or altering keyspaces. Add a 
> restriction that the options specified are valid datacenter names. Using an 
> incorrect value or a simple typo in the DC name can cause an outage in a 
> production environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12694) PAXOS Update Corrupted empty row exception

2016-09-22 Thread Cameron Zemek (JIRA)
Cameron Zemek created CASSANDRA-12694:
-

 Summary: PAXOS Update Corrupted empty row exception
 Key: CASSANDRA-12694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12694
 Project: Cassandra
  Issue Type: Bug
  Components: Local Write-Read Paths
 Environment: 3 node cluster using RF=3 running on cassandra 3.7
Reporter: Cameron Zemek


{noformat}
cqlsh> create table test.test (test_id TEXT, last_updated TIMESTAMP, message_id 
TEXT, PRIMARY KEY(test_id));
update test.test set last_updated = 1474494363669 where test_id = 'test1' if 
message_id = null;
{noformat}

Then run nodetool flush on all 3 nodes.

{noformat}
cqlsh> update test.test set last_updated = 1474494363669 where test_id = 
'test1' if message_id = null;
ServerError: 
{noformat}

From the Cassandra log:
{noformat}
ERROR [SharedPool-Worker-1] 2016-09-23 12:09:13,179 Message.java:611 - 
Unexpected exception during request; channel = [id: 0x7a22599e, 
L:/127.0.0.1:9042 - R:/127.0.0.1:58297]
java.io.IOError: java.io.IOException: Corrupt empty row found in unfiltered 
partition
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext(UnfilteredRowIteratorSerializer.java:224)
 ~[main/:na]
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext(UnfilteredRowIteratorSerializer.java:212)
 ~[main/:na]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
~[main/:na]
at 
org.apache.cassandra.db.rows.UnfilteredRowIterators.digest(UnfilteredRowIterators.java:125)
 ~[main/:na]
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators.digest(UnfilteredPartitionIterators.java:249)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadResponse.makeDigest(ReadResponse.java:87) 
~[main/:na]
at 
org.apache.cassandra.db.ReadResponse$DataResponse.digest(ReadResponse.java:192) 
~[main/:na]
at 
org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:80) 
~[main/:na]
at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:139) 
~[main/:na]
at 
org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1714)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1663) 
~[main/:na]
at 
org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1604) 
~[main/:na]
at 
org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1523) 
~[main/:na]
at 
org.apache.cassandra.service.StorageProxy.readOne(StorageProxy.java:1497) 
~[main/:na]
at 
org.apache.cassandra.service.StorageProxy.readOne(StorageProxy.java:1491) 
~[main/:na]
at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java:249) 
~[main/:na]
at 
org.apache.cassandra.cql3.statements.ModificationStatement.executeWithCondition(ModificationStatement.java:441)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:416)
 ~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:208)
 ~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:239) 
~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:224) 
~[main/:na]
at 
org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
 ~[main/:na]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:507)
 [main/:na]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:401)
 [main/:na]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12014) IndexSummary > 2G causes an assertion error

2016-09-22 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania updated CASSANDRA-12014:
-
Status: Patch Available  (was: In Progress)

> IndexSummary > 2G causes an assertion error
> ---
>
> Key: CASSANDRA-12014
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12014
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Brandon Williams
>Assignee: Stefania
>Priority: Minor
> Fix For: 3.0.x, 3.x
>
>
> {noformat}
> ERROR [CompactionExecutor:1546280] 2016-06-01 13:21:00,444  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[CompactionExecutor:1546280,1,main]
> java.lang.AssertionError: null
> at 
> org.apache.cassandra.io.sstable.IndexSummaryBuilder.maybeAddEntry(IndexSummaryBuilder.java:171)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.append(SSTableWriter.java:634)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableWriter.afterAppend(SSTableWriter.java:179)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:205) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:126)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:197)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[na:1.7.0_51]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> {noformat}
> I believe this can be fixed by raising the min_index_interval, but we should 
> have a better method of coping with this than throwing the AE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12014) IndexSummary > 2G causes an assertion error

2016-09-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515108#comment-15515108
 ] 

Stefania commented on CASSANDRA-12014:
--

The test results are clean now; the failures for testall on trunk happen on the 
unpatched branch as well. There were a couple of failing dtests for 3.0 that I 
could not trace to known failures, but they look unrelated, and they passed 
locally and in the following builds on Jenkins. 

Therefore, this is ready for review.
 
This is a relatively large patch for what it is trying to achieve, and 
consideration should be given to whether we want the patch in 3.0, at least in 
its full form. The reason is that in order to estimate the key size, we need to 
modify the call chain that creates sstable writers, which not only involves 
adding a new parameter to a very long chain but also changing a vast number of 
callers. We should have a valid estimated key size when opening sstables, 
downsampling them, flushing memtables or compacting existing sstables. For 
other cases, mostly offline tools that don't use the summary, we use a default 
constant value that can be changed with a system property.

There is also an extra commit in trunk for refactoring the large number of 
parameters in the call chain that creates sstable writers. I tried to add the 
additional parameter introduced by CASSANDRA-10678 to a new class that I 
introduced to group the parameters that are optional and that should be known 
by the callers; the new class also allows setting the parameters via a builder 
pattern. It should arguably be the same as the writer factory, but at the 
moment the factory is static. I don't think the creation of a new instance of 
this class is a performance concern, since we do this once per sstable, so I 
don't understand why so far we've relied on a static factory with a large 
number of parameters in a very long call chain, which is good for performance 
but leaves very little flexibility.
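
As an illustration of the builder idea (all names here are invented for the 
sketch, not the ones in the patch), grouping the optional writer parameters 
behind a builder keeps the call chain from growing a new argument for every 
addition:
{code}
// Invented names for illustration only.
public final class WriterOptions
{
    // Default used by offline tools that don't build a summary; overridable
    // via a (hypothetical) system property, as described above.
    public static final long DEFAULT_ESTIMATED_KEY_SIZE =
        Long.getLong("cassandra.sstable.default_estimated_key_size", 64);

    private final long estimatedKeySize;
    private final long estimatedRowCount;

    private WriterOptions(Builder b)
    {
        this.estimatedKeySize = b.estimatedKeySize;
        this.estimatedRowCount = b.estimatedRowCount;
    }

    public long estimatedKeySize()  { return estimatedKeySize; }
    public long estimatedRowCount() { return estimatedRowCount; }

    public static Builder builder() { return new Builder(); }

    public static final class Builder
    {
        private long estimatedKeySize = DEFAULT_ESTIMATED_KEY_SIZE;
        private long estimatedRowCount = 0;

        public Builder estimatedKeySize(long v)  { this.estimatedKeySize = v; return this; }
        public Builder estimatedRowCount(long v) { this.estimatedRowCount = v; return this; }
        public WriterOptions build()             { return new WriterOptions(this); }
    }
}
{code}
A flush or compaction path would then pass, e.g., 
{{WriterOptions.builder().estimatedKeySize(estimate).build()}} down the chain 
as a single argument.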

I did not use the builder pattern for 3.0, I was trying to avoid refactoring 
3.0 code and, as mentioned already above, perhaps in 3.0 we should stick to 
using a default constant value for the key size.


> IndexSummary > 2G causes an assertion error
> ---
>
> Key: CASSANDRA-12014
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12014
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Brandon Williams
>Assignee: Stefania
>Priority: Minor
> Fix For: 3.0.x, 3.x
>
>
> {noformat}
> ERROR [CompactionExecutor:1546280] 2016-06-01 13:21:00,444  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[CompactionExecutor:1546280,1,main]
> java.lang.AssertionError: null
> at 
> org.apache.cassandra.io.sstable.IndexSummaryBuilder.maybeAddEntry(IndexSummaryBuilder.java:171)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.append(SSTableWriter.java:634)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableWriter.afterAppend(SSTableWriter.java:179)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:205) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:126)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:197)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[na:1.7.0_51]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> {noformat}
> I believe this can be fixed by raising the min_index_interval, but we should 
> have a better method of coping with this than throwing 

[jira] [Comment Edited] (CASSANDRA-12693) Add the JMX metrics about the total number of hints we have delivered

2016-09-22 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514861#comment-15514861
 ] 

Dikang Gu edited comment on CASSANDRA-12693 at 9/23/16 12:36 AM:
-

Aleksey, do you mind taking a look at this one as well?


was (Author: dikanggu):
Aleksey, do you mind taking a look at this one as well?

The patch based on trunk: 
https://github.com/DikangGu/cassandra/commit/c45d8d0ca727f7589d6a3e68bae9a7ffa1e93493

> Add the JMX metrics about the total number of hints we have delivered
> -
>
> Key: CASSANDRA-12693
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12693
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Minor
> Fix For: 3.x
>
>
> I find there are no metrics about the number of hints we have delivered. I 
> think it would be great to have these metrics, so that we have a better 
> estimate of the progress of hint replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12693) Add the JMX metrics about the total number of hints we have delivered

2016-09-22 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514938#comment-15514938
 ] 

Dikang Gu commented on CASSANDRA-12693:
---

[~iamaleksey] Thanks for the review! Addressed your comments, and here is a new 
commit. 
https://github.com/DikangGu/cassandra/commit/7a4232d01d8b16302a7f6981a9ca3e039ba0ea89

> Add the JMX metrics about the total number of hints we have delivered
> -
>
> Key: CASSANDRA-12693
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12693
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Minor
> Fix For: 3.x
>
>
> I find there are no metrics about the number of hints we have delivered. I 
> think it would be great to have these metrics, so that we have a better 
> estimate of the progress of hint replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12462) NullPointerException in CompactionInfo.getId(CompactionInfo.java:65)

2016-09-22 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514921#comment-15514921
 ] 

Jeff Jirsa commented on CASSANDRA-12462:


When you're seeing this exception, do you have a schema mismatch? Have you 
recently added or removed a keyspace or table?


> NullPointerException in CompactionInfo.getId(CompactionInfo.java:65)
> 
>
> Key: CASSANDRA-12462
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12462
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Jonathan DePrizio
> Attachments: 0001-Fix-NPE-when-running-nodetool-compactionstats.patch
>
>
> Note: The same trace is cited in the last comment of 
> https://issues.apache.org/jira/browse/CASSANDRA-11961
> I've noticed that some of my nodes in my 2.1 cluster have fallen way behind 
> on compactions, and have huge numbers (thousands) of uncompacted, tiny 
> SSTables (~30MB or so).
> In diagnosing the issue, I've found that "nodetool compactionstats" returns 
> the exception below.  Restarting cassandra on the node here causes the 
> pending tasks count to jump to ~2000.  Compactions run properly for about an 
> hour, until this exception occurs again.  Once it occurs, I see the pending 
> tasks value rapidly drop towards zero, but without any compactions actually 
> running (the logs show no compactions finishing).  It would seem that this is 
> causing compactions to fail on this node, which is leading to it running out 
> of space, etc.
> [redacted]# nodetool compactionstats
> xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar 
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms12G -Xmx12G 
> -Xmn1000M -Xss255k
> pending tasks: 5
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>   at 
> org.apache.cassandra.db.compaction.CompactionInfo.getId(CompactionInfo.java:65)
>   at 
> org.apache.cassandra.db.compaction.CompactionInfo.asMap(CompactionInfo.java:118)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager.getCompactions(CompactionManager.java:1405)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.reflect.misc.Trampoline.invoke(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.reflect.misc.MethodUtil.invoke(Unknown Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
>   at com.sun.jmx.mbeanserver.PerInterface.getAttribute(Unknown Source)
>   at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(Unknown Source)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(Unknown Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown 
> Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown 
> Source)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown 
> Source)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown 
> Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(Unknown 
> Source)
>   at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
>   at sun.rmi.transport.Transport$1.run(Unknown Source)
>   at sun.rmi.transport.Transport$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.Transport.serviceCall(Unknown Source)
>   at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown 
> Source)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown 
> Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)



--
This message was sent by Atlassian JIRA

[jira] [Commented] (CASSANDRA-12693) Add the JMX metrics about the total number of hints we have delivered

2016-09-22 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514886#comment-15514886
 ] 

Aleksey Yeschenko commented on CASSANDRA-12693:
---

Also move metrics inc to the outer edge ({{sendHintsAndAwait()}} method), 
keeping {{await()}} logic as is.

> Add the JMX metrics about the total number of hints we have delivered
> -
>
> Key: CASSANDRA-12693
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12693
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Minor
> Fix For: 3.x
>
>
> I find there are no metrics about the number of hints we have delivered. I 
> think it would be great to have these metrics, so that we have a better 
> estimate of the progress of hint replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12693) Add the JMX metrics about the total number of hints we have delivered

2016-09-22 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514875#comment-15514875
 ] 

Aleksey Yeschenko commented on CASSANDRA-12693:
---

Sure. I'd like all new hint-related metrics to go into the now-empty 
{{HintsServiceMetrics}}, by the way (and eventually to move the existing ones 
there as well).
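
For reference, a minimal sketch of the kind of meter that could live in 
{{HintsServiceMetrics}}, following the usual Cassandra metrics pattern (the 
field and metric names here are assumptions, not the committed ones):
{code}
import com.codahale.metrics.Meter;
import org.apache.cassandra.metrics.CassandraMetricsRegistry;
import org.apache.cassandra.metrics.DefaultNameFactory;
import org.apache.cassandra.metrics.MetricNameFactory;

public final class HintsServiceMetrics
{
    private static final MetricNameFactory factory = new DefaultNameFactory("HintsService");

    // Assumed metric name: counts every successfully delivered hint and is
    // exposed over JMX under type=HintsService.
    public static final Meter hintsDelivered =
        CassandraMetricsRegistry.Metrics.meter(factory.createMetricName("HintsDelivered"));
}
{code}
Delivery code would then call {{HintsServiceMetrics.hintsDelivered.mark()}} 
once per delivered hint, at the outer edge as suggested elsewhere in this 
thread.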

> Add the JMX metrics about the total number of hints we have delivered
> -
>
> Key: CASSANDRA-12693
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12693
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Minor
> Fix For: 3.x
>
>
> I find there are no metrics about the number of hints we have delivered. I 
> think it would be great to have these metrics, so that we have a better 
> estimate of the progress of hint replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12693) Add the JMX metrics about the total number of hints we have delivered

2016-09-22 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu updated CASSANDRA-12693:
--
Reviewer: Aleksey Yeschenko
  Status: Patch Available  (was: Open)

Aleksey, do you mind taking a look at this one as well?

The patch based on trunk: 
https://github.com/DikangGu/cassandra/commit/c45d8d0ca727f7589d6a3e68bae9a7ffa1e93493

> Add the JMX metrics about the total number of hints we have delivered
> -
>
> Key: CASSANDRA-12693
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12693
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Minor
> Fix For: 3.x
>
>
> I find there are no metrics about the number of hints we have delivered. I 
> think it would be great to have these metrics, so that we have a better 
> estimate of the progress of hint replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12462) NullPointerException in CompactionInfo.getId(CompactionInfo.java:65)

2016-09-22 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514830#comment-15514830
 ] 

Simon Zhou edited comment on CASSANDRA-12462 at 9/22/16 11:46 PM:
--

I repro'ed the same issue with C* 2.2.5, and C* 3 should have the same issue. I 
have a one-line patch, as attached. cc'ing the original author of this line, 
@jbellis, to take a look.


was (Author: szhou):
I repro'ed the same issue with C* 2.2.5 and C* 3 should have the same issue.  
This is a one line patch:

From 65d0c7874147910f0d7785a734519efeb9240d78 Mon Sep 17 00:00:00 2001
From: Simon Zhou 
Date: Thu, 22 Sep 2016 16:35:52 -0700
Subject: [PATCH] Fix NPE when running "nodetool compactionstats"

---
 src/java/org/apache/cassandra/db/compaction/CompactionInfo.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionInfo.java 
b/src/java/org/apache/cassandra/db/compaction/CompactionInfo.java
index 535217f..d83423f 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionInfo.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionInfo.java
@@ -65,7 +65,7 @@ public final class CompactionInfo implements Serializable
 
 public UUID getId()
 {
-return cfm.cfId;
+return cfm != null ? cfm.cfId : null;
 }
 
 public String getKeyspace()
-- 
2.9.3
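
The fixed method, reproduced with a descriptive comment (the comment's 
explanation of when {{cfm}} is null is an editorial assumption based on the 
reports above, not text from the patch):
{code}
public UUID getId()
{
    // cfm can be null for compaction tasks without backing table metadata
    // (assumed from the NPE reports above), so guard the dereference and
    // return null instead of throwing; callers must tolerate a null id.
    return cfm != null ? cfm.cfId : null;
}
{code}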


> NullPointerException in CompactionInfo.getId(CompactionInfo.java:65)
> 
>
> Key: CASSANDRA-12462
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12462
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Jonathan DePrizio
> Attachments: 0001-Fix-NPE-when-running-nodetool-compactionstats.patch
>
>
> Note: The same trace is cited in the last comment of 
> https://issues.apache.org/jira/browse/CASSANDRA-11961
> I've noticed that some of my nodes in my 2.1 cluster have fallen way behind 
> on compactions, and have huge numbers (thousands) of uncompacted, tiny 
> SSTables (~30MB or so).
> In diagnosing the issue, I've found that "nodetool compactionstats" returns 
> the exception below.  Restarting cassandra on the node here causes the 
> pending tasks count to jump to ~2000.  Compactions run properly for about an 
> hour, until this exception occurs again.  Once it occurs, I see the pending 
> tasks value rapidly drop towards zero, but without any compactions actually 
> running (the logs show no compactions finishing).  It would seem that this is 
> causing compactions to fail on this node, which is leading to it running out 
> of space, etc.
> [redacted]# nodetool compactionstats
> xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar 
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms12G -Xmx12G 
> -Xmn1000M -Xss255k
> pending tasks: 5
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>   at 
> org.apache.cassandra.db.compaction.CompactionInfo.getId(CompactionInfo.java:65)
>   at 
> org.apache.cassandra.db.compaction.CompactionInfo.asMap(CompactionInfo.java:118)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager.getCompactions(CompactionManager.java:1405)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.reflect.misc.Trampoline.invoke(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.reflect.misc.MethodUtil.invoke(Unknown Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
>   at com.sun.jmx.mbeanserver.PerInterface.getAttribute(Unknown Source)
>   at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(Unknown Source)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(Unknown Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown 
> Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown 
> Source)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown 
> Source)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown 
> Source)
>   at 

[jira] [Created] (CASSANDRA-12693) Add the JMX metrics about the total number of hints we have delivered

2016-09-22 Thread Dikang Gu (JIRA)
Dikang Gu created CASSANDRA-12693:
-

 Summary: Add the JMX metrics about the total number of hints we 
have delivered
 Key: CASSANDRA-12693
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12693
 Project: Cassandra
  Issue Type: Improvement
Reporter: Dikang Gu
Assignee: Dikang Gu
Priority: Minor
 Fix For: 3.x


I find there are no metrics about the number of hints we have delivered. I 
think it would be great to have these metrics, so that we have a better 
estimate of the progress of hint replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12462) NullPointerException in CompactionInfo.getId(CompactionInfo.java:65)

2016-09-22 Thread Simon Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Zhou updated CASSANDRA-12462:
---
Attachment: 0001-Fix-NPE-when-running-nodetool-compactionstats.patch

> NullPointerException in CompactionInfo.getId(CompactionInfo.java:65)
> 
>
> Key: CASSANDRA-12462
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12462
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Jonathan DePrizio
> Attachments: 0001-Fix-NPE-when-running-nodetool-compactionstats.patch
>
>
> Note: The same trace is cited in the last comment of 
> https://issues.apache.org/jira/browse/CASSANDRA-11961
> I've noticed that some of my nodes in my 2.1 cluster have fallen way behind 
> on compactions, and have huge numbers (thousands) of uncompacted, tiny 
> SSTables (~30MB or so).
> In diagnosing the issue, I've found that "nodetool compactionstats" returns 
> the exception below.  Restarting cassandra on the node here causes the 
> pending tasks count to jump to ~2000.  Compactions run properly for about an 
> hour, until this exception occurs again.  Once it occurs, I see the pending 
> tasks value rapidly drop towards zero, but without any compactions actually 
> running (the logs show no compactions finishing).  It would seem that this is 
> causing compactions to fail on this node, which is leading to it running out 
> of space, etc.
> [redacted]# nodetool compactionstats
> xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar 
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms12G -Xmx12G 
> -Xmn1000M -Xss255k
> pending tasks: 5
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>   at 
> org.apache.cassandra.db.compaction.CompactionInfo.getId(CompactionInfo.java:65)
>   at 
> org.apache.cassandra.db.compaction.CompactionInfo.asMap(CompactionInfo.java:118)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager.getCompactions(CompactionManager.java:1405)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.reflect.misc.Trampoline.invoke(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.reflect.misc.MethodUtil.invoke(Unknown Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
>   at com.sun.jmx.mbeanserver.PerInterface.getAttribute(Unknown Source)
>   at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(Unknown Source)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(Unknown Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown 
> Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown 
> Source)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown 
> Source)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown 
> Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(Unknown 
> Source)
>   at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
>   at sun.rmi.transport.Transport$1.run(Unknown Source)
>   at sun.rmi.transport.Transport$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.rmi.transport.Transport.serviceCall(Unknown Source)
>   at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown 
> Source)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown 
> Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-8795) Cassandra (possibly under load) occasionally throws an exception during CQL create table

2016-09-22 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko reassigned CASSANDRA-8795:


Assignee: Aleksey Yeschenko

> Cassandra (possibly under load) occasionally throws an exception during CQL 
> create table
> 
>
> Key: CASSANDRA-8795
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8795
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Darren Warner
>Assignee: Aleksey Yeschenko
> Fix For: 2.1.x
>
>
> CQLSH will return the following:
> {code}
> { name: 'ResponseError',
>   message: 'java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.NullPointerException',
>   info: 'Represents an error message from the server',
>  code: 0,
>  query: 'CREATE TABLE IF NOT EXISTS roles_by_users( userid TIMEUUID, role 
> INT, entityid TIMEUUID, entity_type TEXT, enabled BOOLEAN, PRIMARY KEY 
> (userid, role, entityid, entity_type) );' }
> {code}
> Cassandra system.log shows:
> {code}
> ERROR [MigrationStage:1] 2015-02-11 14:38:48,610 CassandraDaemon.java:153 - 
> Exception in thread Thread[MigrationStage:1,5,main]
> java.lang.NullPointerException: null
> at 
> org.apache.cassandra.db.DefsTables.addColumnFamily(DefsTables.java:371) 
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:293) 
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.db.DefsTables.mergeSchemaInternal(DefsTables.java:194) 
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:166) 
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.service.MigrationManager$2.runMayThrow(MigrationManager.java:393)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_31]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_31]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_31]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_31]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31]
> ERROR [SharedPool-Worker-2] 2015-02-11 14:38:48,620 QueryMessage.java:132 - 
> Unexpected error during query
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.NullPointerException
> at 
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:398) 
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:374)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.service.MigrationManager.announceNewColumnFamily(MigrationManager.java:249)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.cql3.statements.CreateTableStatement.announceMigration(CreateTableStatement.java:113)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:80)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:226)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) 
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439)
>  [apache-cassandra-2.1.2.jar:2.1.2]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
>  [apache-cassandra-2.1.2.jar:2.1.2]
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> 

[jira] [Commented] (CASSANDRA-12462) NullPointerException in CompactionInfo.getId(CompactionInfo.java:65)

2016-09-22 Thread Simon Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514830#comment-15514830
 ] 

Simon Zhou commented on CASSANDRA-12462:


I reproduced the same issue with C* 2.2.5, and C* 3 should have the same issue.
This is a one-line patch:

From 65d0c7874147910f0d7785a734519efeb9240d78 Mon Sep 17 00:00:00 2001
From: Simon Zhou 
Date: Thu, 22 Sep 2016 16:35:52 -0700
Subject: [PATCH] Fix NPE when running "nodetool compactionstats"

---
 src/java/org/apache/cassandra/db/compaction/CompactionInfo.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionInfo.java 
b/src/java/org/apache/cassandra/db/compaction/CompactionInfo.java
index 535217f..d83423f 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionInfo.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionInfo.java
@@ -65,7 +65,7 @@ public final class CompactionInfo implements Serializable
 
 public UUID getId()
 {
-return cfm.cfId;
+return cfm != null ? cfm.cfId : null;
 }
 
 public String getKeyspace()
-- 
2.9.3
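
For context on where the guard matters downstream: the stack trace shows the 
NPE surfacing through {{CompactionInfo.asMap}} (line 118), so once {{getId()}} 
may return null, the map-building side has to tolerate a null id. A minimal 
sketch of that shape follows; the key name and class layout are assumptions 
for illustration, not the actual {{CompactionInfo}} code:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch only: a JMX-facing view that tolerates the null id the patched
// getId() can now return, instead of throwing while building the map.
final class CompactionInfoSketch
{
    private final UUID id; // null when the compaction carries no table metadata

    CompactionInfoSketch(UUID id)
    {
        this.id = id;
    }

    Map<String, String> asMap()
    {
        Map<String, String> ret = new HashMap<>();
        ret.put("compactionId", id == null ? "" : id.toString());
        return ret;
    }
}
{code}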


> NullPointerException in CompactionInfo.getId(CompactionInfo.java:65)
> 
>
> Key: CASSANDRA-12462
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12462
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Jonathan DePrizio
>
> Note: The same trace is cited in the last comment of 
> https://issues.apache.org/jira/browse/CASSANDRA-11961
> I've noticed that some of my nodes in my 2.1 cluster have fallen way behind 
> on compactions, and have huge numbers (thousands) of uncompacted, tiny 
> SSTables (~30MB or so).
> In diagnosing the issue, I've found that "nodetool compactionstats" returns 
> the exception below.  Restarting cassandra on the node here causes the 
> pending tasks count to jump to ~2000.  Compactions run properly for about an 
> hour, until this exception occurs again.  Once it occurs, I see the pending 
> tasks value rapidly drop towards zero, but without any compactions actually 
> running (the logs show no compactions finishing).  It would seem that this is 
> causing compactions to fail on this node, which is leading to it running out 
> of space, etc.
> [redacted]# nodetool compactionstats
> xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar 
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms12G -Xmx12G 
> -Xmn1000M -Xss255k
> pending tasks: 5
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>   at 
> org.apache.cassandra.db.compaction.CompactionInfo.getId(CompactionInfo.java:65)
>   at 
> org.apache.cassandra.db.compaction.CompactionInfo.asMap(CompactionInfo.java:118)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager.getCompactions(CompactionManager.java:1405)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.reflect.misc.Trampoline.invoke(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at sun.reflect.misc.MethodUtil.invoke(Unknown Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
>   at com.sun.jmx.mbeanserver.PerInterface.getAttribute(Unknown Source)
>   at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(Unknown Source)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(Unknown 
> Source)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(Unknown Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown 
> Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown 
> Source)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown 
> Source)
>   at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown 
> Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(Unknown 
> Source)
>   at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 

[jira] [Resolved] (CASSANDRA-12659) Query in reversed order brought back deleted data

2016-09-22 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko resolved CASSANDRA-12659.
---
Resolution: Duplicate

> Query in reversed order brought back deleted data
> 
>
> Key: CASSANDRA-12659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12659
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 3.0.5, 6 nodes cluster
>Reporter: Tai Khuu Tan
>
> We have an issue with our Cassandra 3.0.5 cluster. After we deleted a large 
> amount of data across multiple partition keys, querying those partition keys 
> in reversed order on a clustering key returned the deleted data. I have 
> checked and there are no tombstones left; all of them are deleted. So I don't 
> know where or how the deleted data can still exist. Is there any other place 
> that Cassandra reads data from when querying in reverse order compared to 
> normal order?
> The schema is very simple:
> {noformat}
> CREATE TABLE table ( uid varchar, version timestamp, data1 varchar, data2 
> varchar, data3 varchar, data4 varchar, data5 varchar, PRIMARY KEY (uid, 
> version, data1 , data2 , data3 , data4 ) ) with compact storage;
> {noformat}
> Queries are doing reverse order on the timestamp column.
> Ex:
> {noformat}
> select * from data where uid='uid1' order by version DESC;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12659) Query in reversed order brought back deleted data

2016-09-22 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514820#comment-15514820
 ] 

Aleksey Yeschenko commented on CASSANDRA-12659:
---

I'm going ahead and closing the ticket as Duplicate of CASSANDRA-11733, as it 
most likely is. Please feel free to reopen this JIRA ticket if you can 
reproduce in 3.0.9. Thanks.

> Query in reversed order brought back deleted data
> 
>
> Key: CASSANDRA-12659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12659
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 3.0.5, 6 nodes cluster
>Reporter: Tai Khuu Tan
>
> We have an issue with our Cassandra 3.0.5 cluster. After we deleted a large 
> amount of data across multiple partition keys, querying those partition keys 
> in reversed order on a clustering key returned the deleted data. I have 
> checked and there are no tombstones left; all of them are deleted. So I don't 
> know where or how the deleted data can still exist. Is there any other place 
> that Cassandra reads data from when querying in reverse order compared to 
> normal order?
> The schema is very simple:
> {noformat}
> CREATE TABLE table ( uid varchar, version timestamp, data1 varchar, data2 
> varchar, data3 varchar, data4 varchar, data5 varchar, PRIMARY KEY (uid, 
> version, data1 , data2 , data3 , data4 ) ) with compact storage;
> {noformat}
> Queries are doing reverse order on the timestamp column.
> Ex:
> {noformat}
> select * from data where uid='uid1' order by version DESC;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-22 Thread Alwyn Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514810#comment-15514810
 ] 

Alwyn Davis edited comment on CASSANDRA-11138 at 9/22/16 11:36 PM:
---

I think this is occurring because the {{lastRow}} in the {{PartitionIterator}} 
class will always exit once the clustering components are exhausted, for 3 or 
more clustering keys.  

When {{decompose}} creates lastRow, position is always a product of 
{{generator.clusteringDescendantAverages\[0\]}} and is then divided by it 
again.  So there's never a remainder, and consequently any lastRow-to-currentRow 
comparison will indicate that there are more distinct values in the currentRow 
column than we want - lastRow will be something like:
{code}{, 0, 0}{code}

As a fix, could it instead set lastRow (for MultiRowIterators) to just the 
corresponding clusteringDescendantAverages values?


was (Author: alwyn):
I think this is occurring because the {{lastRow}} in the {{PartitionIterator}} 
class will always exit once the clustering components are exhausted, for 3 or 
more clustering keys.  

When {{decompose}} creates lastRow, position is always a product of 
{{generator.clusteringDescendantAverages\[0\]}} and is then divided by it 
again.  So there's never a remainder, and consequently any lastRow-to-currentRow 
comparison will indicate that there are more distinct values in the currentRow 
column than we want - lastRow will be something like:
{code}{, 0, 0}{code}

As a fix, could it instead set lastRow (for MultiRowIterators) to just the 
corresponding clusteringDescendantAverages values?

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29

[jira] [Comment Edited] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-22 Thread Alwyn Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514810#comment-15514810
 ] 

Alwyn Davis edited comment on CASSANDRA-11138 at 9/22/16 11:36 PM:
---

I think this is occurring because the {{lastRow}} in the {{PartitionIterator}} 
class will always exit once the clustering components are exhausted, for 3 or 
more clustering keys.  

When {{decompose}} creates lastRow, position is always a product of 
{{generator.clusteringDescendantAverages\[0\]}} and is then divided by it 
again.  So there's never a remainder, and consequently any lastRow-to-currentRow 
comparison will indicate that there are more distinct values in the currentRow 
column than we want - lastRow will be something like:
{code}{, 0, 0}{code}

As a fix, could it instead set lastRow (for MultiRowIterators) to just the 
corresponding clusteringDescendantAverages values?


was (Author: alwyn):
I think this is occurring because the {{lastRow}} in the {{PartitionIterator}} 
class will always exit once the clustering components are exhausted, for 3 or 
more clustering keys.  

When {{decompose}} creates lastRow, position is always a product of 
{{generator.clusteringDescendantAverages\[0\]}} and is then divided by it 
again.  So there's never a remainder, and consequently any lastRow-to-currentRow 
comparison will indicate that there are more distinct values in the currentRow 
column than we want - lastRow will be something like:
{code}{, 0, 0}{code}.

As a fix, could it instead set lastRow (for MultiRowIterators) to just the 
corresponding clusteringDescendantAverages values?

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-22 Thread Alwyn Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514810#comment-15514810
 ] 

Alwyn Davis commented on CASSANDRA-11138:
-

I think this is occurring because the {{lastRow}} in the {{PartitionIterator}} 
class will always exit once the clustering components are exhausted, for 3 or 
more clustering keys.  

When {{decompose}} creates lastRow, position is always a product of 
{{generator.clusteringDescendantAverages\[0\]}} and is then divided by it 
again.  So there's never a remainder, and consequently any lastRow-to-currentRow 
comparison will indicate that there are more distinct values in the currentRow 
column than we want - lastRow will be something like:
{code}{, 0, 0}{code}.

As a fix, could it instead set lastRow (for MultiRowIterators) to just the 
corresponding clusteringDescendantAverages values?
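
To make the arithmetic concrete, here is a toy illustration of the collapse 
described above (plain Java with assumed example values; this is not the 
stress-tool source):

{code}
// Toy arithmetic: if position is always a multiple of
// clusteringDescendantAverages[0], decomposing it leaves every later
// clustering component at 0, so lastRow looks like {first, 0, 0}.
public class LastRowSketch
{
    public static void main(String[] args)
    {
        long[] clusteringDescendantAverages = {120, 4, 1}; // assumed values
        long position = 3 * clusteringDescendantAverages[0]; // always a product of [0]

        long first = position / clusteringDescendantAverages[0]; // 3
        long rest  = position % clusteringDescendantAverages[0]; // always 0
        long second = rest / clusteringDescendantAverages[1];    // 0
        long third  = rest % clusteringDescendantAverages[1];    // 0

        // Comparisons against currentRow then always claim the later
        // clustering columns are exhausted, regardless of the distribution.
        System.out.printf("lastRow = {%d, %d, %d}%n", first, second, third);
    }
}
{code}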

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J

[jira] [Commented] (CASSANDRA-12688) Change -ea comment in jvm.options

2016-09-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514745#comment-15514745
 ] 

Edward Capriolo commented on CASSANDRA-12688:
-

Thanks Jeff!

> Change -ea comment in jvm.options
> -
>
> Key: CASSANDRA-12688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12688
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 3.10
>
>
> The config file does nothing to indicate the dangers of turning -ea off. 
> Based on recent ML comments, it is better not to dangle the carrot of a 5% 
> performance boost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12590) Segfault reading secondary index

2016-09-22 Thread Cameron Zemek (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514736#comment-15514736
 ] 

Cameron Zemek commented on CASSANDRA-12590:
---

Hey [~beobal],

Deployed trunk to one of the test environments and it segfaulted:

{noformat}
Sep 22 11:14:16 ip-10-222-104-29.ec2.internal cassandra[5296]: # A fatal error 
has been detected by the Java Runtime Environment:
Sep 22 11:14:16 ip-10-222-104-29.ec2.internal cassandra[5296]: #
Sep 22 11:14:16 ip-10-222-104-29.ec2.internal cassandra[5296]: #  SIGSEGV (0xb) 
at pc=0x7fba85c4a3b5, pid=1, tid=140438743848704
Sep 22 11:14:16 ip-10-222-104-29.ec2.internal cassandra[5296]: #
Sep 22 11:14:16 ip-10-222-104-29.ec2.internal cassandra[5296]: # JRE version: 
Java(TM) SE Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
Sep 22 11:14:16 ip-10-222-104-29.ec2.internal cassandra[5296]: # Java VM: Java 
HotSpot(TM) 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
Sep 22 11:14:16 ip-10-222-104-29.ec2.internal cassandra[5296]: # Problematic 
frame:
Sep 22 11:14:16 ip-10-222-104-29.ec2.internal cassandra[5296]: # J 14723 C2 
org.apache.cassandra.dht.LocalPartitioner$LocalToken.compareTo(Lorg/apache/cassandra/dht/Token;)I
 (53 bytes) @ 0x7fba85c4a3b5 [0x7fba85c4a280+0x135]
{noformat}

> Segfault reading secondary index
> 
>
> Key: CASSANDRA-12590
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12590
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
> Environment: Occurs on Cassandra 3.5 and 3.7
>Reporter: Cameron Zemek
>Assignee: Sam Tunnicliffe
>
> Getting segfaults when reading secondary index as follows:
> {code}
> J 9272 C2 
> org.apache.cassandra.dht.LocalPartitioner$LocalToken.compareTo(Lorg/apache/cassandra/dht/Token;)I
>  (53 bytes) @ 0x7fd7354749b7 [0x7fd735474840+0x177]
> J 5661 C2 org.apache.cassandra.db.DecoratedKey.compareTo(Ljava/lang/Object;)I 
> (9 bytes) @ 0x7fd7351b35b8 [0x7fd7351b3440+0x178]
> J 14205 C2 
> java.util.concurrent.ConcurrentSkipListMap.doGet(Ljava/lang/Object;)Ljava/lang/Object;
>  (142 bytes) @ 0x7fd736404dd8 [0x7fd736404cc0+0x118]
> J 17764 C2 
> org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(Lorg/apache/cassandra/db/ColumnFamilyStore;)Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;
>  (635 bytes) @ 0x7fd736e09638 [0x7fd736e08720+0xf18]
> J 17808 C2 
> org.apache.cassandra.index.internal.CassandraIndexSearcher.search(Lorg/apache/cassandra/db/ReadExecutionController;)Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;
>  (68 bytes) @ 0x7fd736e01a48 [0x7fd736e012a0+0x7a8]
> J 14217 C2 
> org.apache.cassandra.db.ReadCommand.executeLocally(Lorg/apache/cassandra/db/ReadExecutionController;)Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;
>  (219 bytes) @ 0x7fd736417c1c [0x7fd736416fa0+0xc7c]
> J 14585 C2 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow()V 
> (337 bytes) @ 0x7fd736541e6c [0x7fd736541d60+0x10c]
> J 14584 C2 org.apache.cassandra.service.StorageProxy$DroppableRunnable.run()V 
> (48 bytes) @ 0x7fd7357957b4 [0x7fd735795760+0x54]
> J 9648% C2 org.apache.cassandra.concurrent.SEPWorker.run()V (253 bytes) @ 
> 0x7fd735938d8c [0x7fd7359356e0+0x36ac]
> {code}
> Which I have translated to the codepath:
> org.apache.cassandra.dht.LocalPartitioner (Line 139)
> org.apache.cassandra.db.DecoratedKey (Line 85)
> java.util.concurrent.ConcurrentSkipListMap (Line 794)
> org.apache.cassandra.db.SinglePartitionReadCommand (Line 498)
> org.apache.cassandra.index.internal.CassandraIndexSearcher (Line 60)
> org.apache.cassandra.db.ReadCommand (Line 367)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12681) Reject empty options and invalid DC names in replication configuration while creating or altering a keyspace.

2016-09-22 Thread Nachiket Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nachiket Patil updated CASSANDRA-12681:
---
Status: Patch Available  (was: Open)

> Reject empty options and invalid DC names in replication configuration while 
> creating or altering a keyspace.
> -
>
> Key: CASSANDRA-12681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12681
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Nachiket Patil
>Assignee: Nachiket Patil
>Priority: Minor
> Attachments: trunkpatch.diff, v3.0patch.diff
>
>
> Add some restrictions around create / alter keyspace with 
> NetworkTopologyStrategy:
> 1. Do not accept empty replication configuration (no DC options after class). 
> Cassandra checks that SimpleStrategy must have replication_factor option but 
> does not check that at least one DC should be present in the options for 
> NetworkTopologyStrategy.
> 2. Cassandra accepts any random string as DC name replication option for 
> NetworkTopologyStrategy while creating or altering keyspaces. Add a 
> restriction that the options specified is valid datacenter name. Using 
> incorrect value or simple mistake in typing the DC name can cause outage in 
> production environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12681) Reject empty options and invalid DC names in replication configuration while creating or altering a keyspace.

2016-09-22 Thread Nachiket Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nachiket Patil updated CASSANDRA-12681:
---
Attachment: trunkpatch.diff

> Reject empty options and invalid DC names in replication configuration while 
> creating or altering a keyspace.
> -
>
> Key: CASSANDRA-12681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12681
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Nachiket Patil
>Assignee: Nachiket Patil
>Priority: Minor
> Attachments: trunkpatch.diff, v3.0patch.diff
>
>
> Add some restrictions around create / alter keyspace with 
> NetworkTopologyStrategy:
> 1. Do not accept empty replication configuration (no DC options after class). 
> Cassandra checks that SimpleStrategy must have replication_factor option but 
> does not check that at least one DC should be present in the options for 
> NetworkTopologyStrategy.
> 2. Cassandra accepts any random string as DC name replication option for 
> NetworkTopologyStrategy while creating or altering keyspaces. Add a 
> restriction that the options specified is valid datacenter name. Using 
> incorrect value or simple mistake in typing the DC name can cause outage in 
> production environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12681) Reject empty options and invalid DC names in replication configuration while creating or altering a keyspace.

2016-09-22 Thread Nachiket Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nachiket Patil updated CASSANDRA-12681:
---
Attachment: v3.0patch.diff

> Reject empty options and invalid DC names in replication configuration while 
> creating or altering a keyspace.
> -
>
> Key: CASSANDRA-12681
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12681
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Nachiket Patil
>Assignee: Nachiket Patil
>Priority: Minor
> Attachments: v3.0patch.diff
>
>
> Add some restrictions around create / alter keyspace with 
> NetworkTopologyStrategy:
> 1. Do not accept empty replication configuration (no DC options after class). 
> Cassandra checks that SimpleStrategy must have replication_factor option but 
> does not check that at least one DC should be present in the options for 
> NetworkTopologyStrategy.
> 2. Cassandra accepts any random string as DC name replication option for 
> NetworkTopologyStrategy while creating or altering keyspaces. Add a 
> restriction that the options specified is valid datacenter name. Using 
> incorrect value or simple mistake in typing the DC name can cause outage in 
> production environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12659) Query in reversed order brough back deleted data

2016-09-22 Thread Wei Deng (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514545#comment-15514545
 ] 

Wei Deng commented on CASSANDRA-12659:
--

[~khuutan...@gmail.com] You can move SSTables from one cluster to another, 
assuming the 2nd cluster can replicate the token ownership of the previous 
cluster and you also replicate the schema on the 2nd cluster. You probably 
don't want to use sstableloader in this case, as it might change the sstables 
when they land on the destination.

> Query in reversed order brough back deleted data
> 
>
> Key: CASSANDRA-12659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12659
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 3.0.5, 6 nodes cluster
>Reporter: Tai Khuu Tan
>
> We have and issues with our Cassandra 3.0.5. After we deleted a large amount 
> of data in the multiple partition keys. Query those partition keys with 
> reversed order on a clustering key return the deleted data. I have checked 
> and there are no tombstones left. All of them are deleted. So I don't know 
> where or how can those deleted data still exist. Is there any other place 
> that Cassandra will read data when query in reverse order compare to normal 
> order ?
> the schema is very simple
> {noformat}
> CREATE TABLE table ( uid varchar, version timestamp, data1 varchar, data2 
> varchar, data3 varchar, data4 varchar, data5 varchar, PRIMARY KEY (uid, 
> version, data1 , data2 , data3 , data4 ) ) with compact storage;
> {noformat}
> Query are doing reverse order on column timestamp
> Ex:
> {noformat}
> select * from data where uid="uid1" order by version DESC
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11811) dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog

2016-09-22 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514525#comment-15514525
 ] 

Jim Witschey commented on CASSANDRA-11811:
--

PR in flight and under discussion here: 
https://github.com/riptano/cassandra-dtest/pull/1340

> dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog
> --
>
> Key: CASSANDRA-11811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11811
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Jim Witschey
>  Labels: dtest
> Fix For: 3.x
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest_win32/416/testReport/snapshot_test/TestArchiveCommitlog/test_archive_commitlog
> Failed on CassCI build trunk_dtest_win32 #416
> Relevant error is pasted. This is clearly a test problem. No idea why it only 
> happens on windows, as of yet. Affecting most tests in the 
> TestArchiveCommitlog suite
> {code}
> WARN: Failed to flush node: node1 on shutdown.
> Unexpected error in node1 log, error: 
> ERROR [main] 2016-05-13 21:15:02,701 CassandraDaemon.java:729 - Fatal 
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Cannot change the 
> number of tokens from 64 to 32
>   at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1043)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:625)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:368) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:583)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:712) 
> [main/:na]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-09-22 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514498#comment-15514498
 ] 

Jason Brown commented on CASSANDRA-8457:


I've pushed a couple commits to the same branch, and kicked off tests now.

re: {{LegacyClientHandler}} - it ended up not being too difficult to parse the 
message "header" to get through to the payload size. It's implemented in 
{{MessageReceiveHandler}}. I've also removed {{LegacyClientHandler}} and the 
original message-in classes, {{MessageInHandler}} and 
{{AppendingByteBufInputStream}}, and their tests.

re: flush: yup, dumped my counter thing, and am using 
{{FlushConsolidationHandler}}. There's still some finer semantics/behaviors to 
groom over wrt flushing, but this is a better solution already.

re: {{MessagingService}}: as a short-term solution, I created an interface to 
abstract out all the blocking IO vs. netty-related stuff in MS. It's not a 
thing of beauty, but hopefully it's cleaner and won't need to live all that 
long. I hope we can live with this during the scope of the review.
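
For readers following along, this is roughly how the stock netty handler is 
installed (a sketch assuming netty 4.1+, not the branch under review):

{code}
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.flush.FlushConsolidationHandler;

// Sketch: FlushConsolidationHandler coalesces consecutive flushes so each
// peer connection issues fewer syscalls under load.
public class FlushSketch extends ChannelInitializer<SocketChannel>
{
    @Override
    protected void initChannel(SocketChannel ch)
    {
        // consolidate up to 256 explicit flushes into one
        ch.pipeline().addFirst(new FlushConsolidationHandler(256));
        // ... message serialization handlers would follow here
    }
}
{code}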

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-12631) Multiple Network Interfaces in non-EC2

2016-09-22 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta resolved CASSANDRA-12631.
-
   Resolution: Duplicate
Reproduced In: 3.7, 2.2.7, 2.2.5  (was: 2.2.5, 2.2.7, 3.7)

Closing this as duplicate of CASSANDRA-12673, please continue discussion there.

>  Multiple Network Interfaces in non-EC2
> ---
>
> Key: CASSANDRA-12631
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12631
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: RHEL6
> Node1 external: 10.240.33.241
> Node1 internal: 192.168.33.241
> Node2 external: 10.240.33.244
> Node2 internal: 192.168.33.244
>  
> cassandra-rackdc.properties (for both nodes) also tried with 
> prefer_local=false:
> dc=vdra015-xs-15
> rack=rack1
> prefer_local=true
>  
> Cassandra.yaml (changes over default):
> seeds: "10.240.33.241"
> listen_address: 192.168.33.241 or 192.168.33.244
> broadcast_address: 10.240.33.241 or 10.240.33.244
> listen_on_broadcast_address: true
> rpc_address: 192.168.33.241 or 192.168.33.244
> endpoint_snitch: GossipingPropertyFileSnitch
>  
> Routing table:
> # ip r
> 192.168.33.0/24 dev eth1  proto kernel  scope link  src 192.168.33.241
> 10.1.21.0/24 dev eth2  proto kernel  scope link  src 10.1.21.241
> 10.1.22.0/24 dev eth3  proto kernel  scope link  src 10.1.22.241
> 10.1.23.0/24 dev eth4  proto kernel  scope link  src 10.1.23.241
> 10.240.32.0/21 dev eth0  proto kernel  scope link  src 10.240.33.241
> default via 10.240.32.1 dev eth0
>Reporter: Amir Dafny-Man
>
> Summary: Unable to connect to seed node (other than self)
> Experienced behavior:
> 1.   Node1 starts up normally
> # netstat -anlp|grep java
> tcp0  0 127.0.0.1:55452 0.0.0.0:*   
> LISTEN  10036/java
> tcp0  0 127.0.0.1:7199  0.0.0.0:*   
> LISTEN  10036/java
> tcp0  0 10.240.33.241:7000  0.0.0.0:*   
> LISTEN  10036/java
> tcp0  0 192.168.33.241:7000 0.0.0.0:*   
> LISTEN  10036/java
> tcp0  0 :::192.168.33.241:9042  :::*
> LISTEN  10036/java
> 2.   When I try to start node2, it is unable to connect to node1 IP set 
> in seeds
> Exception (java.lang.RuntimeException) encountered during startup: Unable to 
> gossip with any seeds
> java.lang.RuntimeException: Unable to gossip with any seeds
> 3.   Running tcpdump on node2, I can see that node2 is trying to connect 
> to node1 external IP but with its source internal IP
> # tcpdump -nn -i eth0 port 7000
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
> 09:29:05.239026 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
> 77957108, win 14600, options [mss 1460,sackOK,TS val 65015480 ecr 
> 0,nop,wscale 9], length 0
> 09:29:06.238188 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
> 77957108, win 14600, options [mss 1460,sackOK,TS val 65016480 ecr 
> 0,nop,wscale 9], length 0
> 09:29:08.238159 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
> 77957108, win 14600, options [mss 1460,sackOK,TS val 65018480 ecr 
> 0,nop,wscale 9], length 0
> 09:29:12.238129 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
> 77957108, win 14600, options [mss 1460,sackOK,TS val 65022480 ecr 
> 0,nop,wscale 9], length 0
> 09:29:20.238129 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
> 77957108, win 14600, options [mss 1460,sackOK,TS val 65030480 ecr 
> 0,nop,wscale 9], length 0
> 09:29:36.238161 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
> 77957108, win 14600, options [mss 1460,sackOK,TS val 65046480 ecr 
> 0,nop,wscale 9], length 0
> 4.   Running tcpdump on node1 shows packets are not arriving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12673) Nodes cannot see each other in multi-DC, non-EC2 environment with two-interface nodes due to outbound node-to-node connection binding to private interface

2016-09-22 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514453#comment-15514453
 ] 

Paulo Motta commented on CASSANDRA-12673:
-

It seems the hidden {{cassandra.yaml}} property {{outboundBindAny}} was added 
on CASSANDRA-3839 to avoid problems on streaming due to the use of 
{{socket.getRemoteSocketAddress()}} to identify the remote peer:

bq. allowing override might be better in this case. Because if we change this, 
cassandra's stream might get confused in some cases... in the constructor of 
IncomingStreamReader, as we do socket.getRemoteSocketAddress() to see where the 
stream comes from. if we dont bind it to the right address there may be some 
edge cases where it might not work. 

Since the new streaming protocol no longer uses 
{{socket.getRemoteSocketAddress()}} to identify the peer, I don't see a reason 
to keep that option. A simple workaround for the time being is to basically set 
{{outboundBindAny: true}} on {{cassandra.yaml}}, but moving forward on trunk I 
think we can remove this option since it will probably cause problems for folks 
using multiple network cards/dual stacks. WDYT [~brandon.williams], [~yukim]?
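
In plain java.nio terms, the two behaviours under discussion look like this 
(addresses borrowed from the CASSANDRA-12631 report above; this is a sketch, 
not the Cassandra source):

{code}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

public class OutboundBindSketch
{
    public static void main(String[] args) throws Exception
    {
        SocketChannel channel = SocketChannel.open();
        // outboundBindAny: false -> explicit bind to listen_address, so the
        // inter-DC connection is sourced from the private interface:
        channel.bind(new InetSocketAddress(InetAddress.getByName("192.168.33.244"), 0));
        // outboundBindAny: true -> skip the bind above; the kernel then picks
        // the source address from the routing table for the destination.
        channel.connect(new InetSocketAddress(InetAddress.getByName("10.240.33.241"), 7000));
    }
}
{code}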

> Nodes cannot see each other in multi-DC, non-EC2 environment with 
> two-interface nodes due to outbound node-to-node connection binding to 
> private interface
> --
>
> Key: CASSANDRA-12673
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12673
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: Multi-DC, non-EC2 environment with two-interface nodes
>Reporter: Milan Majercik
>Priority: Minor
>
> We have a two-DC cluster in non-EC2 environment with each node containing two 
> interfaces, one using private addresses for intra-DC communication and the 
> other using public addresses for inter-DC communication. After proper 
> configuration setup needed for this kind of environment we observed nodes 
> cannot see each other.
> The configuration changes made for this purpose are as follows:
> *listen_address*: bound to private interface
> *broadcast_address*: bound to public address
> *listen_on_broadcast_address*: true
> *endpoint_snitch*: GossipingPropertyFileSnitch
> *prefer_local*=true (in cassandra-rackdc.properties)
> Upon restart, cassandra node contacts other nodes with their public addresses 
> which is essential for making contacts to foreign data centers. After 
> exhaustive investigation we found cassandra binds outbound node-to-node 
> connections to private interface (the one specified in listen_address) that 
> poses a problem for our environment as these data centers _do not allow 
> connections from private interface to public network_.
> A portion of cassandra code responsible for local binding of outbound 
> connections can be found in method 
> {{org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket}}:
> {code}
> if (!Config.getOutboundBindAny())
> channel.bind(new 
> InetSocketAddress(FBUtilities.getLocalAddress(), 0));
> {code}
> After we commented out these two lines and deployed cassandra.jar across the 
> cluster, the nodes were able to see each other and everything appears to be 
> working fine, including two-DC setup.
> Do you think it's possible to remove these two lines without negative 
> consequences? Alternatively, if the local binding serves some specific 
> purpose of which I'm ignorant would it be possible to make it configurable?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12485) Always require replace_address to replace existing token

2016-09-22 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514248#comment-15514248
 ] 

Paulo Motta commented on CASSANDRA-12485:
-

Hey [~cmlicata], you may use [ccm|https://github.com/pcmanus/ccm] to try to 
reproduce it. You can do something like this:
- Start 3 node cluster (no-vnode)
- Stop non-seed node
- Try to start a 4th node with initial_token set to that of the stopped node 
and auto_bootstrap = false
- It should fail, unless replace_address=stopped_node is set.

> Always require replace_address to replace existing token
> 
>
> Key: CASSANDRA-12485
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12485
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Paulo Motta
>Priority: Minor
>  Labels: lhf
>
> CASSANDRA-10134 prevented replacing an existing node unless 
> {{\-Dcassandra.replace_address}} or 
> {{\-Dcassandra.allow_unsafe_replace=true}} is specified.
> We should extend this behavior to tokens, preventing a node from joining the 
> ring if another node with the same token already exists in the ring, unless 
> {{\-Dcassandra.replace_address}} or 
> {{\-Dcassandra.allow_unsafe_replace=true}} is specified in order to avoid 
> catastrophic scenarios.
> One scenario where this can easily happen is if you replace a node with 
> another node with a different IP, and after some time you restart the 
> original node by mistake. The original node will then take over the tokens of 
> the replaced node (since it has a newer gossip generation).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12485) Always require replace_address to replace existing token

2016-09-22 Thread Christopher Licata (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514217#comment-15514217
 ] 

Christopher Licata commented on CASSANDRA-12485:
-

Hey Paulo, I am going to try to take on this task, but I think I am 
misunderstanding the scenario in which a node will take over the tokens from 
another node during replacement. Could you please explain it a bit more, as 
this will be my first attempt at contributing to this project?

> Always require replace_address to replace existing token
> 
>
> Key: CASSANDRA-12485
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12485
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Distributed Metadata
>Reporter: Paulo Motta
>Priority: Minor
>  Labels: lhf
>
> CASSANDRA-10134 prevented replacing an existing node unless 
> {{\-Dcassandra.replace_address}} or 
> {{\-Dcassandra.allow_unsafe_replace=true}} is specified.
> We should extend this behavior to tokens, preventing a node from joining the 
> ring if another node with the same token already exists in the ring, unless 
> {{\-Dcassandra.replace_address}} or 
> {{\-Dcassandra.allow_unsafe_replace=true}} is specified in order to avoid 
> catastrophic scenarios.
> One scenario where this can easily happen is if you replace a node with 
> another node with a different IP, and after some time you restart the 
> original node by mistake. The original node will then take over the tokens of 
> the replaced node (since it has a newer gossip generation).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12532) Include repair id in repair start message

2016-09-22 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-12532:

   Resolution: Fixed
Fix Version/s: 3.10
   Status: Resolved  (was: Awaiting Feedback)

The dtest has been merged, so I've committed this as 
{{c92928bb9c2441254b51e2ea4dc742c9245b9f4c}} to trunk.  Thanks!
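
For reference, the committed change makes the start message embed the parent 
session UUID next to the JMX command id. A small illustration using the format 
string from the commit later in this digest (keyspace and options values are 
made up):

{code}
import java.util.UUID;

public class RepairStartMessageExample
{
    public static void main(String[] args)
    {
        int cmd = 4;                            // JMX repair command id
        UUID parentSession = UUID.randomUUID(); // id used in system_distributed tables
        // Format string as in the patch; keyspace/options are illustrative.
        System.out.println(String.format(
            "Starting repair command #%d (%s), repairing keyspace %s with %s",
            cmd, parentSession, "my_keyspace", "options(...)"));
    }
}
{code}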

> Include repair id in repair start message
> -
>
> Key: CASSANDRA-12532
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12532
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
> Fix For: 3.10
>
> Attachments: 12532-trunk.patch
>
>
> Currently it's not really possible to map the repair command id that is 
> returned from JMX to the id used in tables in the system_traces and 
> system_distributed keyspaces. In the START message we can just include it to 
> make this possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12692) document read path

2016-09-22 Thread Jon Haddad (JIRA)
Jon Haddad created CASSANDRA-12692:
--

 Summary: document read path
 Key: CASSANDRA-12692
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12692
 Project: Cassandra
  Issue Type: Improvement
  Components: Documentation and Website
Reporter: Jon Haddad
Assignee: Jon Haddad
Priority: Minor


I'm not seeing any docs for the read path.  We should port this over from the 
wiki, assuming it's correct:

https://wiki.apache.org/cassandra/ReadPathForUsers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


cassandra git commit: Include session IDs in repair start message

2016-09-22 Thread tylerhobbs
Repository: cassandra
Updated Branches:
  refs/heads/trunk f92f959eb -> c92928bb9


Include session IDs in repair start message

Patch by Chris Lohfink; reviewed by Tyler Hobbs for CASSANDRA-12532


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c92928bb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c92928bb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c92928bb

Branch: refs/heads/trunk
Commit: c92928bb9c2441254b51e2ea4dc742c9245b9f4c
Parents: f92f959
Author: Chris Lohfink 
Authored: Thu Aug 25 13:41:13 2016 -0500
Committer: Tyler Hobbs 
Committed: Thu Sep 22 14:11:30 2016 -0500

--
 CHANGES.txt  | 1 +
 src/java/org/apache/cassandra/repair/RepairRunnable.java | 9 +
 2 files changed, 6 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/c92928bb/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index df3c775..8d96050 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.10
+ * Include repair session IDs in repair start message (CASSANDRA-12532)
  * Add a blocking task to Index, run before joining the ring (CASSANDRA-12039)
  * Fix NPE when using CQLSSTableWriter (CASSANDRA-12667)
  * Support optional backpressure strategies at the coordinator (CASSANDRA-9318)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/c92928bb/src/java/org/apache/cassandra/repair/RepairRunnable.java
--
diff --git a/src/java/org/apache/cassandra/repair/RepairRunnable.java b/src/java/org/apache/cassandra/repair/RepairRunnable.java
index ca06bcb..efa8234 100644
--- a/src/java/org/apache/cassandra/repair/RepairRunnable.java
+++ b/src/java/org/apache/cassandra/repair/RepairRunnable.java
@@ -110,7 +110,7 @@ public class RepairRunnable extends WrappedRunnable implements ProgressEventNoti
     protected void runMayThrow() throws Exception
     {
         final TraceState traceState;
-
+        final UUID parentSession = UUIDGen.getTimeUUID();
         final String tag = "repair:" + cmd;

         final AtomicInteger progress = new AtomicInteger();
@@ -131,10 +131,9 @@ public class RepairRunnable extends WrappedRunnable implements ProgressEventNoti
         }

         final long startTime = System.currentTimeMillis();
-        String message = String.format("Starting repair command #%d, repairing keyspace %s with %s", cmd, keyspace, options);
+        String message = String.format("Starting repair command #%d (%s), repairing keyspace %s with %s", cmd, parentSession, keyspace, options);
         logger.info(message);
-        fireProgressEvent(tag, new ProgressEvent(ProgressEventType.START, 0, 100, message));
         if (options.isTraced())
         {
             StringBuilder cfsb = new StringBuilder();
@@ -144,6 +143,8 @@ public class RepairRunnable extends WrappedRunnable implements ProgressEventNoti
             UUID sessionId = Tracing.instance.newSession(Tracing.TraceType.REPAIR);
             traceState = Tracing.instance.begin("repair", ImmutableMap.of("keyspace", keyspace, "columnFamilies", cfsb.substring(2)));
+            message = message + " tracing with " + sessionId;
+            fireProgressEvent(tag, new ProgressEvent(ProgressEventType.START, 0, 100, message));
             Tracing.traceRepair(message);
             traceState.enableActivityNotification(tag);
             for (ProgressListener listener : listeners)
@@ -154,6 +155,7 @@ public class RepairRunnable extends WrappedRunnable implements ProgressEventNoti
         }
         else
         {
+            fireProgressEvent(tag, new ProgressEvent(ProgressEventType.START, 0, 100, message));
             traceState = null;
         }

@@ -204,7 +206,6 @@ public class RepairRunnable extends WrappedRunnable implements ProgressEventNoti
             cfnames[i] = columnFamilyStores.get(i).name;
         }

-        final UUID parentSession = UUIDGen.getTimeUUID();
         SystemDistributedKeyspace.startParentRepair(parentSession, keyspace, cfnames, options);
         long repairedAt;
         try
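
The net effect of the patch above: the parent repair session UUID is created up 
front and included in the very first start message, rather than only being 
generated later for SystemDistributedKeyspace. A minimal sketch of the new 
message format, with made-up values for the command number, keyspace, and 
options (the patch itself uses UUIDGen.getTimeUUID()):

{code}
import java.util.UUID;

public class RepairStartMessageDemo
{
    public static void main(String[] args)
    {
        UUID parentSession = UUID.randomUUID(); // stand-in for UUIDGen.getTimeUUID()
        // Same format string as the patched RepairRunnable; values are illustrative.
        System.out.println(String.format(
            "Starting repair command #%d (%s), repairing keyspace %s with %s",
            1, parentSession, "my_keyspace", "<repair options>"));
    }
}
{code}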



[jira] [Resolved] (CASSANDRA-12690) LWT: Inserting Subset of columns returns all columns

2016-09-22 Thread Highstead (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Highstead resolved CASSANDRA-12690.
---
Resolution: Fixed

> LWT: Inserting Subset of columns returns all columns
> 
>
> Key: CASSANDRA-12690
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12690
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
> Environment: 3.x
>Reporter: Highstead
>Priority: Minor
>  Labels: transaction, transactions
>
> See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669
> When inserting a subset of the table columns using lightweight transactions, 
> the Cassandra result returns the full, unordered set of column values.
> SETUP:
> ```
> CREATE TABLE IF NOT EXISTS test.inserttest(
> key bigint,
> session_token text,
> foo text,
> bar text,
> PRIMARY KEY(key, session_token));
> INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
> ```
> `INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;`
> Expected result: Returns False, 1, myToken, baz
> Actual result: Returns true and all column values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12691) LWT: Inserting Subset of columns returns all columns

2016-09-22 Thread Highstead (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Highstead updated CASSANDRA-12691:
--
Description: 
See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669

When inserting a subset of the table columns using lightweight transactions, 
the Cassandra result returns the full, unordered set of column values.

SETUP:
{code}
CREATE TABLE IF NOT EXISTS test.inserttest(
key bigint,
session_token text,
foo text,
bar text,
PRIMARY KEY(key, session_token));

INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
{code}


{{INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;}}

Expected result: Returns False, 1, myToken, baz
Actual result: Returns true and all column values.

  was:
See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669

When inserting a subset of the table columns using lightweight transactions, 
the Cassandra result returns the full, unordered set of column values.

SETUP:
{{
CREATE TABLE IF NOT EXISTS test.inserttest(
key bigint,
session_token text,
foo text,
bar text,
PRIMARY KEY(key, session_token));

INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
}}


{{INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;}}

Expected result: Returns False, 1, myToken, baz
Actual result: Returns true and all column values.


> LWT: Inserting Subset of columns returns all columns
> 
>
> Key: CASSANDRA-12691
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12691
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
> Environment: 3.x
>Reporter: Highstead
>Priority: Minor
>  Labels: transaction, transactions
>
> See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669
> When inserting a subset of the table columns using lightweight transactions, 
> the Cassandra result returns the full, unordered set of column values.
> SETUP:
> {code}
> CREATE TABLE IF NOT EXISTS test.inserttest(
> key bigint,
> session_token text,
> foo text,
> bar text,
> PRIMARY KEY(key, session_token));
> INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
> {code}
> {{INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;}}
> Expected result: Returns False, 1, myToken, baz
> Actual result: Returns true and all column values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12691) LWT: Inserting Subset of columns returns all columns

2016-09-22 Thread Highstead (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Highstead updated CASSANDRA-12691:
--
Description: 
See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669

When inserting a subset of the table columns using lightweight transactions, 
the Cassandra result returns the full, unordered set of column values.

SETUP:
{code}
CREATE TABLE IF NOT EXISTS test.inserttest(
key bigint,
session_token text,
foo text,
bar text,
PRIMARY KEY(key, session_token));

INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
{code}


{code}INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;{code}

Expected result: Returns False, 1, myToken, baz
Actual result: Returns true and all column values.

  was:
See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669

When inserting a subset of the table columns using lightweight transactions, 
the Cassandra result returns the full, unordered set of column values.

SETUP:
{code}
CREATE TABLE IF NOT EXISTS test.inserttest(
key bigint,
session_token text,
foo text,
bar text,
PRIMARY KEY(key, session_token));

INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
{code}


{{INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;}}

Expected result: Returns False, 1, myToken, baz
Actual result: Returns true and all column values.


> LWT: Inserting Subset of columns returns all columns
> 
>
> Key: CASSANDRA-12691
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12691
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
> Environment: 3.x
>Reporter: Highstead
>Priority: Minor
>  Labels: transaction, transactions
>
> See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669
> When inserting a subset of the table columns using lightweight transactions, 
> the Cassandra result returns the full, unordered set of column values.
> SETUP:
> {code}
> CREATE TABLE IF NOT EXISTS test.inserttest(
> key bigint,
> session_token text,
> foo text,
> bar text,
> PRIMARY KEY(key, session_token));
> INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
> {code}
> {code}INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;{code}
> Expected result: Returns False, 1, myToken, baz
> Actual result: Returns true and all column values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12691) LWT: Inserting Subset of columns returns all columns

2016-09-22 Thread Highstead (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Highstead updated CASSANDRA-12691:
--
Description: 
See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669

When inserting a subset of the table columns using lightweight transactions, 
the Cassandra result returns the full, unordered set of column values.

SETUP:
{{
CREATE TABLE IF NOT EXISTS test.inserttest(
key bigint,
session_token text,
foo text,
bar text,
PRIMARY KEY(key, session_token));

INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
}}


{{INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;}}

Expected result: Returns False, 1, myToken, baz
Actual result: Returns true and all column values.

  was:
See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669

When inserting a subset of the table columns using lightweight transactions, 
the Cassandra result returns the full, unordered set of column values.

SETUP:
```
CREATE TABLE IF NOT EXISTS test.inserttest(
key bigint,
session_token text,
foo text,
bar text,
PRIMARY KEY(key, session_token));

INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
```

`INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;`

Expected result: Returns False, 1, myToken, baz
Actual result: Returns true and all column values.


> LWT: Inserting Subset of columns returns all columns
> 
>
> Key: CASSANDRA-12691
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12691
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL
> Environment: 3.x
>Reporter: Highstead
>Priority: Minor
>  Labels: transaction, transactions
>
> See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669
> When inserting a subset of the table columns using lightweight transactions, 
> the Cassandra result returns the full, unordered set of column values.
> SETUP:
> {{
> CREATE TABLE IF NOT EXISTS test.inserttest(
> key bigint,
> session_token text,
> foo text,
> bar text,
> PRIMARY KEY(key, session_token));
> INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
> }}
> {{INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;}}
> Expected result: Returns False, 1, myToken, baz
> Actual result: Returns true and all column values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12690) LWT: Inserting Subset of columns returns all columns

2016-09-22 Thread Highstead (JIRA)
Highstead created CASSANDRA-12690:
-

 Summary: LWT: Inserting Subset of columns returns all columns
 Key: CASSANDRA-12690
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12690
 Project: Cassandra
  Issue Type: Improvement
  Components: CQL
 Environment: 3.x
Reporter: Highstead
Priority: Minor


See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669

When inserting a subset of the table columns using lightweight transactions, 
the Cassandra result returns the full, unordered set of column values.

SETUP:
```
CREATE TABLE IF NOT EXISTS test.inserttest(
key bigint,
session_token text,
foo text,
bar text,
PRIMARY KEY(key, session_token));

INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
```

`INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;`

Expected result: Returns False, 1, myToken, baz
Actual result: Returns true and all column values.
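
For completeness, this is how a client observes the reported behavior: a 
conditional write returns a result row whose first column is {{[applied]}}, and 
when the condition fails the existing row's values follow. A minimal sketch 
using the DataStax Java driver against the table above (the contact point is a 
placeholder):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class LwtAppliedDemo
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            ResultSet rs = session.execute(
                "INSERT INTO test.inserttest (key, session_token, foo) " +
                "VALUES (1, 'myToken', 'bez') IF NOT EXISTS");
            Row row = rs.one();
            boolean applied = row.getBool("[applied]");
            System.out.println("applied: " + applied);
            // When not applied, the result row also carries the values of the
            // existing row -- the behavior described in this report.
            if (!applied)
                System.out.println("existing foo: " + row.getString("foo"));
        }
    }
}
{code}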



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12691) LWT: Inserting Subset of columns returns all columns

2016-09-22 Thread Highstead (JIRA)
Highstead created CASSANDRA-12691:
-

 Summary: LWT: Inserting Subset of columns returns all columns
 Key: CASSANDRA-12691
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12691
 Project: Cassandra
  Issue Type: Improvement
  Components: CQL
 Environment: 3.x
Reporter: Highstead
Priority: Minor


See: https://github.com/gocql/gocql/issues/792#issuecomment-248983669

When inserting a subset of the table columns using lightweight transactions, 
the Cassandra result returns the full, unordered set of column values.

SETUP:
```
CREATE TABLE IF NOT EXISTS test.inserttest(
key bigint,
session_token text,
foo text,
bar text,
PRIMARY KEY(key, session_token));

INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'baz') IF NOT EXISTS;
```

`INSERT INTO test.inserttest(key, session_token, foo) VALUES (1, 'myToken', 'bez') IF NOT EXISTS;`

Expected result: Returns False, 1, myToken, baz
Actual result: Returns true and all column values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12685) Add retry to hints dispatcher

2016-09-22 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513798#comment-15513798
 ] 

Dikang Gu edited comment on CASSANDRA-12685 at 9/22/16 6:30 PM:


[~spo...@gmail.com] thanks for the reply. Yes, the logs are from 2.x. Maybe I 
missed something, but I did not find any retry logic even in the 3.x branch or 
trunk; would you mind pointing me to it?


was (Author: dikanggu):
[~spo...@gmail.com] thanks for the replay. Yes, the logs are from 2.x. Maybe I 
missed something, but I did not find any retry logic even in the 3.x branch or 
trunk; would you mind pointing me to it?

> Add retry to hints dispatcher
> -
>
> Key: CASSANDRA-12685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12685
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Minor
> Fix For: 3.x
>
>
> Problem: I often see timeouts in hints replay. There is no retry for hints 
> replay; I think it would be great to add some retry logic for timeout 
> exceptions.
> {code}
> 2016-09-20_07:32:01.16610 INFO  07:32:01 [HintedHandoff:3]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_07:58:49.29983 INFO  07:58:49 [HintedHandoff:3]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (55040 delivered)
> 2016-09-20_07:58:49.29984 INFO  07:58:49 [HintedHandoff:3]: Enqueuing flush 
> of hints: 15962349 (0%) on-heap, 2049808 (0%) off-heap
> 2016-09-20_08:02:17.55072 INFO  08:02:17 [HintedHandoff:1]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_08:05:45.25723 INFO  08:05:45 [HintedHandoff:1]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (7936 delivered)
> 2016-09-20_08:05:45.25725 INFO  08:05:45 [HintedHandoff:1]: Enqueuing flush 
> of hints: 2301605 (0%) on-heap, 259744 (0%) off-heap
> 2016-09-20_08:12:19.92910 INFO  08:12:19 [HintedHandoff:2]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_08:51:44.72191 INFO  08:51:44 [HintedHandoff:2]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (83456 delivered)
> {code}
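
For illustration only (no such retry exists in the dispatcher under 
discussion), a rough sketch of what retry-on-timeout with capped exponential 
backoff could look like; the {{Callable}} wrapping a hint delivery, the attempt 
count, and the backoff values are all hypothetical:

{code}
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

public class RetryOnTimeout
{
    // Retries the task only on TimeoutException, doubling the backoff on each
    // attempt and capping it at one minute.
    static <T> T withRetries(Callable<T> task, int maxAttempts, long baseBackoffMs)
        throws Exception
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return task.call();
            }
            catch (TimeoutException e)
            {
                if (attempt >= maxAttempts)
                    throw e;
                Thread.sleep(Math.min(baseBackoffMs << (attempt - 1), 60_000L));
            }
        }
    }
}
{code}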



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12689) All MutationStage threads blocked, kills server

2016-09-22 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514073#comment-15514073
 ] 

Benjamin Roth edited comment on CASSANDRA-12689 at 9/22/16 6:23 PM:


Same situation, different node, different trace:

Name: MutationStage-37
State: WAITING on java.util.concurrent.CompletableFuture$Signaller@1e1dc9eb
Total blocked: 58  Total waited: 709.137

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
org.apache.cassandra.service.StorageProxy$$Lambda$249/1210907398.run(Unknown 
Source)
org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1410)
org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2628)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
java.lang.Thread.run(Thread.java:745)

Load Graph of affected node: https://cl.ly/201c3T3f0M0s
Mutations: https://cl.ly/0T2R3b0y2435


was (Author: brstgt):
Same situation, different node, different trace:

Name: MutationStage-37
State: WAITING on java.util.concurrent.CompletableFuture$Signaller@1e1dc9eb
Total blocked: 58  Total waited: 709.137

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
org.apache.cassandra.service.StorageProxy$$Lambda$249/1210907398.run(Unknown 
Source)
org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1410)
org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2628)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
java.lang.Thread.run(Thread.java:745)

> All MutationStage threads blocked, kills server
> ---
>
> Key: CASSANDRA-12689
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12689
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Benjamin Roth
>Priority: Critical
>
> Under heavy load (e.g. due to repair during normal operations), a lot of 
> NullPointerExceptions occur in MutationStage. Unfortunately, the log is not 
> very chatty, trace is missing:
> 2016-09-22T06:29:47+00:00 cas6 [MutationStage-1] 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService Uncaught 
> exception on thread Thread[MutationStage-1,5,main]: {}
> 2016-09-22T06:29:47+00:00 cas6 #011java.lang.NullPointerException: null
> Then, after some time, in most cases ALL threads in MutationStage pools are 
> completely blocked. This leads to piling up pending tasks until server runs 
> OOM and is completely unresponsive due to GC. Threads will NEVER unblock 
> until server restart. Even if load goes completely down, all hints are 
> paused, and no compaction or repair is running. Only restart helps.
> I can understand that pending tasks in MutationStage may pile up under heavy 
> load, but tasks should be processed and dequeued after load goes down. This is 
> definitely not the case. This looks more like an unhandled exception 
> leading to a stuck lock.
> Stack trace from jconsole, all Threads in 

[jira] [Commented] (CASSANDRA-12689) All MutationStage threads blocked, kills server

2016-09-22 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514073#comment-15514073
 ] 

Benjamin Roth commented on CASSANDRA-12689:
---

Same situation, different node, different trace:

Name: MutationStage-37
State: WAITING on java.util.concurrent.CompletableFuture$Signaller@1e1dc9eb
Total blocked: 58  Total waited: 709.137

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
org.apache.cassandra.service.StorageProxy$$Lambda$249/1210907398.run(Unknown 
Source)
org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1410)
org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2628)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
java.lang.Thread.run(Thread.java:745)

> All MutationStage threads blocked, kills server
> ---
>
> Key: CASSANDRA-12689
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12689
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Benjamin Roth
>Priority: Critical
>
> Under heavy load (e.g. due to repair during normal operations), a lot of 
> NullPointerExceptions occur in MutationStage. Unfortunately, the log is not 
> very chatty, trace is missing:
> 2016-09-22T06:29:47+00:00 cas6 [MutationStage-1] 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService Uncaught 
> exception on thread Thread[MutationStage-1,5,main]: {}
> 2016-09-22T06:29:47+00:00 cas6 #011java.lang.NullPointerException: null
> Then, after some time, in most cases ALL threads in MutationStage pools are 
> completely blocked. This leads to piling up pending tasks until server runs 
> OOM and is completely unresponsive due to GC. Threads will NEVER unblock 
> until server restart. Even if load goes completely down, all hints are 
> paused, and no compaction or repair is running. Only restart helps.
> I can understand that pending tasks in MutationStage may pile up under heavy 
> load, but tasks should be processed and dequeued after load goes down. This is 
> definitely not the case. This looks more like an unhandled exception 
> leading to a stuck lock.
> Stack trace from jconsole, all Threads in MutationStage show same trace.
> Name: MutationStage-48
> State: WAITING on java.util.concurrent.CompletableFuture$Signaller@fcc8266
> Total blocked: 137  Total waited: 138.513
> Stack trace: 
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
> org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
> org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
> org.apache.cassandra.hints.Hint.apply(Hint.java:96)
> org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:91)
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
> java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11811) dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog

2016-09-22 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514070#comment-15514070
 ] 

Jim Witschey commented on CASSANDRA-11811:
--

I can fix this quickly and easily in the CDC tests, but no promises for the 
snapshot tests.

> dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog
> --
>
> Key: CASSANDRA-11811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11811
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Jim Witschey
>  Labels: dtest
> Fix For: 3.x
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest_win32/416/testReport/snapshot_test/TestArchiveCommitlog/test_archive_commitlog
> Failed on CassCI build trunk_dtest_win32 #416
> Relevant error is pasted. This is clearly a test problem. No idea yet why it 
> only happens on Windows. This affects most tests in the 
> TestArchiveCommitlog suite.
> {code}
> WARN: Failed to flush node: node1 on shutdown.
> Unexpected error in node1 log, error: 
> ERROR [main] 2016-05-13 21:15:02,701 CassandraDaemon.java:729 - Fatal 
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Cannot change the 
> number of tokens from 64 to 32
>   at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1043)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:625)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:368) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:583)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:712) 
> [main/:na]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11811) dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog

2016-09-22 Thread Jim Witschey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Witschey reassigned CASSANDRA-11811:


Assignee: Jim Witschey  (was: DS Test Eng)

> dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog
> --
>
> Key: CASSANDRA-11811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11811
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: Jim Witschey
>  Labels: dtest
> Fix For: 3.x
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest_win32/416/testReport/snapshot_test/TestArchiveCommitlog/test_archive_commitlog
> Failed on CassCI build trunk_dtest_win32 #416
> Relevant error is pasted. This is clearly a test problem. No idea yet why it 
> only happens on Windows. This affects most tests in the 
> TestArchiveCommitlog suite.
> {code}
> WARN: Failed to flush node: node1 on shutdown.
> Unexpected error in node1 log, error: 
> ERROR [main] 2016-05-13 21:15:02,701 CassandraDaemon.java:729 - Fatal 
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Cannot change the 
> number of tokens from 64 to 32
>   at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1043)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:625)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:368) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:583)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:712) 
> [main/:na]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12571) cqlsh lost the ability to have a request wait indefinitely

2016-09-22 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514033#comment-15514033
 ] 

Paulo Motta commented on CASSANDRA-12571:
-

No particular reason to remove it; I guess I just did it because I reused the 
parsing snippet from {{connect_timeout}} without noticing the {{none}} special 
case. I'm fine with either reinstating it or changing the doc, but I would be 
slightly more in favor of just updating the doc, given the simple workaround of 
setting a larger timeout, unless there is a compelling reason to retain the old 
behavior.

> cqlsh lost the ability to have a request wait indefinitely
> --
>
> Key: CASSANDRA-12571
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12571
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 3.7
>Reporter: Nate Sanders
>Assignee: Stefania
>Priority: Minor
>
> In commit c7f0032912798b5e53b64d8391e3e3d7e4121165, when client_timeout 
> became request_timeout, the logic was changed so that you can no longer use a 
> timeout of None, despite the docs saying that you can:
> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshUsingCqlshrc.html#cqlshUsingCqlshrc__request-timeout



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11811) dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog

2016-09-22 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514027#comment-15514027
 ] 

Joel Knighton commented on CASSANDRA-11811:
---

Both these failures occur because a commitlog from another node is replayed 
that contains a mutation inserting tokens into the system.local table. This is 
why we see exactly double the token count configured in {{cassandra.yaml}}: 
the node generates its own tokens during startup, and it inserts another 
collection of tokens during commitlog replay.

In the case of the CDC test, this is because the commitlog segment for this 
insert also contains writes to the CDC table. We can fix this by draining the 
node and stopping/starting it before doing any inserts to the CDC table.

In the case of the commitlog test, we remove archived commitlogs before 
snapshotting, but we haven't ensured that the commitlog containing these system 
writes has been archived. Again, one way we could achieve this is by 
draining/stopping/starting the node before removing archived commitlogs.

> dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog
> --
>
> Key: CASSANDRA-11811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11811
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: DS Test Eng
>  Labels: dtest
> Fix For: 3.x
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest_win32/416/testReport/snapshot_test/TestArchiveCommitlog/test_archive_commitlog
> Failed on CassCI build trunk_dtest_win32 #416
> Relevant error is pasted. This is clearly a test problem. No idea yet why it 
> only happens on Windows. This affects most tests in the 
> TestArchiveCommitlog suite.
> {code}
> WARN: Failed to flush node: node1 on shutdown.
> Unexpected error in node1 log, error: 
> ERROR [main] 2016-05-13 21:15:02,701 CassandraDaemon.java:729 - Fatal 
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Cannot change the 
> number of tokens from 64 to 32
>   at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1043)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:625)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:368) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:583)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:712) 
> [main/:na]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11811) dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog

2016-09-22 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-11811:
--
Assignee: DS Test Eng  (was: Branimir Lambov)

> dtest failure in snapshot_test.TestArchiveCommitlog.test_archive_commitlog
> --
>
> Key: CASSANDRA-11811
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11811
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Philip Thompson
>Assignee: DS Test Eng
>  Labels: dtest
> Fix For: 3.x
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest_win32/416/testReport/snapshot_test/TestArchiveCommitlog/test_archive_commitlog
> Failed on CassCI build trunk_dtest_win32 #416
> Relevant error is pasted. This is clearly a test problem. No idea yet why it 
> only happens on Windows. This affects most tests in the 
> TestArchiveCommitlog suite.
> {code}
> WARN: Failed to flush node: node1 on shutdown.
> Unexpected error in node1 log, error: 
> ERROR [main] 2016-05-13 21:15:02,701 CassandraDaemon.java:729 - Fatal 
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Cannot change the 
> number of tokens from 64 to 32
>   at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1043)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:625)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:368) 
> [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:583)
>  [main/:na]
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:712) 
> [main/:na]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12688) Change -ea comment in jvm.options

2016-09-22 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-12688:
---
   Resolution: Fixed
Fix Version/s: 3.10
   Status: Resolved  (was: Patch Available)

Committed as {{f92f959eb319a04f6f2ae876f1e621383b95}}

> Change -ea comment in jvm.options
> -
>
> Key: CASSANDRA-12688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12688
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 3.10
>
>
> The config file does nothing to indicate the dangers of turning -ea off. Based 
> on recent ML comments, better not to dangle the carrot of a 5% performance boost.
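
For context on why removing {{-ea}} is risky: {{assert}} statements are 
compiled in but skipped at runtime unless assertions are enabled, so dropping 
the flag silently disables every internal invariant check. A tiny illustration 
(the values are made up):

{code}
public class AssertionDemo
{
    public static void main(String[] args)
    {
        int replicaCount = -1; // some corrupted internal state
        // With "java -ea AssertionDemo" this throws AssertionError;
        // without -ea the check is skipped and the bad state goes unnoticed.
        assert replicaCount >= 0 : "replica count must be non-negative: " + replicaCount;
        System.out.println("no assertion fired; replicaCount = " + replicaCount);
    }
}
{code}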



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


cassandra git commit: CASSANDRA-12688 strongly suggest leaving -ea on

2016-09-22 Thread jjirsa
Repository: cassandra
Updated Branches:
  refs/heads/trunk 703506c3c -> f92f959eb


CASSANDRA-12688 strongly suggest leaving -ea on


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f92f959e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f92f959e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f92f959e

Branch: refs/heads/trunk
Commit: f92f959eb319a04f6f2ae876f1e621383b95
Parents: 703506c
Author: Edward Capriolo 
Authored: Thu Sep 22 12:30:22 2016 -0400
Committer: Jeff Jirsa 
Committed: Thu Sep 22 10:39:47 2016 -0700

--
 conf/jvm.options | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/f92f959e/conf/jvm.options
--
diff --git a/conf/jvm.options b/conf/jvm.options
index 9e13e0e..0e329d6 100644
--- a/conf/jvm.options
+++ b/conf/jvm.options
@@ -84,8 +84,7 @@
 # GENERAL JVM SETTINGS #
 
 
-# enable assertions.  disabling this in production will give a modest
-# performance benefit (around 5%).
+# enable assertions. highly suggested for correct application functionality.
 -ea
 
 # enable thread priorities, primarily so we can give periodic tasks



[jira] [Updated] (CASSANDRA-12678) dtest failure in pushed_notifications_test.TestPushedNotifications.restart_node_test

2016-09-22 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-12678:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

The dtest fix was 
[committed|https://github.com/riptano/cassandra-dtest/commit/8723a1eb26539ab5c67fd3472dfc85b89807c71b], 
so closing.

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.restart_node_test
> 
>
> Key: CASSANDRA-12678
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12678
> Project: Cassandra
>  Issue Type: Test
>Reporter: Sean McCarthy
>Assignee: Sam Tunnicliffe
>  Labels: dtest
> Attachments: node1.log, node2.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/275/testReport/pushed_notifications_test/TestPushedNotifications/restart_node_test
> {code}
> Error Message
> 'UP' != u'NEW_NODE'
> {code}
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/pushed_notifications_test.py", line 
> 185, in restart_node_test
> self.assertEquals("UP", notifications[1]["change_type"])
>   File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual
> assertion_func(first, second, msg=msg)
>   File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual
> raise self.failureException(msg)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12050) per-patch smoke suites as an early/fast testing tier

2016-09-22 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch updated CASSANDRA-12050:
---
Priority: Minor  (was: Major)

> per-patch smoke suites as an early/fast testing tier
> 
>
> Key: CASSANDRA-12050
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12050
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Russ Hatch
>Priority: Minor
>
> Coverage data offers a unique opportunity to build metadata about tests and 
> related Cassandra code.
> Using the jacoco coverage api we should be able to build a simple index from 
> tests (dtest, unit) to the Cassandra code they touch (and the reverse).
> When a new patch is introduced, we do a lookup in that index, based on java 
> source files touched (and possibly lines within those files), and use that to 
> infer the most relevant dtests or unit tests. Patch authors can then run that 
> small test subset as a first testing pass.
> In this way we can build small, focused test suites that are custom to each 
> patch. Once this small custom smoke test appears successful, things would of 
> course need to be vetted across a more complete test run on CI.
> I think the best interface would simply be ant targets. One target would be 
> used to build/refresh the test:source code index (run occasionally and saved 
> somewhere; index building would be time consuming since it will require full 
> job runs). A second target looks at files touched and does the index lookups, 
> then outputs a list of tests to run.
> The dev user experience might look something like this:
> {noformat}
> ant get-custom-smoke -Dcoverage_index=./trunk_coverage_index_SHA_foo.idx
> Generating test lists
> Please run the following two scripts to vet your changes:
> ./custom_smoke_junit.sh
> ./custom_smoke_dtest.sh
> {noformat}
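
As a sketch of the index-building step, the JaCoCo API can map one test run's 
{{jacoco.exec}} to the source files it touched; the paths and the 
one-exec-file-per-test-job assumption are placeholders:

{code}
import java.io.File;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.jacoco.core.analysis.Analyzer;
import org.jacoco.core.analysis.CoverageBuilder;
import org.jacoco.core.analysis.IClassCoverage;
import org.jacoco.core.tools.ExecFileLoader;

public class CoverageIndexSketch
{
    // Returns the source files covered by a single run, e.g.
    // "org/apache/cassandra/repair/RepairRunnable.java".
    static Set<String> coveredSources(File execFile, File classesDir) throws IOException
    {
        ExecFileLoader loader = new ExecFileLoader();
        loader.load(execFile);
        CoverageBuilder builder = new CoverageBuilder();
        new Analyzer(loader.getExecutionDataStore(), builder).analyzeAll(classesDir);

        Set<String> sources = new TreeSet<>();
        for (IClassCoverage cls : builder.getClasses())
            if (cls.getLineCounter().getCoveredCount() > 0)
                sources.add(cls.getPackageName() + "/" + cls.getSourceFileName());
        return sources;
    }
}
{code}

Inverting these per-test sets would then yield the source-file-to-tests index 
that the lookup target queries.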



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12050) per-patch smoke suites as an early/fast testing tier

2016-09-22 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch updated CASSANDRA-12050:
---
Assignee: (was: Russ Hatch)

> per-patch smoke suites as an early/fast testing tier
> 
>
> Key: CASSANDRA-12050
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12050
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Russ Hatch
>
> Coverage data offers a unique opportunity to build metadata about tests and 
> related Cassandra code.
> Using the jacoco coverage api we should be able to build a simple index from 
> tests (dtest, unit) to the Cassandra code they touch (and the reverse).
> When a new patch is introduced, we do a lookup in that index, based on java 
> source files touched (and possibly lines within those files), and use that to 
> infer the most relevant dtests or unit tests. Patch authors can then run that 
> small test subset as a first testing pass.
> In this way we can build small, focused test suites that are custom to each 
> patch. Once this small custom smoke test appears successful, things would of 
> course need to be vetted across a more complete test run on CI.
> I think the best interface would simply be ant targets. One target would be 
> used to build/refresh the test:source code index (run occasionally and saved 
> somewhere; index building would be time consuming since it will require full 
> job runs). A second target looks at files touched and does the index lookups, 
> then outputs a list of tests to run.
> The dev user experience might look something like this:
> {noformat}
> ant get-custom-smoke -Dcoverage_index=./trunk_coverage_index_SHA_foo.idx
> Generating test lists
> Please run the following two scripts to vet your changes:
> ./custom_smoke_junit.sh
> ./custom_smoke_dtest.sh
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12050) per-patch smoke suites as an early/fast testing tier

2016-09-22 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch updated CASSANDRA-12050:
---
Issue Type: Test  (was: Improvement)

> per-patch smoke suites as an early/fast testing tier
> 
>
> Key: CASSANDRA-12050
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12050
> Project: Cassandra
>  Issue Type: Test
>Reporter: Russ Hatch
>Priority: Minor
>
> Coverage data offers a unique opportunity to build metadata about tests and 
> related Cassandra code.
> Using the jacoco coverage api we should be able to build a simple index from 
> tests (dtest, unit) to the Cassandra code they touch (and the reverse).
> When a new patch is introduced, we do a lookup in that index, based on java 
> source files touched (and possibly lines within those files), and use that to 
> infer the most relevant dtests or unit tests. Patch authors can then run that 
> small test subset as a first testing pass.
> In this way we can build small, focused test suites that are custom to each 
> patch. Once this small custom smoke test appears successful, things would of 
> course need to be vetted across a more complete test run on CI.
> I think the best interface would simply be ant targets. One target would be 
> used to build/refresh the test:source code index (run occasionally and saved 
> somewhere; index building would be time consuming since it will require full 
> job runs). A second target looks at files touched and does the index lookups, 
> then outputs a list of tests to run.
> The dev user experience might look something like this:
> {noformat}
> ant get-custom-smoke -Dcoverage_index=./trunk_coverage_index_SHA_foo.idx
> Generating test lists
> Please run the following two scripts to vet your changes:
> ./custom_smoke_junit.sh
> ./custom_smoke_dtest.sh
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12689) All MutationStage threads blocked, kills server

2016-09-22 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513841#comment-15513841
 ] 

Benjamin Roth commented on CASSANDRA-12689:
---

As a graph, this looks like this: https://cl.ly/0N3l0D1v1P1H
You can see the mutations increase linearly. The drop always came after 
restarting C*.
This is just one example; this scenario happened much more often.

This is the load graph for the same time window: https://cl.ly/2m1S2K081o3n

> All MutationStage threads blocked, kills server
> ---
>
> Key: CASSANDRA-12689
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12689
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Benjamin Roth
>Priority: Critical
>
> Under heavy load (e.g. due to repair during normal operations), a lot of 
> NullPointerExceptions occur in MutationStage. Unfortunately, the log is not 
> very chatty, trace is missing:
> 2016-09-22T06:29:47+00:00 cas6 [MutationStage-1] 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService Uncaught 
> exception on thread Thread[MutationStage-1,5,main]: {}
> 2016-09-22T06:29:47+00:00 cas6 #011java.lang.NullPointerException: null
> Then, after some time, in most cases ALL threads in MutationStage pools are 
> completely blocked. This leads to piling up pending tasks until server runs 
> OOM and is completely unresponsive due to GC. Threads will NEVER unblock 
> until server restart. Even if load goes completely down, all hints are 
> paused, and no compaction or repair is running. Only restart helps.
> I can understand that pending tasks in MutationStage may pile up under heavy 
> load, but tasks should be processed and dequeued after load goes down. This is 
> definitely not the case. This looks more like an unhandled exception 
> leading to a stuck lock.
> Stack trace from jconsole, all Threads in MutationStage show same trace.
> Name: MutationStage-48
> State: WAITING on java.util.concurrent.CompletableFuture$Signaller@fcc8266
> Total blocked: 137  Total waited: 138.513
> Stack trace: 
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
> org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
> org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
> org.apache.cassandra.hints.Hint.apply(Hint.java:96)
> org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:91)
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
> java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12689) All MutationStage threads blocked, kills server

2016-09-22 Thread Benjamin Roth (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Roth updated CASSANDRA-12689:
--
Summary: All MutationStage threads blocked, kills server  (was: Alle 
MutationStage threads blocked, kills server)

> All MutationStage threads blocked, kills server
> ---
>
> Key: CASSANDRA-12689
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12689
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Benjamin Roth
>Priority: Critical
>
> Under heavy load (e.g. due to repair during normal operations), a lot of 
> NullPointerExceptions occur in MutationStage. Unfortunately, the log is not 
> very chatty, trace is missing:
> 2016-09-22T06:29:47+00:00 cas6 [MutationStage-1] 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService Uncaught 
> exception on thread Thread[MutationStage-1,5,main]: {}
> 2016-09-22T06:29:47+00:00 cas6 #011java.lang.NullPointerException: null
> Then, after some time, in most cases ALL threads in MutationStage pools are 
> completely blocked. This leads to piling up pending tasks until server runs 
> OOM and is completely unresponsive due to GC. Threads will NEVER unblock 
> until server restart. Even if load goes completely down, all hints are 
> paused, and no compaction or repair is running. Only restart helps.
> I can understand that pending tasks in MutationStage may pile up under heavy 
> load, but tasks should be processed and dequeued after load goes down. This is 
> definitely not the case. This looks more like an unhandled exception 
> leading to a stuck lock.
> Stack trace from jconsole, all Threads in MutationStage show same trace.
> Name: MutationStage-48
> State: WAITING on java.util.concurrent.CompletableFuture$Signaller@fcc8266
> Total blocked: 137  Total waited: 138.513
> Stack trace: 
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
> org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
> org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
> org.apache.cassandra.hints.Hint.apply(Hint.java:96)
> org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:91)
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
> java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12689) Alle MutationStage threads blocked, kills server

2016-09-22 Thread Benjamin Roth (JIRA)
Benjamin Roth created CASSANDRA-12689:
-

 Summary: Alle MutationStage threads blocked, kills server
 Key: CASSANDRA-12689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12689
 Project: Cassandra
  Issue Type: Bug
  Components: Local Write-Read Paths
Reporter: Benjamin Roth
Priority: Critical


Under heavy load (e.g. due to repair during normal operations), a lot of 
NullPointerExceptions occur in MutationStage. Unfortunately, the log is not 
very chatty, trace is missing:
2016-09-22T06:29:47+00:00 cas6 [MutationStage-1] 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService Uncaught 
exception on thread Thread[MutationStage-1,5,main]: {}
2016-09-22T06:29:47+00:00 cas6 #011java.lang.NullPointerException: null

Then, after some time, in most cases ALL threads in MutationStage pools are 
completely blocked. This leads to piling up pending tasks until server runs OOM 
and is completely unresponsive due to GC. Threads will NEVER unblock until 
server restart. Even if load goes completely down, all hints are paused, and no 
compaction or repair is running. Only restart helps.

I can understand that pending tasks in MutationStage may pile up under heavy 
load, but tasks should be processed and dequeued after load goes down. This is 
definitely not the case. This looks more like an unhandled exception 
leading to a stuck lock.

Stack trace from jconsole, all Threads in MutationStage show same trace.

Name: MutationStage-48
State: WAITING on java.util.concurrent.CompletableFuture$Signaller@fcc8266
Total blocked: 137  Total waited: 138.513

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
org.apache.cassandra.hints.Hint.apply(Hint.java:96)
org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:91)
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
java.lang.Thread.run(Thread.java:745)
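
For what it's worth, the parked state above is exactly what a thread waiting on 
a {{CompletableFuture}} that is never completed looks like: 
{{Uninterruptibles.getUninterruptibly}} blocks forever. A standalone 
reproduction of that thread state (not Cassandra code; requires Guava on the 
classpath):

{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

import com.google.common.util.concurrent.Uninterruptibles;

public class StuckFutureDemo
{
    public static void main(String[] args) throws Exception
    {
        CompletableFuture<Void> never = new CompletableFuture<>(); // never completed
        Thread t = new Thread(() -> {
            try
            {
                // Parks in CompletableFuture$Signaller.block, matching the
                // MutationStage stack trace above; it will never return.
                Uninterruptibles.getUninterruptibly(never);
            }
            catch (ExecutionException e)
            {
                throw new RuntimeException(e);
            }
        }, "MutationStage-demo");
        t.start();
        Thread.sleep(1000);
        System.out.println(t.getName() + " state: " + t.getState()); // WAITING
    }
}
{code}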



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12685) Add retry to hints dispatcher

2016-09-22 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513798#comment-15513798
 ] 

Dikang Gu commented on CASSANDRA-12685:
---

[~spo...@gmail.com] thanks for the reply. Yes, the logs are from 2.x. Maybe I 
missed something, but I did not find any retry logic even in the 3.x branch or 
trunk; would you mind pointing me to it?

> Add retry to hints dispatcher
> -
>
> Key: CASSANDRA-12685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12685
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Minor
> Fix For: 3.x
>
>
> Problem: I often see timeouts in hints replay. There is no retry for hints 
> replay; I think it would be great to add some retry logic for timeout 
> exceptions.
> {code}
> 2016-09-20_07:32:01.16610 INFO  07:32:01 [HintedHandoff:3]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_07:58:49.29983 INFO  07:58:49 [HintedHandoff:3]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (55040 delivered)
> 2016-09-20_07:58:49.29984 INFO  07:58:49 [HintedHandoff:3]: Enqueuing flush 
> of hints: 15962349 (0%) on-heap, 2049808 (0%) off-heap
> 2016-09-20_08:02:17.55072 INFO  08:02:17 [HintedHandoff:1]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_08:05:45.25723 INFO  08:05:45 [HintedHandoff:1]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (7936 delivered)
> 2016-09-20_08:05:45.25725 INFO  08:05:45 [HintedHandoff:1]: Enqueuing flush 
> of hints: 2301605 (0%) on-heap, 259744 (0%) off-heap
> 2016-09-20_08:12:19.92910 INFO  08:12:19 [HintedHandoff:2]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_08:51:44.72191 INFO  08:51:44 [HintedHandoff:2]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (83456 delivered)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12688) Change -ea comment in jvm.options

2016-09-22 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated CASSANDRA-12688:

Status: Patch Available  (was: Open)

> Change -ea comment in jvm.options
> -
>
> Key: CASSANDRA-12688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12688
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
>
> The config file does nothing to indicate the dangers of turning -ea off. Based 
> on recent ML comments, better not to dangle the carrot of a 5% performance boost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12685) Add retry to hints dispatcher

2016-09-22 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu updated CASSANDRA-12685:
--
Since Version: 2.1.14

> Add retry to hints dispatcher
> -
>
> Key: CASSANDRA-12685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12685
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Minor
> Fix For: 3.x
>
>
> Problem: I often see timeouts in hints replay. I find there is no retry for 
> hints replay; I think it would be great to add some retry logic for timeout 
> exceptions.
> {code}
> 2016-09-20_07:32:01.16610 INFO  07:32:01 [HintedHandoff:3]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_07:58:49.29983 INFO  07:58:49 [HintedHandoff:3]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (55040 delivered)
> 2016-09-20_07:58:49.29984 INFO  07:58:49 [HintedHandoff:3]: Enqueuing flush 
> of hints: 15962349 (0%) on-heap, 2049808 (0%) off-heap
> 2016-09-20_08:02:17.55072 INFO  08:02:17 [HintedHandoff:1]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_08:05:45.25723 INFO  08:05:45 [HintedHandoff:1]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (7936 delivered)
> 2016-09-20_08:05:45.25725 INFO  08:05:45 [HintedHandoff:1]: Enqueuing flush 
> of hints: 2301605 (0%) on-heap, 259744 (0%) off-heap
> 2016-09-20_08:12:19.92910 INFO  08:12:19 [HintedHandoff:2]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_08:51:44.72191 INFO  08:51:44 [HintedHandoff:2]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (83456 delivered)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12688) Change -ea comment in jvm.options

2016-09-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513750#comment-15513750
 ] 

Edward Capriolo commented on CASSANDRA-12688:
-

https://github.com/apache/cassandra/compare/trunk...edwardcapriolo:CASSANDRA-12688?expand=1

> Change -ea comment in jvm.options
> -
>
> Key: CASSANDRA-12688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12688
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
>
> The config file does nothing to indicate the dangers of turning -ea off. Based 
> on recent ML comments, better not to dangle the carrot of a 5% performance boost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12688) Change -ea comment in jvm.options

2016-09-22 Thread Edward Capriolo (JIRA)
Edward Capriolo created CASSANDRA-12688:
---

 Summary: Change -ea comment in jvm.options
 Key: CASSANDRA-12688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12688
 Project: Cassandra
  Issue Type: Bug
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor


The config file does nothing to indicate the dangers of turning -ea off. Based 
on recent ML comments, better not to dangle the carrot of a 5% performance boost.
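
One possible wording for that comment (illustrative only, not the committed 
patch):

{noformat}
# Enable assertions. Disabling them yields a modest performance benefit (on the
# order of 5%) but also removes internal sanity checks; keep them enabled
# unless you are certain you can live without those checks.
-ea
{noformat}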



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12148) Improve determinism of CDC data availability

2016-09-22 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513657#comment-15513657
 ] 

Jim Witschey commented on CASSANDRA-12148:
--

Seeing a couple of failures in the new test. Test report:

http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/326/testReport/

We see [some CDC index files created in the destination 
node|http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/326/testReport/node_0_iter_001.cdc_test/TestCDC/test_cdc_data_available_in_cdc_raw/],
 and [a case where the CDC segments are, unexpectedly, different after writing 
non-CDC 
data|http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/326/testReport/node_0_iter_001.cdc_test/TestCDC/test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space/].
 The second could be a bad expectation now that availability behavior has 
changed; I'm not sure.

> Improve determinism of CDC data availability
> 
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due 
> to our reliance on CommitLogSegments being discarded to have the data 
> available in cdc_raw: if a slowly written table shares a 
> CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until 
> we hit either memory pressure on memtables or CommitLog limit pressure. 
> Ultimately, this leaves a non-deterministic element to when data becomes 
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more 
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead 
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file 
> per commit log segment in cdc_raw. Clients tail this index file, delta their 
> local last parsed offset on change, and parse the corresponding commit log 
> segment using their last parsed offset as min
> * C* flags that index file with an offset and DONE when the file is flushed 
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled 
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the 
> last written CDC mutation in the segment if higher than the previously known 
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest 
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as 
> _cdc.idx containing the min offset and end offset fsynced to 
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete both the 
> segment in commitlog and in cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the 
> segment in commitlog and update the _cdc.idx file w/end offset 
> and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled 
> mutations from a segment and write that to _cdc.idx on 
> completion of segment replay. This should bridge the potential correctness 
> gap of a node writing to a segment and then dying before it can write the 
> _cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC 
> mutation, track an offset of how far they've parsed, delta against the 
> _cdc.idx file end offset, and use that as a determinant on when to parse new 
> CDC data. Any existing clients written to the initial implementation of CDC 
> need only add the _cdc.idx logic and checking for DONE marker 
> to their code, so the burden on users to update to support this should be 
> quite small for the benefit of having data available as soon as it's fsynced 
> instead of at a non-deterministic time when potentially unrelated tables are 
> flushed.
> Finally, we should look into extending the interface on CommitLogReader to be 
> more friendly for realtime parsing, perhaps supporting taking a 
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls, 
> assuming the reader is at the start of a SyncSegment. Would probably also 
> need to rewind to the start of the segment before returning so subsequent 
> calls would respect this contract. This would skip needing to deserialize the 
> descriptor and all 
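
The consumer side of the idx scheme could look roughly like this; a sketch in 
which the file layout (an end offset on the first line, a DONE token once 
flushed) is an assumption for illustration:

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical consumer-side sketch of the idx-tailing scheme described above.
public final class CdcIdxTailSketch
{
    private long lastParsedOffset = 0;

    /** Returns true once the segment is flushed (DONE) and fully parsed. */
    boolean poll(Path idxFile, Path segment) throws Exception
    {
        List<String> lines = Files.readAllLines(idxFile);
        long endOffset = Long.parseLong(lines.get(0).trim());
        boolean done = lines.size() > 1 && "DONE".equals(lines.get(1).trim());

        if (endOffset > lastParsedOffset)
        {
            parseSegment(segment, lastParsedOffset, endOffset); // parse only the delta
            lastParsedOffset = endOffset;
        }
        return done && lastParsedOffset == endOffset;
    }

    private void parseSegment(Path segment, long from, long to)
    {
        // commit log parsing elided; see the CommitLogReader discussion above
    }
}
{code}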

[jira] [Updated] (CASSANDRA-10825) OverloadedException is untested

2016-09-22 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated CASSANDRA-10825:

Attachment: jmx-hint.png

> OverloadedException is untested
> ---
>
> Key: CASSANDRA-10825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10825
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Ariel Weisberg
>Assignee: Edward Capriolo
> Attachments: jmx-hint.png
>
>
> If you grep test/src and cassandra-dtest you will find that the string 
> OverloadedException doesn't appear anywhere.
> In CASSANDRA-10477 it was found that there were cases where Paxos should 
> back-pressure and throw OverloadedException but didn't.
> If OverloadedException is used for functional purposes then we should test 
> that it is thrown under expected conditions. If there are behaviors driven by 
> catching or tracking OverloadedException we should test those as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10825) OverloadedException is untested

2016-09-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513613#comment-15513613
 ] 

Edward Capriolo commented on CASSANDRA-10825:
-

[~tjake] [~iamaleksey] Thanks for looking. Before I rebase for the two branches 
I made some minor changes:

* Switched from AtomicLong to a counter
* Moved the structure into the StorageMetrics class
* During the exception, captured the stats into method variables; the way it was 
written, there were moments of time between data collection and message printing

Let me know if you like these changes.
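
For reference, the first two bullets amount to roughly the following, and the 
third is shown in the helper method; a sketch using plain Dropwizard metrics, 
not the exact Cassandra classes:

{code}
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

// Sketch only: the counter lives with the other storage metrics rather than
// being a bare AtomicLong in the write path.
public final class StorageMetricsSketch
{
    private static final MetricRegistry registry = new MetricRegistry();
    public static final Counter hintsInFlight = registry.counter("HintsInFlight");

    static String overloadMessage()
    {
        long inFlight = hintsInFlight.getCount();       // capture the stat once...
        return "Too many in-flight hints: " + inFlight; // ...then format from the local
    }
}
{code}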



> OverloadedException is untested
> ---
>
> Key: CASSANDRA-10825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10825
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Ariel Weisberg
>Assignee: Edward Capriolo
>
> If you grep test/src and cassandra-dtest you will find that the string 
> OverloadedException doesn't appear anywhere.
> In CASSANDRA-10477 it was found that there were cases where Paxos should 
> back-pressure and throw OverloadedException but didn't.
> If OverloadedException is used for functional purposes then we should test 
> that it is thrown under expected conditions. If there are behaviors driven by 
> catching or tracking OverloadedException we should test those as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12686) Communicate timeouts and other driver relevant options in SUPPORTED response or some other mechanism

2016-09-22 Thread Andy Tolbert (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Tolbert updated CASSANDRA-12686:
-
Labels: client-impacting  (was: )

> Communicate timeouts and other driver relevant options in SUPPORTED response 
> or some other mechanism
> 
>
> Key: CASSANDRA-12686
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12686
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Andy Tolbert
>  Labels: client-impacting
>
> It would be really useful if driver clients had a mechanism to understand 
> what the configured timeouts on the C* side are.
> Ideally a driver should be configured in such a way that its client timeout 
> is greater than the C* timeouts ({{write_request_timeout_in_ms}}, 
> {{read_request_timeout_in_ms}}, etc.) so its retry policy may make the 
> appropriate decision based on the kind of timeout received from Cassandra. 
> This is why most driver clients have a client timeout of 12 seconds. If the 
> client knew the server timeouts, it could adjust its client timeout 
> accordingly.
> At the moment, the only place where I think this could be communicated is 
> through a {{SUPPORTED}} message when the client sends an {{OPTIONS}} message, 
> but that could be viewed as awkward.  Also consider that some clients use the 
> {{OPTIONS}} message as a form of heartbeat, so adding more to a {{SUPPORTED}} 
> message could add some (likely trivial) data on the wire between server and 
> client.
> Alternatively, it could also be interesting if the client could configure the 
> timeout on the server end (with some ceiling set by C*).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12687) Protocol level heartbeat

2016-09-22 Thread Andy Tolbert (JIRA)
Andy Tolbert created CASSANDRA-12687:


 Summary: Protocol level heartbeat
 Key: CASSANDRA-12687
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12687
 Project: Cassandra
  Issue Type: Improvement
Reporter: Andy Tolbert


Most of the DataStax drivers use the {{OPTIONS}} message as a means of doing a 
protocol-level heartbeat, to which the server responds with a {{SUPPORTED}} 
message (see: http://datastax.github.io/java-driver/manual/pooling/#heartbeat). 
 

It would be great if there were a simple {{HEARTBEAT}} message type that could 
be sent in either direction to indicate that the other side of a connection is 
still responding at a protocol level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12686) Communicate timeouts and other driver relevant options in SUPPORTED response or some other mechanism

2016-09-22 Thread Andy Tolbert (JIRA)
Andy Tolbert created CASSANDRA-12686:


 Summary: Communicate timeouts and other driver relevant options in 
SUPPORTED response or some other mechanism
 Key: CASSANDRA-12686
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12686
 Project: Cassandra
  Issue Type: Improvement
Reporter: Andy Tolbert


It would be really useful if driver clients had a mechanism to understand what 
the configured timeouts on the C* side are.

Ideally a driver should be configured in such a way that its client timeout is 
greater than the C* timeouts ({{write_request_timeout_in_ms}}, 
{{read_request_timeout_in_ms}}, etc.) so its retry policy may make the 
appropriate decision based on the kind of timeout received from Cassandra. 
This is why most driver clients have a client timeout of 12 seconds. If the 
client knew the server timeouts, it could adjust its client timeout accordingly.

At the moment, the only place where I think this could be communicated is 
through a {{SUPPORTED}} message when the client sends an {{OPTIONS}} message, 
but that could be viewed as awkward.  Also consider that some clients use the 
{{OPTIONS}} message as a form of heartbeat, so adding more to a {{SUPPORTED}} 
message could add some (likely trivial) data on the wire between server and 
client.

Alternatively, it could also be interesting if the client could configure the 
timeout on the server end (with some ceiling set by C*).
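
If the server did advertise these values (the option names below are entirely 
hypothetical), a driver could derive its client timeout like so:

{code}
import java.util.Map;

// Hypothetical: assumes a SUPPORTED response surfaced as a string map that
// includes the server's request timeouts in milliseconds.
final class ClientTimeoutSketch
{
    static long clientTimeoutMs(Map<String, String> supportedOptions)
    {
        long write = Long.parseLong(supportedOptions.getOrDefault("WRITE_REQUEST_TIMEOUT_IN_MS", "2000"));
        long read  = Long.parseLong(supportedOptions.getOrDefault("READ_REQUEST_TIMEOUT_IN_MS", "5000"));
        // stay comfortably above the server-side timeouts so the retry policy
        // sees the server's timeout error rather than a local one
        return Math.max(write, read) + 2000;
    }
}
{code}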



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12678) dtest failure in pushed_notifications_test.TestPushedNotifications.restart_node_test

2016-09-22 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-12678:

Reviewer: Philip Thompson

> dtest failure in 
> pushed_notifications_test.TestPushedNotifications.restart_node_test
> 
>
> Key: CASSANDRA-12678
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12678
> Project: Cassandra
>  Issue Type: Test
>Reporter: Sean McCarthy
>Assignee: Sam Tunnicliffe
>  Labels: dtest
> Attachments: node1.log, node2.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/275/testReport/pushed_notifications_test/TestPushedNotifications/restart_node_test
> {code}
> Error Message
> 'UP' != u'NEW_NODE'
> {code}
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/pushed_notifications_test.py", line 
> 185, in restart_node_test
> self.assertEquals("UP", notifications[1]["change_type"])
>   File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual
> assertion_func(first, second, msg=msg)
>   File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual
> raise self.failureException(msg)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10825) OverloadedException is untested

2016-09-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513445#comment-15513445
 ] 

Edward Capriolo commented on CASSANDRA-10825:
-

Thanks. I am going to spend a little time here. In particular, I want to be sure 
that inc() and dec() always happen: now that this code path is active, if we 
inc() and don't dec(), the system could keep throwing OverloadedException. Maybe 
the right answer is that the counter should decay over time, or the count should 
be the OneMinuteRate or something along those lines.
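
The usual way to guarantee the pairing is a try/finally around the guarded 
section; a sketch with a hypothetical counter:

{code}
import com.codahale.metrics.Counter;

public final class PairedCountSketch
{
    static final Counter hintsInFlight = new Counter(); // hypothetical counter

    static void guarded(Runnable work)
    {
        hintsInFlight.inc();
        try
        {
            work.run();          // the guarded write path
        }
        finally
        {
            hintsInFlight.dec(); // always runs, even on a throw
        }
    }
}
{code}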

> OverloadedException is untested
> ---
>
> Key: CASSANDRA-10825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10825
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Ariel Weisberg
>Assignee: Edward Capriolo
>
> If you grep test/src and cassandra-dtest you will find that the string 
> OverloadedException doesn't appear anywhere.
> In CASSANDRA-10477 it was found that there were cases where Paxos should 
> back-pressure and throw OverloadedException but didn't.
> If OverloadedException is used for functional purposes then we should test 
> that it is thrown under expected conditions. If there are behaviors driven by 
> catching or tracking OverloadedException we should test those as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12653) In-flight shadow round requests

2016-09-22 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-12653:
---
Attachment: 12653-2.2.patch
12653-3.0.patch
12653-trunk.patch

> In-flight shadow round requests
> ---
>
> Key: CASSANDRA-12653
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12653
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Minor
> Attachments: 12653-2.2.patch, 12653-3.0.patch, 12653-trunk.patch
>
>
> Bootstrapping or replacing a node in the cluster requires gathering and checking 
> some host IDs or tokens by doing a gossip "shadow round" once before joining 
> the cluster. This is done by sending a gossip SYN to all seeds until we 
> receive a response with the cluster state, from where we can move on in the 
> bootstrap process. Receiving a response marks the shadow round as done and 
> calls {{Gossiper.resetEndpointStateMap}} to clean up the received state 
> again.
> The issue here is that at this point there might be other in-flight requests 
> and it's very likely that shadow round responses from other seeds will be 
> received afterwards, while the current state of the bootstrap process doesn't 
> expect this to happen (e.g. gossiper may or may not be enabled). 
> One side effect will be that MigrationTasks are spawned for each shadow round 
> reply except the first. Tasks might or might not execute based on whether 
> {{Gossiper.resetEndpointStateMap}} had been called by execution time, which 
> affects the outcome of {{FailureDetector.instance.isAlive(endpoint)}} at the 
> start of the task. You'll see error log messages such as the following when 
> this happens:
> {noformat}
> INFO  [SharedPool-Worker-1] 2016-09-08 08:36:39,255 Gossiper.java:993 - 
> InetAddress /xx.xx.xx.xx is now UP
> ERROR [MigrationStage:1]2016-09-08 08:36:39,255 FailureDetector.java:223 
> - unknown endpoint /xx.xx.xx.xx
> {noformat}
> Although it isn't pretty, I currently don't see any serious harm from this, 
> but it would be good to get a second opinion (feel free to close as "won't 
> fix").
> /cc [~Stefania] [~thobbs]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12653) In-flight shadow round requests

2016-09-22 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-12653:
---
Status: Patch Available  (was: Open)

> In-flight shadow round requests
> ---
>
> Key: CASSANDRA-12653
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12653
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Minor
>
> Bootstrapping or replacing a node in the cluster requires gathering and checking 
> some host IDs or tokens by doing a gossip "shadow round" once before joining 
> the cluster. This is done by sending a gossip SYN to all seeds until we 
> receive a response with the cluster state, from where we can move on in the 
> bootstrap process. Receiving a response marks the shadow round as done and 
> calls {{Gossiper.resetEndpointStateMap}} to clean up the received state 
> again.
> The issue here is that at this point there might be other in-flight requests 
> and it's very likely that shadow round responses from other seeds will be 
> received afterwards, while the current state of the bootstrap process doesn't 
> expect this to happen (e.g. gossiper may or may not be enabled). 
> One side effect will be that MigrationTasks are spawned for each shadow round 
> reply except the first. Tasks might or might not execute based on whether 
> {{Gossiper.resetEndpointStateMap}} had been called by execution time, which 
> affects the outcome of {{FailureDetector.instance.isAlive(endpoint)}} at the 
> start of the task. You'll see error log messages such as the following when 
> this happens:
> {noformat}
> INFO  [SharedPool-Worker-1] 2016-09-08 08:36:39,255 Gossiper.java:993 - 
> InetAddress /xx.xx.xx.xx is now UP
> ERROR [MigrationStage:1]2016-09-08 08:36:39,255 FailureDetector.java:223 
> - unknown endpoint /xx.xx.xx.xx
> {noformat}
> Although it isn't pretty, I currently don't see any serious harm from this, 
> but it would be good to get a second opinion (feel free to close as "won't 
> fix").
> /cc [~Stefania] [~thobbs]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12653) In-flight shadow round requests

2016-09-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513406#comment-15513406
 ] 

Stefan Podkowinski commented on CASSANDRA-12653:



The attached patch will solve this issue by using a separate data structure for 
the information gathered during shadow rounds and by using a timestamp for the 
first SYN sent. 

||trunk||3.0||2.2||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-trunk]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-2.2]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-trunk-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-2.2-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-trunk-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-2.2-testall/]|


I can also create patches for merge conflicts if needed, once we agree on which 
versions to target.
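
In outline, the separation could look like this; a sketch with hypothetical 
names, not the actual patch:

{code}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: shadow-round gossip state kept out of the live endpoint state map,
// so late replies from other seeds cannot leak into a cleared or live map.
final class ShadowRoundStateSketch
{
    private final Map<InetAddress, Object> endpointShadowStateMap = new ConcurrentHashMap<>();
    private volatile long firstSynSendNanos;
    private volatile boolean inShadowRound;

    void startShadowRound()
    {
        inShadowRound = true;
        firstSynSendNanos = System.nanoTime(); // lets late ACKs be recognized as such
    }

    void onShadowAck(InetAddress seed, Object state)
    {
        if (!inShadowRound)
            return; // a late reply after the round completed is simply ignored
        endpointShadowStateMap.put(seed, state);
    }

    void finishShadowRound()
    {
        inShadowRound = false;
        endpointShadowStateMap.clear(); // the live map was never touched
    }
}
{code}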


> In-flight shadow round requests
> ---
>
> Key: CASSANDRA-12653
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12653
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Minor
>
> Bootstrapping or replacing a node in the cluster requires gathering and checking 
> some host IDs or tokens by doing a gossip "shadow round" once before joining 
> the cluster. This is done by sending a gossip SYN to all seeds until we 
> receive a response with the cluster state, from where we can move on in the 
> bootstrap process. Receiving a response marks the shadow round as done and 
> calls {{Gossiper.resetEndpointStateMap}} to clean up the received state 
> again.
> The issue here is that at this point there might be other in-flight requests 
> and it's very likely that shadow round responses from other seeds will be 
> received afterwards, while the current state of the bootstrap process doesn't 
> expect this to happen (e.g. gossiper may or may not be enabled). 
> One side effect will be that MigrationTasks are spawned for each shadow round 
> reply except the first. Tasks might or might not execute based on whether 
> {{Gossiper.resetEndpointStateMap}} had been called by execution time, which 
> affects the outcome of {{FailureDetector.instance.isAlive(endpoint)}} at the 
> start of the task. You'll see error log messages such as the following when 
> this happens:
> {noformat}
> INFO  [SharedPool-Worker-1] 2016-09-08 08:36:39,255 Gossiper.java:993 - 
> InetAddress /xx.xx.xx.xx is now UP
> ERROR [MigrationStage:1]2016-09-08 08:36:39,255 FailureDetector.java:223 
> - unknown endpoint /xx.xx.xx.xx
> {noformat}
> Although it isn't pretty, I currently don't see any serious harm from this, 
> but it would be good to get a second opinion (feel free to close as "won't 
> fix").
> /cc [~Stefania] [~thobbs]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11218) Prioritize Secondary Index rebuild

2016-09-22 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-11218:

Status: Patch Available  (was: Open)

> Prioritize Secondary Index rebuild
> --
>
> Key: CASSANDRA-11218
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11218
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Jeff Jirsa
>Priority: Minor
>
> We have seen that secondary index rebuilds get stuck behind other compactions 
> during a bootstrap and other operations. This causes things to not finish. We 
> should prioritize index rebuilds via a separate thread pool or a 
> priority queue.
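
A sketch of the priority-queue variant (illustrative only, not a patch):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: an executor fed by a priority queue, so index-rebuild tasks jump
// ahead of queued ordinary compactions instead of waiting behind them.
public final class PriorityCompactionSketch
{
    static final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask>
    {
        final int priority;
        final Runnable work;

        PrioritizedTask(int priority, Runnable work)
        {
            this.priority = priority;
            this.work = work;
        }

        public void run() { work.run(); }

        public int compareTo(PrioritizedTask o)
        {
            return Integer.compare(o.priority, priority); // higher priority first
        }
    }

    static final ExecutorService executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());

    // use execute(), not submit(): submit() would wrap tasks in
    // non-comparable FutureTasks and break the priority ordering
    static void compact(Runnable task)      { executor.execute(new PrioritizedTask(0, task)); }
    static void rebuildIndex(Runnable task) { executor.execute(new PrioritizedTask(1, task)); }
}
{code}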



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11218) Prioritize Secondary Index rebuild

2016-09-22 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-11218:

Reviewer: Marcus Eriksson

> Prioritize Secondary Index rebuild
> --
>
> Key: CASSANDRA-11218
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11218
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Jeff Jirsa
>Priority: Minor
>
> We have seen that secondary index rebuilds get stuck behind other compactions 
> during a bootstrap and other operations. This causes things to not finish. We 
> should prioritize index rebuilds via a separate thread pool or a 
> priority queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12461) Add hooks to StorageService shutdown

2016-09-22 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513021#comment-15513021
 ] 

Alex Petrov edited comment on CASSANDRA-12461 at 9/22/16 12:37 PM:
---

I've discovered several more problems while working on this patch, in the last 
version (from [here|https://github.com/acoz/cassandra/commits/12461]):

  * node drain code was duplicated (with minor differences, which I indicate 
below), as I mentioned
  * it is possible to re-start services after drain, which means the regular 
shutdown path won't run on JVM exit
  * if the node was drained, under Windows the timer resolution (added in 
[CASSANDRA-9634]) was not reset, since the node was considered "already 
drained" (although this already existed before)
  * the same was happening with the post-shutdown hooks in the patch, since 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L575-L576]
 we return from the runnable, because those services were shut down during 
{{drain}} 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L586-L589].
 So they wouldn't run at all if {{nodetool drain}} was called.
  * because the logging system is shut down in the post-shutdown hook, we depend 
on ordering, although we have to guarantee that logging is available for all 
hooks and avoid any races or having to register hooks at a particular stage.

This is one of the reasons I was suggesting a single drain process. 

I also suggest disallowing re-enabling auto-compaction, binary, gossip, handoff 
and thrift to ensure that we do not need to re-stop them in the final shutdown 
hook. An operator cannot bring the node back into a "working" state after drain 
without a restart anyway (one reason being that the commit log is shut down by 
that time), and it was most likely never intended to be possible.

I've made a comparison table to make it easier to see what the {{drain()}} 
method was doing compared to the {{drainOnShutdown}} runnable:

|| nodetool drain || shutdown drain hook ||
| disables autocompaction | |
| shuts down compaction manager | |
| recycles commitlog segments | |
| shuts down batchlog and hints earlier | |
| | flushes only tables with durable_writes |
| | clears set timer resolution for windows |

I've combined the two processes and made clearer distinctions to allow running 
things in {{drainOnShutdown}}. Since we can run all the items from the 
{{nodetool drain}} part of the list during the normal node shutdown, the code 
got a bit simpler, too (the only difference is now logging). If this 
granularity is not enough, we have two more options:
  * run post-shutdown hooks directly before the JVM shutdown
  * have 3 stages: pre-drain, post-drain and pre-JVM shutdown instead

Although I prefer the current way.

Preliminary version of the update (also, CI pending): 
|[12461-trunk-v2|https://github.com/ifesdjeen/cassandra/tree/12461-trunk-v2]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-testall/]|

(I've discussed the change "in theory" with [~slebresne], although it's still 
worth someone taking a deeper look at it; I'll ask around)


was (Author: ifesdjeen):
I've discovered several more problems while working on this patch:

  * node drain code was duplicated (with minor differences, which I indicate 
below)
  * if the node was drained, under Windows the timer resolution (added in 
[CASSANDRA-9634]) was not reset, since the node was considered "already 
drained".
  * the same was happening with the post-shutdown hooks, since 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L575-L576]
 we return from the runnable, because those services were shut down during 
{{drain}} 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L586-L589].
 

This is one of the reasons I was advocating for a single consistent drain 
process. 
I also suggest disallowing re-enabling auto-compaction, binary, gossip, handoff 
and thrift to ensure that we do not need to re-stop them in the final shutdown 
hook. An operator cannot bring the node back into a "working" state after drain 
without a restart anyway (one reason being that the commit log is shut down by 
that time), and it was most likely never intended to be possible.

I've made a comparison table to make it easier to see what the {{drain()}} 
method was doing compared to 

[jira] [Comment Edited] (CASSANDRA-12461) Add hooks to StorageService shutdown

2016-09-22 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513021#comment-15513021
 ] 

Alex Petrov edited comment on CASSANDRA-12461 at 9/22/16 11:28 AM:
---

I've discovered several more problems while working on this patch:

  * node drain code was duplicated (with minor differences, which I indicate 
below)
  * if the node was drained, under Windows the timer resolution (added in 
[CASSANDRA-9634]) was not reset, since the node was considered "already 
drained".
  * the same was happening with the post-shutdown hooks, since 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L575-L576]
 we return from the runnable, because those services were shut down during 
{{drain}} 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L586-L589].
 

This is one of the reasons I was advocating for a single consistent drain 
process. 
I also suggest disallowing re-enabling auto-compaction, binary, gossip, handoff 
and thrift to ensure that we do not need to re-stop them in the final shutdown 
hook. An operator cannot bring the node back into a "working" state after drain 
without a restart anyway (one reason being that the commit log is shut down by 
that time), and it was most likely never intended to be possible.

I've made a comparison table to make it easier to see what the {{drain()}} 
method was doing compared to the {{drainOnShutdown}} runnable:

|| nodetool drain || shutdown drain hook ||
| disables autocompaction | |
| shuts down compaction manager | |
| recycles commitlog segments | |
| shuts down batchlog and hints earlier | |
| | flushes only tables with durable_writes |
| | clears set timer resolution for windows |

I've combined the two processes and made clearer distinctions to allow running 
things in {{drainOnShutdown}}. Since we can run all the items from the 
{{nodetool drain}} part of the list during the normal node shutdown, the code 
got a bit simpler, too (the only difference is now logging). 

Preliminary version of the update (also, CI pending): 
|[12461-trunk-v2|https://github.com/ifesdjeen/cassandra/tree/12461-trunk-v2]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-testall/]|

(I've discussed the change "in theory" with [~slebresne], although it's still 
worth someone taking a deeper look at it; I'll ask around)


was (Author: ifesdjeen):
I've discovered several more problems while working on this patch:

  * node drain code was duplicated (with minor differences, which I indicate 
below)
  * if the node was drained, under Windows the timer resolution (added in 
[CASSANDRA-9634]) was not reset, since the node was considered "already 
drained".
  * the same was happening with the post-shutdown hooks, since 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L575-L576]
 we return from the runnable, because those services were shut down during 
{{drain}} 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L586-L589].
 

This is one of the reasons I was advocating for a single consistent drain 
process. 
I also suggest disallowing re-enabling auto-compaction, binary, gossip, handoff 
and thrift to ensure that we do not need to re-stop them in the final shutdown 
hook. An operator cannot bring the node back into a "working" state after drain 
without a restart anyway (one reason being that the commit log is shut down by 
that time), and it was most likely never intended to be possible. However, it 
might be useful for an operator to run compactions on the drained node, so we 
only shut down the compaction manager later. 

I've made a comparison table to make it easier to see what the {{drain()}} 
method was doing compared to the {{drainOnShutdown}} runnable:

|| nodetool drain || shutdown drain hook ||
| disables autocompaction | |
| shuts down compaction manager | |
| recycles commitlog segments | |
| shuts down batchlog and hints earlier | |
| | flushes only tables with durable_writes |
| | clears set timer resolution for windows |

I've combined the two processes, made clearer distinctions to allow running 
things in 

[jira] [Commented] (CASSANDRA-12461) Add hooks to StorageService shutdown

2016-09-22 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513021#comment-15513021
 ] 

Alex Petrov commented on CASSANDRA-12461:
-

I've discovered several more problems while working on this patch:

  * node drain code was duplicated (with minor differences, which I indicate 
below)
  * if the node was drained, under Windows the timer resolution (added in 
[CASSANDRA-9634]) was not reset, since the node was considered "already 
drained".
  * the same was happening with the post-shutdown hooks, since 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L575-L576]
 we return from the runnable, because those services were shut down during 
{{drain}} 
[here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L586-L589].
 

This is one of the reasons I was advocating for a single consistent drain 
process. 
I also suggest disallowing re-enabling auto-compaction, binary, gossip, handoff 
and thrift to ensure that we do not need to re-stop them in the final shutdown 
hook. An operator cannot bring the node back into a "working" state after drain 
without a restart anyway (one reason being that the commit log is shut down by 
that time), and it was most likely never intended to be possible. However, it 
might be useful for an operator to run compactions on the drained node, so we 
only shut down the compaction manager later. 

I've made a comparison table to make it easier to see what the {{drain()}} 
method was doing compared to the {{drainOnShutdown}} runnable:

|| nodetool drain || shutdown drain hook ||
| disables autocompaction | |
| shuts down compaction manager | |
| recycles commitlog segments | |
| shuts down batchlog and hints earlier | |
| | flushes only tables with durable_writes |
| | clears set timer resolution for windows |

I've combined the two processes and made clearer distinctions to allow running 
things in {{drainOnShutdown}}. Since we can run all the items from the 
{{nodetool drain}} part of the list during the normal node shutdown, the code 
got a bit simpler, too (the only difference is now logging). 

Preliminary version of the update (also, CI pending): 
|[12461-trunk-v2|https://github.com/ifesdjeen/cassandra/tree/12461-trunk-v2]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-testall/]|

(I've discussed the change "in theory" with [~slebresne], although it's still 
worth someone taking a deeper look at it; I'll ask around)

> Add hooks to StorageService shutdown
> 
>
> Key: CASSANDRA-12461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12461
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anthony Cozzie
>Assignee: Anthony Cozzie
> Fix For: 3.x
>
> Attachments: 
> 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch
>
>
> The JVM will usually run shutdown hooks in parallel.  This can lead to 
> synchronization problems between Cassandra, services that depend on it, and 
> services it depends on.  This patch adds some simple support for shutdown 
> hooks to StorageService.
> This should nearly solve CASSANDRA-12011
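
In outline, such support might look like this; a sketch with hypothetical names, 
with the attached patch being authoritative:

{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch: instead of many independent JVM shutdown hooks racing in parallel,
// services register runnables that one drain-time hook executes in order.
final class ShutdownHooksSketch
{
    private final List<Runnable> postShutdownHooks = new CopyOnWriteArrayList<>();

    void registerPostShutdownHook(Runnable hook)
    {
        postShutdownHooks.add(hook);
    }

    void drain()
    {
        // ...flush tables, stop services, shut down the commit log...
        for (Runnable hook : postShutdownHooks)
            hook.run(); // sequential, so ordering (e.g. logging last) is guaranteed
    }
}
{code}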



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12582) Removing static column results in ReadFailure due to CorruptSSTableException

2016-09-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512745#comment-15512745
 ] 

Stefania commented on CASSANDRA-12582:
--

The reason for the corruption is that the sstable iterators try to read a static 
column as a regular column. This is due to the fact that the serialization 
header is missing the dropped static column.

When the header is deserialized, it relies on {{CFMetaData}} to provide a fake 
dropped column. The problem is that {{CFMetaData}} always assumes that dropped 
columns are regular, see 
[here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/config/CFMetaData.java#L682].
 Then when the iterators read the static column, they rely on the header to 
decide whether a static column is present or not, 
[here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java#L169].
 As a consequence, they don't attempt to skip the static column and they try to 
read it as a regular column later on.
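
A self-contained toy model of that interaction (hypothetical names; this is not 
Cassandra source):

{code}
import java.util.List;

// Toy model only: the metadata reconstructs a dropped column as REGULAR, so a
// header built from it makes the reader misjudge whether a static row exists.
public final class DroppedStaticToyModel
{
    enum Kind { REGULAR, STATIC }

    static final class ColumnDef
    {
        final String name;
        final Kind kind;
        ColumnDef(String name, Kind kind) { this.name = name; this.kind = kind; }
    }

    // what the metadata effectively does for a dropped column today:
    static ColumnDef reconstructDropped(String name)
    {
        return new ColumnDef(name, Kind.REGULAR); // even if it was dropped as STATIC
    }

    // the iterator trusts the header; with the wrong kind it never expects a
    // static row, never skips it, and later misreads its bytes as a regular cell
    static boolean expectStaticRow(List<ColumnDef> headerColumns)
    {
        for (ColumnDef c : headerColumns)
            if (c.kind == Kind.STATIC)
                return true;
        return false;
    }
}
{code}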

> Removing static column results in ReadFailure due to CorruptSSTableException
> 
>
> Key: CASSANDRA-12582
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12582
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
> Environment: Cassandra 3.0.8
>Reporter: Evan Prothro
>Assignee: Stefania
>Priority: Critical
>  Labels: compaction, corruption, drop, read, static
> Fix For: 3.0.x, 3.x
>
> Attachments: 12582.cdl, 12582_reproduce.sh
>
>
> We ran into an issue in production where reads began to fail for certain 
> queries, depending on the range within the relation for those queries. The 
> Cassandra system log showed an unhandled {{CorruptSSTableException}}.
> CQL read failure:
> {code}
> ReadFailure: code=1300 [Replica(s) failed to execute read] message="Operation 
> failed - received 0 responses and 1 failures" info={'failures': 1, 
> 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
> {code}
> Cassandra exception:
> {code}
> WARN  [SharedPool-Worker-2] 2016-08-31 12:49:27,979 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2453)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_72]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>  [apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.8.jar:3.0.8]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: 
> Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:343)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:66)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:62)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:24)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295)
>  

[jira] [Updated] (CASSANDRA-12186) anticompaction log message doesn't include the parent repair session id

2016-09-22 Thread Tommy Stendahl (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommy Stendahl updated CASSANDRA-12186:
---
Fix Version/s: 3.x
   Status: Patch Available  (was: Open)

> anticompaction log message doesn't include the parent repair session id
> ---
>
> Key: CASSANDRA-12186
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12186
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Wei Deng
>Assignee: Tommy Stendahl
>Priority: Minor
>  Labels: lhf
> Fix For: 3.x
>
> Attachments: 12186.txt
>
>
> It appears that even though incremental repair is now enabled by default post 
> C*-3.0 (which means at the end of each repair session, there is an 
> anti-compaction step that needs to be executed), we don't include the parent 
> repair session UUID in the anti-compaction log entries. 
> This makes observing all activities related to an incremental repair session 
> more difficult. See the following:
> {noformat}
> DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956  
> RepairMessageVerbHandler.java:149 - Got anticompaction request 
> AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2}
>  org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
> <...>
> 
> <...>
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,512  
> CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest 
> on 
> 1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')]
>  sstables
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,513  
> CompactionManager.java:540 - SSTable 
> BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')
>  fully contained in range (-9223372036854775808,-9223372036854775808], 
> mutating repairedAt instead of anticompacting
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,570  
> CompactionManager.java:578 - Completed anticompaction successfully
> {noformat}
> The initial submission of the anti-compaction task to the CompactionManager 
> still has a reference to the parent repair session UUID, but subsequent 
> anti-compaction log entries are missing this parent repair session UUID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12186) anticompaction log message doesn't include the parent repair session id

2016-09-22 Thread Tommy Stendahl (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommy Stendahl updated CASSANDRA-12186:
---
Attachment: 12186.txt

> anticompaction log message doesn't include the parent repair session id
> ---
>
> Key: CASSANDRA-12186
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12186
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Wei Deng
>Assignee: Tommy Stendahl
>Priority: Minor
>  Labels: lhf
> Attachments: 12186.txt
>
>
> It appears that even though incremental repair is now enabled by default post 
> C*-3.0 (which means at the end of each repair session, there is an 
> anti-compaction step that needs to be executed), we don't include the parent 
> repair session UUID in the anti-compaction log entries. 
> This makes observing all activities related to an incremental repair session 
> more difficult. See the following:
> {noformat}
> DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956  
> RepairMessageVerbHandler.java:149 - Got anticompaction request 
> AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2}
>  org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
> <...>
> 
> <...>
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,512  
> CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest 
> on 
> 1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')]
>  sstables
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,513  
> CompactionManager.java:540 - SSTable 
> BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')
>  fully contained in range (-9223372036854775808,-9223372036854775808], 
> mutating repairedAt instead of anticompacting
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,570  
> CompactionManager.java:578 - Completed anticompaction successfully
> {noformat}
> The initial submission of the anti-compaction task to the CompactionManager 
> still has a reference to the parent repair session UUID, but subsequent 
> anti-compaction log entries are missing this parent repair session UUID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12186) anticompaction log message doesn't include the parent repair session id

2016-09-22 Thread Tommy Stendahl (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512667#comment-15512667
 ] 

Tommy Stendahl commented on CASSANDRA-12186:


I created a small patch for this issue; it just adds the parent repair session 
UUID to the "Started" and "Completed" log entries. The example above with my 
patch would be:

{noformat}
DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956  
RepairMessageVerbHandler.java:149 - Got anticompaction request 
AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2} 
org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
<...>

<...>
INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,512  
CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest on 
1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')]
 sstables, parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2
INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,513  
CompactionManager.java:540 - SSTable 
BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')
 fully contained in range (-9223372036854775808,-9223372036854775808], mutating 
repairedAt instead of anticompacting
INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,570  
CompactionManager.java:578 - Completed anticompaction successfully, 
parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2
{noformat}
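
The change itself is presumably little more than threading the session id 
through the existing log calls; an illustrative sketch, not the attached patch:

{code}
import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only; see the attached 12186.txt for the actual change.
final class AnticompactionLogSketch
{
    private static final Logger logger = LoggerFactory.getLogger(AnticompactionLogSketch.class);

    static void logCompleted(UUID parentRepairSession)
    {
        // the only change: append the parent repair session id to the message
        logger.info("Completed anticompaction successfully, parentRepairSession={}", parentRepairSession);
    }
}
{code}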



> anticompaction log message doesn't include the parent repair session id
> ---
>
> Key: CASSANDRA-12186
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12186
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Wei Deng
>Priority: Minor
>  Labels: lhf
>
> It appears that even though incremental repair is now enabled by default post 
> C*-3.0 (which means at the end of each repair session, there is an 
> anti-compaction step that needs to be executed), we don't include the parent 
> repair session UUID in the anti-compaction log entries. 
> This makes observing all activities related to an incremental repair session 
> more difficult. See the following:
> {noformat}
> DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956  
> RepairMessageVerbHandler.java:149 - Got anticompaction request 
> AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2}
>  org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
> <...>
> 
> <...>
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,512  
> CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest 
> on 
> 1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')]
>  sstables
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,513  
> CompactionManager.java:540 - SSTable 
> BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')
>  fully contained in range (-9223372036854775808,-9223372036854775808], 
> mutating repairedAt instead of anticompacting
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,570  
> CompactionManager.java:578 - Completed anticompaction successfully
> {noformat}
> The initial submission of the anti-compaction task to the CompactionManager 
> still has a reference to the parent repair session UUID, but subsequent 
> anti-compaction log entries are missing this parent repair session UUID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-12186) anticompaction log message doesn't include the parent repair session id

2016-09-22 Thread Tommy Stendahl (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommy Stendahl reassigned CASSANDRA-12186:
--

Assignee: Tommy Stendahl

> anticompaction log message doesn't include the parent repair session id
> ---
>
> Key: CASSANDRA-12186
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12186
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Wei Deng
>Assignee: Tommy Stendahl
>Priority: Minor
>  Labels: lhf
>
> It appears that even though incremental repair is now enabled by default post 
> C*-3.0 (which means at the end of each repair session, there is an 
> anti-compaction step that needs to be executed), we don't include the parent 
> repair session UUID in the anti-compaction log entries. This makes observing 
> all activities related to an incremental repair session more difficult. See 
> the following:
> {noformat}
> DEBUG [AntiEntropyStage:1] 2016-07-13 01:57:30,956  
> RepairMessageVerbHandler.java:149 - Got anticompaction request 
> AnticompactionRequest{parentRepairSession=27103de0-489d-11e6-a6d6-cd06faa0aaa2}
>  org.apache.cassandra.repair.messages.AnticompactionRequest@34449ff4
> <...>
> 
> <...>
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,512  
> CompactionManager.java:511 - Starting anticompaction for trivial_ks.weitest 
> on 
> 1/[BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')]
>  sstables
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,513  
> CompactionManager.java:540 - SSTable 
> BigTableReader(path='/var/lib/cassandra/data/trivial_ks/weitest-538b07d1489b11e6a9ef61c6ff848952/mb-1-big-Data.db')
>  fully contained in range (-9223372036854775808,-9223372036854775808], 
> mutating repairedAt instead of anticompacting
> INFO  [CompactionExecutor:5] 2016-07-13 02:07:47,570  
> CompactionManager.java:578 - Completed anticompaction successfully
> {noformat}
> The initial submission of the anti-compaction task to the CompactionManager 
> still has a reference to the parent repair session UUID, but subsequent 
> anti-compaction log entries omit it.





[jira] [Commented] (CASSANDRA-12582) Removing static column results in ReadFailure due to CorruptSSTableException

2016-09-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512573#comment-15512573
 ] 

Stefania commented on CASSANDRA-12582:
--

Reproduced without problems.
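For context, a hypothetical sketch of the kind of sequence that could trigger 
this. The authoritative steps are in the attached 12582_reproduce.sh, whose 
contents are not shown here; this sketch assumes the DataStax Java driver 3.x 
and a single local node, with keyspace/table names taken from the sstable path 
in the stack trace below:

{code}
// Hypothetical reproduction sketch only -- see the attached
// 12582_reproduce.sh for the real steps. Assumes the DataStax Java
// driver 3.x running against a single local node.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Repro12582
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            session.execute("CREATE KEYSPACE IF NOT EXISTS issue309 WITH replication = "
                          + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS issue309.apples_by_tree ("
                          + "tree int, apple int, colour text static, size int, "
                          + "PRIMARY KEY (tree, apple))");
            session.execute("INSERT INTO issue309.apples_by_tree (tree, apple, colour, size) "
                          + "VALUES (1, 1, 'red', 3)");

            // flush so the static column reaches an sstable before the drop,
            // e.g. run `nodetool flush issue309` at this point

            session.execute("ALTER TABLE issue309.apples_by_tree DROP colour");

            // on affected versions this read may fail with the
            // CorruptSSTableException quoted in the description
            session.execute("SELECT * FROM issue309.apples_by_tree WHERE tree = 1");
        }
    }
}
{code}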

> Removing static column results in ReadFailure due to CorruptSSTableException
> 
>
> Key: CASSANDRA-12582
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12582
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
> Environment: Cassandra 3.0.8
>Reporter: Evan Prothro
>Assignee: Stefania
>Priority: Critical
>  Labels: compaction, corruption, drop, read, static
> Fix For: 3.0.x, 3.x
>
> Attachments: 12582.cdl, 12582_reproduce.sh
>
>
> We ran into an issue in production where reads began to fail for certain 
> queries, depending on the range within the relation for those queries. The 
> Cassandra system log showed an unhandled {{CorruptSSTableException}}.
> CQL read failure:
> {code}
> ReadFailure: code=1300 [Replica(s) failed to execute read] message="Operation 
> failed - received 0 responses and 1 failures" info={'failures': 1, 
> 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
> {code}
> Cassandra exception:
> {code}
> WARN  [SharedPool-Worker-2] 2016-08-31 12:49:27,979 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2453)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_72]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>  [apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.8.jar:3.0.8]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: 
> Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:343)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:66)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:62)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:24)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:127)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:123)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) 
> ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) 
> ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1796)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> 

[jira] [Commented] (CASSANDRA-12582) Removing static column results in ReadFailure due to CorruptSSTableException

2016-09-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512556#comment-15512556
 ] 

Stefania commented on CASSANDRA-12582:
--

I'll have a go at reproducing this and see if I can understand what is going 
on. Thanks for providing such detailed information.

> Removing static column results in ReadFailure due to CorruptSSTableException
> 
>
> Key: CASSANDRA-12582
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12582
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
> Environment: Cassandra 3.0.8
>Reporter: Evan Prothro
>Assignee: Stefania
>Priority: Critical
>  Labels: compaction, corruption, drop, read, static
> Fix For: 3.0.x, 3.x
>
> Attachments: 12582.cdl, 12582_reproduce.sh
>
>
> We ran into an issue in production where reads began to fail for certain 
> queries, depending on the range within the relation for those queries. The 
> Cassandra system log showed an unhandled {{CorruptSSTableException}}.
> CQL read failure:
> {code}
> ReadFailure: code=1300 [Replica(s) failed to execute read] message="Operation 
> failed - received 0 responses and 1 failures" info={'failures': 1, 
> 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
> {code}
> Cassandra exception:
> {code}
> WARN  [SharedPool-Worker-2] 2016-08-31 12:49:27,979 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2453)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_72]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>  [apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.8.jar:3.0.8]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: 
> Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:343)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:66)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:62)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:24)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:127)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:123)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) 
> ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) 
> ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1796)
>  

[jira] [Assigned] (CASSANDRA-12582) Removing static column results in ReadFailure due to CorruptSSTableException

2016-09-22 Thread Stefania (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefania reassigned CASSANDRA-12582:


Assignee: Stefania

> Removing static column results in ReadFailure due to CorruptSSTableException
> 
>
> Key: CASSANDRA-12582
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12582
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
> Environment: Cassandra 3.0.8
>Reporter: Evan Prothro
>Assignee: Stefania
>Priority: Critical
>  Labels: compaction, corruption, drop, read, static
> Fix For: 3.0.x, 3.x
>
> Attachments: 12582.cdl, 12582_reproduce.sh
>
>
> We ran into an issue in production where reads began to fail for certain 
> queries, depending on the range within the relation for those queries. The 
> Cassandra system log showed an unhandled {{CorruptSSTableException}}.
> CQL read failure:
> {code}
> ReadFailure: code=1300 [Replica(s) failed to execute read] message="Operation 
> failed - received 0 responses and 1 failures" info={'failures': 1, 
> 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
> {code}
> Cassandra exception:
> {code}
> WARN  [SharedPool-Worker-2] 2016-08-31 12:49:27,979 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2453)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_72]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>  [apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.8.jar:3.0.8]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: 
> Corrupted: 
> /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:343)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:66)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:62)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:24)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:127)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.(ReadResponse.java:123)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) 
> ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) 
> ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1796)
>  ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2449)
>  

[jira] [Commented] (CASSANDRA-12685) Add retry to hints dispatcher

2016-09-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512544#comment-15512544
 ] 

Stefan Podkowinski commented on CASSANDRA-12685:


The provided logs seem to be based on 2.x, as hint dispatching was changed in 
3.0 as part of CASSANDRA-6230. Retry semantics were implemented as well and 
should have handled the situation you described.
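
For illustration, the kind of bounded retry-on-timeout loop the request amounts 
to; this is a sketch only, not the actual 3.0 dispatcher code:

{code}
// Illustrative sketch of bounded retry-on-timeout for hint delivery; not
// the actual HintsDispatcher implementation, just the shape of the logic
// the ticket asks for.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HintRetrySketch
{
    // stand-in for one page/batch of hints destined for a single endpoint
    interface HintPage
    {
        void deliver() throws TimeoutException;
    }

    // retry a timed-out delivery a bounded number of times with linear
    // backoff, instead of aborting the whole handoff on the first timeout
    static boolean deliverWithRetry(HintPage page, int maxAttempts) throws InterruptedException
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                page.deliver();
                return true;
            }
            catch (TimeoutException e)
            {
                if (attempt == maxAttempts)
                    return false; // give up; hints stay on disk for a later run
                TimeUnit.SECONDS.sleep(attempt); // back off before retrying
            }
        }
        return false; // reached only if maxAttempts <= 0
    }
}
{code}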

> Add retry to hints dispatcher
> -
>
> Key: CASSANDRA-12685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12685
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Minor
> Fix For: 3.x
>
>
> Problem: I often see timeouts in hints replay. There is no retry for hints 
> replay, so I think it would be great to add some retry logic for timeout 
> exceptions.
> {code}
> 2016-09-20_07:32:01.16610 INFO  07:32:01 [HintedHandoff:3]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_07:58:49.29983 INFO  07:58:49 [HintedHandoff:3]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (55040 delivered)
> 2016-09-20_07:58:49.29984 INFO  07:58:49 [HintedHandoff:3]: Enqueuing flush 
> of hints: 15962349 (0%) on-heap, 2049808 (0%) off-heap
> 2016-09-20_08:02:17.55072 INFO  08:02:17 [HintedHandoff:1]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_08:05:45.25723 INFO  08:05:45 [HintedHandoff:1]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (7936 delivered)
> 2016-09-20_08:05:45.25725 INFO  08:05:45 [HintedHandoff:1]: Enqueuing flush 
> of hints: 2301605 (0%) on-heap, 259744 (0%) off-heap
> 2016-09-20_08:12:19.92910 INFO  08:12:19 [HintedHandoff:2]: Started hinted 
> handoff for host: 859af100-5d45-42bd-92f5-2bc78822158b with IP: 
> /2401:db00:12:30d7:face:0:39:0
> 2016-09-20_08:51:44.72191 INFO  08:51:44 [HintedHandoff:2]: Timed out 
> replaying hints to /2401:db00:12:30d7:face:0:39:0; aborting (83456 delivered)
> {code}





[jira] [Commented] (CASSANDRA-12605) Timestamp-order searching of sstables does not handle non-frozen UDTs, frozen collections correctly

2016-09-22 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512472#comment-15512472
 ] 

Benjamin Lerer commented on CASSANDRA-12605:


The patch looks good to me. Thanks.


> Timestamp-order searching of sstables does not handle non-frozen UDTs, frozen 
> collections correctly
> ---
>
> Key: CASSANDRA-12605
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12605
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tyler Hobbs
>Assignee: Tyler Hobbs
>
> {{SinglePartitionReadCommand.queryNeitherCountersNorCollections()}} is used 
> to determine whether we can search sstables in timestamp order.  We cannot 
> use this optimization when there are multicell values (such as unfrozen 
> collections or UDTs).  However, this method only checks 
> {{column.type.isCollection() || column.type.isCounter()}}.  Instead, it 
> should check {{column.type.isMulticell() || column.type.isCounter()}}.
> This has two implications:
> * We are using timestamp-order searching when querying non-frozen UDTs, which 
> can lead to incorrect/stale results being returned.
> * We are not taking advantage of this optimization when querying frozen 
> collections.
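
For illustration, a minimal sketch of the proposed check, using a stub in place 
of Cassandra's column type class; only the boolean conditions mirror the ticket:

{code}
// Minimal sketch of the proposed change, with a stub standing in for
// Cassandra's column type class; only the boolean conditions mirror the
// ticket, the rest is scaffolding.
public class TimestampOrderCheckSketch
{
    // stand-in for the relevant predicates on a column's type
    interface ColumnType
    {
        boolean isCollection(); // true for frozen and non-frozen collections
        boolean isMulticell();  // true for non-frozen collections and non-frozen UDTs
        boolean isCounter();
    }

    // current check: keyed on isCollection(), so frozen collections wrongly
    // disable the timestamp-order optimization and non-frozen UDTs wrongly
    // allow it
    static boolean disqualifiesOld(ColumnType type)
    {
        return type.isCollection() || type.isCounter();
    }

    // proposed check: any multicell value (non-frozen collection or UDT)
    // or counter disqualifies timestamp-order searching
    static boolean disqualifiesNew(ColumnType type)
    {
        return type.isMulticell() || type.isCounter();
    }
}
{code}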





[jira] [Updated] (CASSANDRA-12605) Timestamp-order searching of sstables does not handle non-frozen UDTs, frozen collections correctly

2016-09-22 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-12605:
---
Status: Ready to Commit  (was: Patch Available)

> Timestamp-order searching of sstables does not handle non-frozen UDTs, frozen 
> collections correctly
> ---
>
> Key: CASSANDRA-12605
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12605
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tyler Hobbs
>Assignee: Tyler Hobbs
>
> {{SinglePartitionReadCommand.queryNeitherCountersNorCollections()}} is used 
> to determine whether we can search sstables in timestamp order.  We cannot 
> use this optimization when there are multicell values (such as unfrozen 
> collections or UDTs).  However, this method only checks 
> {{column.type.isCollection() || column.type.isCounter()}}.  Instead, it 
> should check {{column.type.isMulticell() || column.type.isCounter()}}.
> This has two implications:
> * We are using timestamp-order searching when querying non-frozen UDTs, which 
> can lead to incorrect/stale results being returned.
> * We are not taking advantage of this optimization when querying frozen 
> collections.





[jira] [Comment Edited] (CASSANDRA-12571) cqlsh lost the ability to have a request wait indefinitely

2016-09-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512357#comment-15512357
 ] 

Stefania edited comment on CASSANDRA-12571 at 9/22/16 6:46 AM:
---

This behavior was changed by CASSANDRA-10686 in all versions since 2.1.13. 
[~pauloricardomg] was there a reason to remove this functionality and should 
the doc be updated or should we re-instate this behavior? 

IMO it's better to use a large timeout rather than None, and I would be 
inclined to update the documentation.


was (Author: stefania):
This behavior was changed by CASSANDRA-10686 in all versions since 2.1.13. 
[~pauloricardomg] was there a reason to remove this functionality and should 
the doc be updated or or can we re-instate this behavior? 

IMO it's better to use a large timeout rather than None and I would be inclined 
to update the documentation.

> cqlsh lost the ability to have a request wait indefinitely
> --
>
> Key: CASSANDRA-12571
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12571
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 3.7
>Reporter: Nate Sanders
>Assignee: Stefania
>Priority: Minor
>
> In commit c7f0032912798b5e53b64d8391e3e3d7e4121165, when client_timeout 
> became request_timeout, the logic was changed so that you can no longer use a 
> timeout of None, despite the docs saying that you can:
> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshUsingCqlshrc.html#cqlshUsingCqlshrc__request-timeout
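
For reference, the kind of cqlshrc setting under discussion -- a sketch 
assuming the {{[connection]}} section syntax from the linked docs, substituting 
an arbitrarily large finite timeout for the no-longer-supported None:

{code}
; hypothetical ~/.cassandra/cqlshrc excerpt: a large finite request timeout
; (in seconds) instead of None
[connection]
request_timeout = 3600
{code}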




