[jira] [Commented] (CASSANDRA-14328) Invalid metadata has been detected for role

2019-09-26 Thread Tania S Engel (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938924#comment-16938924
 ] 

Tania S Engel commented on CASSANDRA-14328:
---

I also had this issue happen on one of our nodes after it joined the Cassandra 
cluster. I confirmed it by connecting with cqlsh using another role that still 
worked and running >use system_auth; select * from roles; 

I could see that in the roles table the can_login column for the role held null 
(rather than true or false). That null is the root cause of the reported exception.
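The check and a possible manual fix can be sketched in cqlsh. This assumes you can log in with a still-working superuser role (or temporarily disable authentication, as the server warning suggests); the role name below is a placeholder:

```cql
-- Inspect the role metadata from a working session.
USE system_auth;
SELECT role, can_login, is_superuser FROM roles;

-- If can_login is null for the broken role, setting it explicitly
-- is one way to repair the row ('myrole' is a hypothetical name).
UPDATE roles SET can_login = true WHERE role = 'myrole';
```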

In my case, joining looked as if it worked from our monitoring. nodetool 
netstats did show Mode: Joining but also showed "not sending any streams", and 
later showed Mode: Normal. nodetool status showed it as an up node. On closer 
inspection, the join had not streamed. I did have debug.log, and it showed the 
node bootstrapped, but it never logged "Creating new streaming plan for 
bootstrap..", so there was no streaming. Shortly after the bad node's bootstrap 
started, the debug.log on the good existing seed node shows a socket closed and 
a failure to connect. It did reconnect, but perhaps that is why the streaming 
plan never commenced. It is misleading that the bad node then continued to run 
and appeared as if it were bootstrapped. 

I tried to fix the bad node by running >nodetool repair system_auth but that 
did not work. I was able to fix the roles with >nodetool repair --full 
system_auth, and the remaining data by running a full repair on all tables.

> Invalid metadata has been detected for role
> ---
>
> Key: CASSANDRA-14328
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14328
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Pranav Jindal
>Priority: Normal
>
> Cassandra Version : 3.10
> One node was replaced and came up successfully, but cqlsh fails with an 
> error.
>  
> CQL-SH error:
>  
> {code:java}
> Connection error: ('Unable to connect to any servers', {'10.180.0.150': 
> AuthenticationFailed('Failed to authenticate to 10.180.0.150: Error from 
> server: code= [Server error] message="java.lang.RuntimeException: Invalid 
> metadata has been detected for role utorjwcnruzzlzafxffgyqmlvkxiqcgb"',)})
> {code}
>  
> Cassandra server ERROR:
> {code:java}
> WARN [Native-Transport-Requests-1] 2018-03-20 13:37:17,894 
> CassandraRoleManager.java:96 - An invalid value has been detected in the 
> roles table for role utorjwcnruzzlzafxffgyqmlvkxiqcgb. If you are unable to 
> login, you may need to disable authentication and confirm that values in that 
> table are accurate
> ERROR [Native-Transport-Requests-1] 2018-03-20 13:37:17,895 Message.java:623 
> - Unexpected exception during request; channel = [id: 0xdfc3604f, 
> L:/10.180.0.150:9042 - R:/10.180.0.150:51668]
> java.lang.RuntimeException: Invalid metadata has been detected for role 
> utorjwcnruzzlzafxffgyqmlvkxiqcgb
> at 
> org.apache.cassandra.auth.CassandraRoleManager$1.apply(CassandraRoleManager.java:99)
>  ~[apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.auth.CassandraRoleManager$1.apply(CassandraRoleManager.java:82)
>  ~[apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:528)
>  ~[apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:503)
>  ~[apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.auth.CassandraRoleManager.canLogin(CassandraRoleManager.java:310)
>  ~[apache-cassandra-3.10.jar:3.10]
> at org.apache.cassandra.service.ClientState.login(ClientState.java:271) 
> ~[apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:80)
>  ~[apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517)
>  [apache-cassandra-3.10.jar:3.10]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410)
>  [apache-cassandra-3.10.jar:3.10]
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:

[jira] [Updated] (CASSANDRA-14831) Nodetool repair hangs with java.net.SocketException: End-of-stream reached

2018-10-17 Thread Tania S Engel (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-14831:
--
Attachment: (was: image-2018-10-17-13-30-42-590.png)

> Nodetool repair hangs with java.net.SocketException: End-of-stream reached
> --
>
> Key: CASSANDRA-14831
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14831
> Project: Cassandra
>  Issue Type: Bug
>  Components: Repair
>Reporter: Tania S Engel
>Priority: Major
> Fix For: 3.11.1
>
> Attachments: Cassandra - 14831 Logs.mht
>
>
> Using Cassandra 3.11.1.
> Ran >nodetool repair  on a small 3 node cluster  from node 
> 3eef. Nodes 9160 and 3f5e experienced a stream failure. 
> *NODE 9160:* 
> ERROR [STREAM-IN-/fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e:7000] 2018-10-16 
> 01:45:00,400 StreamSession.java:593 - [Stream 
> #103fe070-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session 
> with peer fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e
> *java.net.SocketException: End-of-stream reached*
> at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:71)
>  ~[apache-cassandra-3.11.1.jar:3.11.1]
> at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311)
>  ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]
>  
> *NODE 3f5e:*
> ERROR [STREAM-IN-/fd70:616e:6761:6561:ec4:7aff:fece:9160:59676] 2018-10-16 
> 01:45:09,474 StreamSession.java:593 - [Stream 
> #103ef610-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session 
> with peer fd70:616e:6761:6561:ec4:7aff:fece:9160
> java.io.IOException: An existing connection was forcibly closed by the remote 
> host
> at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_152]
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[na:1.8.0_152]
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_152]
> at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_152]
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) 
> ~[na:1.8.0_152]
> at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206) 
> ~[na:1.8.0_152]
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) 
> ~[na:1.8.0_152]
> at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) 
> ~[na:1.8.0_152]
> at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56)
>  ~[apache-cassandra-3.11.1.jar:3.11.1]
> at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311)
>  ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]
>  
> *NODE 3EEF:*
> ERROR [RepairJobTask:14] 2018-10-16 01:45:00,457 RepairSession.java:281 - 
> [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1] Session completed with the 
> following error
> org.apache.cassandra.exceptions.RepairException: [repair 
> #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/XX, 
> [(-8271925838625565988,-8266397600493941101], 
> (2290821710735817606,2299380749828706426] 
> …(-8701313305140908434,-8686533141993948378]]] Sync failed between 
> /fd70:616e:6761:6561:ec4:7aff:fece:9160 and 
> /fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e
> at 
> org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67)
>  ~[apache-cassandra-3.11.1.jar:3.11.1]
> at 
> org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:202)
>  ~[apache-cassandra-3.11.1.jar:3.11.1]
> at 
> org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:495)
>  ~[apache-cassandra-3.11.1.jar:3.11.1]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:162)
>  ~[apache-cassandra-3.11.1.jar:3.11.1]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_152]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_152]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_152]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_152]
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>  [apache-cassandra-3.11.1.jar:3.11.1]
> at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152]
>  
> ERROR [RepairJobTask:14] 2018-10-16 01:45:00,459 RepairRunnable.java:276 - 
> Repair session f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 for range 
> [(-827192583862

[jira] [Updated] (CASSANDRA-14831) Nodetool repair hangs with java.net.SocketException: End-of-stream reached

2018-10-17 Thread Tania S Engel (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-14831:
--
Attachment: Cassandra - 14831 Logs.mht


[jira] [Created] (CASSANDRA-14831) Nodetool repair hangs with java.net.SocketException: End-of-stream reached

2018-10-17 Thread Tania S Engel (JIRA)
Tania S Engel created CASSANDRA-14831:
-

 Summary: Nodetool repair hangs with java.net.SocketException: 
End-of-stream reached
 Key: CASSANDRA-14831
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14831
 Project: Cassandra
  Issue Type: Bug
  Components: Repair
Reporter: Tania S Engel
 Fix For: 3.11.1
 Attachments: image-2018-10-17-13-30-42-590.png

Using Cassandra 3.11.1.

Ran >nodetool repair  on a small 3 node cluster  from node 3eef. 
Nodes 9160 and 3f5e experienced a stream failure. 

*NODE 9160:* 

ERROR [STREAM-IN-/fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e:7000] 2018-10-16 
01:45:00,400 StreamSession.java:593 - [Stream 
#103fe070-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session with 
peer fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e

*java.net.SocketException: End-of-stream reached*

at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:71)
 ~[apache-cassandra-3.11.1.jar:3.11.1]

at 
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311)
 ~[apache-cassandra-3.11.1.jar:3.11.1]

at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]

 

*NODE 3f5e:*

ERROR [STREAM-IN-/fd70:616e:6761:6561:ec4:7aff:fece:9160:59676] 2018-10-16 
01:45:09,474 StreamSession.java:593 - [Stream 
#103ef610-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session with 
peer fd70:616e:6761:6561:ec4:7aff:fece:9160

java.io.IOException: An existing connection was forcibly closed by the remote 
host

at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_152]

at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[na:1.8.0_152]

at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_152]

at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_152]

at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_152]

at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206) 
~[na:1.8.0_152]

at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) 
~[na:1.8.0_152]

at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) 
~[na:1.8.0_152]

at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56)
 ~[apache-cassandra-3.11.1.jar:3.11.1]

at 
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311)
 ~[apache-cassandra-3.11.1.jar:3.11.1]

at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]

 

*NODE 3EEF:*

ERROR [RepairJobTask:14] 2018-10-16 01:45:00,457 RepairSession.java:281 - 
[repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1] Session completed with the 
following error

org.apache.cassandra.exceptions.RepairException: [repair 
#f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/XX, 
[(-8271925838625565988,-8266397600493941101], 
(2290821710735817606,2299380749828706426] 
…(-8701313305140908434,-8686533141993948378]]] Sync failed between 
/fd70:616e:6761:6561:ec4:7aff:fece:9160 and 
/fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e

at 
org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) 
~[apache-cassandra-3.11.1.jar:3.11.1]

at 
org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:202) 
~[apache-cassandra-3.11.1.jar:3.11.1]

at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:495)
 ~[apache-cassandra-3.11.1.jar:3.11.1]

at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:162)
 ~[apache-cassandra-3.11.1.jar:3.11.1]

at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
~[apache-cassandra-3.11.1.jar:3.11.1]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_152]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_152]

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_152]

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_152]

at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
 [apache-cassandra-3.11.1.jar:3.11.1]

at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152]

 

ERROR [RepairJobTask:14] 2018-10-16 01:45:00,459 RepairRunnable.java:276 - 
Repair session f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 for range 
[(-8271925838625565988,-8266397600493941101],…(-6146831664074703724,-6117107236121156255],
 (4842256698807887573,4848113042863615717], 
(-8701313305140908434,-8686533141993948378]] failed with error [repair 
#f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on 
logs/auditsearchlog,…(-8701313305140908434,-8686533141993948378]]] Sync failed 
between /fd70:616e:6761:6561:ec4:7aff:fece:9160 and 
/fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e

org.apache.

[jira] [Commented] (CASSANDRA-10302) Track repair state for more reliable repair

2018-10-17 Thread Tania S Engel (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654152#comment-16654152
 ] 

Tania S Engel commented on CASSANDRA-10302:
---

Is there any hope of better repair tracking in 4.0? It really would be 
wonderful to have a nodetool command to see any actively running repairs (via 
the ActiveRepairService?). In our small 3-node test cluster, nodetool repair 
typically takes 2 minutes, but with a stream failure it can hang, and 13 hours 
later you are left wondering.
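Until such tracking exists, one client-side workaround is to treat a repair as hung after a deadline instead of waiting indefinitely. A minimal sketch of that polling pattern, with hypothetical status values and callable (this is not a real Cassandra or JMX API):

```python
import time


def wait_for_repair(poll_status, timeout_s=2 * 3600, interval_s=30):
    """Poll a status callable until it reports completion or a deadline passes.

    poll_status() returns "running", "complete", or "failed"; the names are
    illustrative stand-ins for whatever your JMX/driver wrapper reports.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = poll_status()
        if status in ("complete", "failed"):
            return status
        time.sleep(interval_s)  # back off between polls
    return "timed-out"  # surface a hung repair instead of waiting forever
```

The point is only the shape: bound the wait, poll periodically, and report a distinct "timed-out" state so operators can react.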

> Track repair state for more reliable repair
> ---
>
> Key: CASSANDRA-10302
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10302
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Major
>
> During repair, the coordinator and replicas exchange various messages. I've 
> seen cases where those messages get lost.
> We've made repair messages more durable (CASSANDRA-5393, etc.) but messages 
> still seem to be lost, hanging repair until the messaging timeout is reached.
> We can prevent this by tracking repair status on the repair participants and 
> periodically checking state after a certain period of time to make sure 
> everything is working fine.
> We can also add a command / JMX API to query repair state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers

2018-08-07 Thread Tania S Engel (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-7066:
-
Attachment: Hide.url

> Simplify (and unify) cleanup of compaction leftovers
> 
>
> Key: CASSANDRA-7066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Benedict
>Assignee: Stefania
>Priority: Minor
>  Labels: benedict-to-commit, compaction
> Fix For: 3.0 alpha 1
>
> Attachments: 7066.txt, Hide.url
>
>
> Currently we manage a list of in-progress compactions in a system table, 
> which we use to clean up incomplete compactions when we're done. The problem 
> with this is that 1) it's a bit clunky (and leaves us in positions where we 
> can unnecessarily clean up completed files, or conversely not clean up files 
> that have been superseded); and 2) it's only used for regular compaction - 
> no other compaction types are guarded in the same way, which can result in 
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store its direct ancestors in its metadata, and 
> on startup we simply delete any sstables that occur in the union of all 
> ancestor sets. This way, as soon as we finish writing, we're capable of 
> cleaning up any leftovers, so we never get duplication. It's also much easier 
> to reason about.






[jira] [Commented] (CASSANDRA-10876) Alter behavior of batch WARN and fail on single partition batches

2018-06-12 Thread Tania S Engel (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509831#comment-16509831
 ] 

Tania S Engel commented on CASSANDRA-10876:
---

Given we use Murmur3, I have learned that the token hash will be the same for 
the example above, so the coordinator will send the inserts to the same 
replicas and not be overloaded. Therefore the warning seems too broad and, in 
our case, can be ignored. 
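The point can be illustrated with a stand-in hash. MD5 below is illustrative only; Cassandra's default Murmur3Partitioner uses Murmur3, but either way the token is computed from the partition-key bytes alone, so the table name never enters the hash:

```python
import hashlib


def token(partition_key: str) -> int:
    """Stand-in partitioner: maps partition-key bytes to a signed 64-bit token.

    Illustrative only -- Cassandra uses Murmur3, not MD5, but the principle
    shown here holds: the token depends only on the key bytes, never on the
    table the row is written to.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


# The same LogDay key inserted into Log_User and Log_Event yields one
# token, so both writes land on the same replica set.
assert token("2018-03-21 00:00Z") == token("2018-03-21 00:00Z")
```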

> Alter behavior of batch WARN and fail on single partition batches
> -
>
> Key: CASSANDRA-10876
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10876
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Patrick McFadin
>Assignee: Sylvain Lebresne
>Priority: Minor
>  Labels: lhf
> Fix For: 3.6
>
> Attachments: 10876.txt
>
>
> In an attempt to give operator insight into potentially harmful batch usage, 
> Jiras were created to log WARN or fail on certain batch sizes. This ignores 
> the single partition batch, which doesn't create the same issues as a 
> multi-partition batch. 
> The proposal is to ignore size on single partition batch statements. 
> Reference:
> [CASSANDRA-6487|https://issues.apache.org/jira/browse/CASSANDRA-6487]
> [CASSANDRA-8011|https://issues.apache.org/jira/browse/CASSANDRA-8011]






[jira] [Commented] (CASSANDRA-10876) Alter behavior of batch WARN and fail on single partition batches

2018-06-05 Thread Tania S Engel (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501951#comment-16501951
 ] 

Tania S Engel commented on CASSANDRA-10876:
---

Cassandra data models are query-driven, so tables often share the same 
partition key, with different frequently-queried data points making up the 
clustering keys. In that case, the data being the same, it's also quite common 
to want to insert the data atomically in a batch. In this example, which I also 
posted on Stack Overflow,

[https://stackoverflow.com/questions/50652243/can-a-cassandra-partition-key-span-multiple-tables-in-one-keyspace]

would the coordinator farm these inserts out to different nodes given RF < 
number of nodes? Or would the partition key, albeit in different tables, hash 
to the same value? I ask because of all the recommendations not to use 
multi-partition batches. And, in our design, we are still seeing these 
batch_size_warn_threshold warnings in 3.11.1. 

 

USE logskeyspace;

CREATE TABLE Log_User (LogDay timestamp, UserId int, EventId int, 
PRIMARY KEY (LogDay, UserId));

CREATE TABLE Log_Event (LogDay timestamp, EventId int, UserId int, 
PRIMARY KEY (LogDay, EventId));

BEGIN BATCH

INSERT INTO Log_User (LogDay, UserId, EventId) 
VALUES ('2018-03-21 00:00Z', 10, 23);

INSERT INTO Log_Event (LogDay, EventId, UserId) 
VALUES ('2018-03-21 00:00Z', 23, 10);

APPLY BATCH;







[jira] [Issue Comment Deleted] (CASSANDRA-10876) Alter behavior of batch WARN and fail on single partition batches

2018-06-01 Thread Tania S Engel (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-10876:
--
Comment: was deleted

(was: In 3.11.1 we still see these warnings even when we are inserting into 
multiple tables with the same partition key. The comment above from Patrick 
seems to indicate a partition key is scoped to one keyspace.table, but I 
thought a partition key was a value that hashes to the same token. Is it still 
a burden on the coordinator, worthy of a warning, if we do a batch insert to 4 
tables with the same partition key (Day)? For example, if we do a batch insert 
into these 2 tables, is that considered a single-partition insert, and if so, 
why is there a warning?

CREATE TABLE Log_User (Day timestamp, LogTime timeuuid, UserID int, 
PRIMARY KEY (Day, UserID, LogTime))

CREATE TABLE Log_Event (Day timestamp, LogTime timeuuid, EventID int, 
PRIMARY KEY (Day, EventID, LogTime)))







[jira] [Commented] (CASSANDRA-10876) Alter behavior of batch WARN and fail on single partition batches

2018-06-01 Thread Tania S Engel (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498508#comment-16498508
 ] 

Tania S Engel commented on CASSANDRA-10876:
---

In 3.11.1 we still see these warnings even when we are inserting into multiple 
tables with the same partition key. The comment above from Patrick seems to 
indicate a partition key is scoped to one keyspace.table, but I thought a 
partition key was a value that hashes to the same token. Is it still a burden 
on the coordinator, worthy of a warning, if we do a batch insert to 4 tables 
with the same partition key (Day)? For example, if we do a batch insert into 
these 2 tables, is that considered a single-partition insert, and if so, why is 
there a warning?

CREATE TABLE Log_User (Day timestamp, LogTime timeuuid, UserID int, 
PRIMARY KEY (Day, UserID, LogTime))

CREATE TABLE Log_Event (Day timestamp, LogTime timeuuid, EventID int, 
PRIMARY KEY (Day, EventID, LogTime))







[jira] [Commented] (CASSANDRA-13480) nodetool repair can hang forever if we lose the notification for the repair completing/failing

2018-02-27 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378919#comment-16378919
 ] 

Tania S Engel commented on CASSANDRA-13480:
---

[~mbyrd]: I have reason to believe I just hit this in 3.11.1; at the very 
least, I ran into a repair that has never completed on an 11-node cluster. Is 
there a way to get this fix into 3.11?

> nodetool repair can hang forever if we lose the notification for the repair 
> completing/failing
> --
>
> Key: CASSANDRA-13480
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13480
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Matt Byrd
>Assignee: Matt Byrd
>Priority: Minor
>  Labels: repair
> Fix For: 4.0
>
>
> When a JMX lost notification occurs, sometimes the lost notification in 
> question is the one that lets RepairRunner know that the repair is 
> finished (ProgressEventType.COMPLETE, or even ERROR for that matter).
> This results in the nodetool process running the repair hanging forever. 
> I have a test which reproduces the issue here:
> https://github.com/Jollyplum/cassandra-dtest/tree/repair_hang_test
> To fix this: on receiving a notification that notifications have been lost 
> (JMXConnectionNotification.NOTIFS_LOST), we query a new endpoint via 
> JMX for all the relevant notifications we're interested in, so we can 
> replay those we missed and avoid this scenario.
> It's possible also that the JMXConnectionNotification.NOTIFS_LOST itself 
> might be lost and so for good measure I have made RepairRunner poll 
> periodically to see if there were any notifications that had been sent but we 
> didn't receive (scoped just to the particular tag for the given repair).
> Users who don't use nodetool but go via jmx directly, can still use this new 
> endpoint and implement similar behaviour in their clients as desired.
> I'm also expiring the notifications which have been kept on the server side.
> Please let me know if you've any questions or can think of a different 
> approach, I also tried setting:
>  JVM_OPTS="$JVM_OPTS -Djmx.remote.x.notification.buffer.size=5000"
> but this didn't fix the test. I suppose it might help under certain scenarios 
> but in this test we don't even send that many notifications so I'm not 
> surprised it doesn't fix it.
> It seems like getting lost notifications is always a potential problem with 
> jmx as far as I can tell.
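The replay-on-loss idea described above can be sketched generically. This is an 
illustrative Python sketch, not Cassandra's actual Java implementation; 
`replay_missed` and `fetch_all` are made-up names, with `fetch_all` standing in 
for the proposed JMX endpoint that retains sent notifications for a repair's tag:

```python
def replay_missed(seen_seqs, fetch_all):
    """Return events we never received, in server order.

    seen_seqs: set of sequence numbers already handled by the client
    fetch_all: callable returning the server-side list of (seq, event)
               pairs retained for this repair's notification tag
    """
    missed = []
    for seq, event in fetch_all():
        if seq not in seen_seqs:      # lost in transit -> replay it
            seen_seqs.add(seq)
            missed.append(event)
    return missed

# Usage: the client saw notifications 1 and 3; the server kept 1..4,
# so polling recovers event 2 and the final COMPLETE event (4).
seen = {1, 3}
server = lambda: [(1, "START"), (2, "PROGRESS"), (3, "PROGRESS"), (4, "COMPLETE")]
print(replay_missed(seen, server))   # -> ['PROGRESS', 'COMPLETE']
```

The same poll, run periodically, also covers the case where the NOTIFS_LOST 
notification itself is lost.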






[jira] [Created] (CASSANDRA-14006) Migration task failed completes bootstrap but it didn't stream any data.

2017-11-09 Thread Tania S Engel (JIRA)
Tania S Engel created CASSANDRA-14006:
-

 Summary: Migration task failed completes bootstrap but it didn't 
stream any data.
 Key: CASSANDRA-14006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14006
 Project: Cassandra
  Issue Type: Bug
  Components: Streaming and Messaging
Reporter: Tania S Engel


When joining just one node with very little data we often get "Migration task 
failed to complete" per 
https://github.com/apache/cassandra/commit/ae315b5ec944571342146867c51b2ceb50f3845e
 
We increased the timeout *MIGRATION_TASK_WAIT_IN_SECONDS* to 15 minutes, 
thinking there was some sort of auto-retry mechanism in the underlying 
messaging. However, all that does is increase the time to failure. When these 
migration tasks fail, the bootstrap is marked complete, but it clearly wasn't: 
using the data results in a cassandra.db.*UnknownColumnFamilyException*, and it 
is evident in the logs that no data was streamed from the seed node to the 
newly bootstrapping node. 
We have had numerous tests showing that if a migration task times out, the node 
exits joining mode, the bootstrap logs complete, but it hasn't streamed any 
data and the only course of action seems to be a Cassandra restart. Our 
replication factor is set such that the bootstrapping node needs to get all the 
data. If we were to leave the Cassandra node running, would it eventually send 
another migration task and stream the necessary data?
 
On closer inspection of the code, it seems that *MigrationTask.java*'s 
runMayThrow sends the migration request message using *sendRR*, which is 
fire-and-forget. So if the callback is never hit, you can be left in a state 
where CountDownLatch.countDown() is never invoked; I suppose that is the point 
of the timeout when waiting for the latch. But wouldn't it be better to resend 
the migration task? I certainly haven't learned the whole messaging service, 
but it seems that dropping a packet here and there could cause bootstrap to 
succeed in this misleading way. Would it make sense for runMayThrow to create 
an IAsyncCallbackWithFailure for the callback and implement onFailure to also 
call CountDownLatch.countDown() and generate another migration task? Or perhaps 
allow users of Cassandra to configure something like 
MIGRATION_TASK_RETRY_ATTEMPTS?
 
When the MigrationTask does fail to complete, we see the log message three 
times. Is this a resend of the same migration task, which is just a schema 
version exchange? If so, all three attempts failing means either the requests 
never reached the seed endpoint or the responses never reached the 
bootstrapping endpoint. Are we correct in assuming this is a network error, and 
that there are no scenarios where the seed node would ignore the migration task 
from the bootstrapping node? 
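The resend-on-timeout pattern suggested above can be sketched generically. This 
is an illustrative Python sketch, not Cassandra's code: `send_with_retry` is a 
made-up name, `send` stands in for the fire-and-forget sendRR, and a 
threading.Event plays the role of Java's CountDownLatch:

```python
import threading

def send_with_retry(send, timeout_s=1.0, attempts=3):
    """Fire-and-forget send with an ack latch, resending on timeout.

    send:      callable given a threading.Event that the reply callback
               sets (the analogue of CountDownLatch.countDown())
    timeout_s: how long to await each attempt's ack
    attempts:  number of sends before giving up
    Returns True if an ack arrived, False if every attempt timed out.
    """
    for _ in range(attempts):
        ack = threading.Event()          # one-shot latch per attempt
        send(ack)                        # fire and forget, like sendRR
        if ack.wait(timeout_s):          # reply arrived in time
            return True
    return False                         # every attempt timed out

# Usage: a send whose first attempt is "dropped" and second succeeds.
calls = []
def flaky_send(ack):
    calls.append(1)
    if len(calls) >= 2:
        ack.set()

print(send_with_retry(flaky_send, timeout_s=0.05))   # -> True
```

A retry count like the proposed MIGRATION_TASK_RETRY_ATTEMPTS would simply 
parameterize `attempts`.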







[jira] [Commented] (CASSANDRA-11218) Prioritize Secondary Index rebuild

2017-08-02 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111658#comment-16111658
 ] 

Tania S Engel commented on CASSANDRA-11218:
---

Could the index summary redistribution be the cause of the 29-minute gap in 
stream time for this appliance? Typically in this test, the "Prepare completed" 
message appears within a minute.

INFO  [STREAM-INIT-/fd70:616e:6761:6561:ae1f:6bff:fe12:3de8:24642] 2017-07-28 
*20:05*:28,929 StreamResultFuture.java:123 - [Stream 
#1845aa20-73d0-11e7-8027-4139c6f86357, ID#0] Received streaming plan for 
Bootstrap
INFO  [IndexSummaryManager:1] 2017-07-28 20:13:06,822 
IndexSummaryRedistribution.java:75 - Redistributing index summaries
INFO  [STREAM-IN-/fd70:616e:6761:6561:ae1f:6bff:fe12:3de8:24642] 2017-07-28 
*20:34*:10,712 StreamResultFuture.java:173 - [Stream 
#1845aa20-73d0-11e7-8027-4139c6f86357 ID#0] Prepare completed. Receiving 0 
files(0.000KiB), sending 57 files(369.250KiB)


> Prioritize Secondary Index rebuild
> --
>
> Key: CASSANDRA-11218
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11218
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Secondary Indexes
>Reporter: sankalp kohli
>Assignee: Jeff Jirsa
>Priority: Minor
>
> We have seen that secondary index rebuild get stuck behind other compaction 
> during a bootstrap and other operations. This causes things to not finish. We 
> should prioritize index rebuild via a separate thread pool or using a 
> priority queue.






[jira] [Issue Comment Deleted] (CASSANDRA-13441) Schema version changes for each upgraded node in a rolling upgrade, causing migration storms

2017-07-31 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13441:
--
Comment: was deleted

(was: We also see this running Cassandra 3.10, not during an upgrade but when a 
new node is joining. The join seems to complete, but there are no stream 
session logs and we end up failing with UnknownColumnFamily. 

!screenshot-1.png!
)

> Schema version changes for each upgraded node in a rolling upgrade, causing 
> migration storms
> 
>
> Key: CASSANDRA-13441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13441
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 3.0.14, 3.11.0, 4.0
>
>
> In versions < 3.0, during a rolling upgrade (say 2.0 -> 2.1), the first node 
> to upgrade to 2.1 would add the new tables, setting the new 2.1 version ID, 
> and subsequently upgraded hosts would settle on that version.
> When a 3.0 node upgrades and writes its own new-in-3.0 system tables, it'll 
> write the same tables that exist in the schema with brand new timestamps. As 
> written, this will cause all nodes in the cluster to change schema (to the 
> version with the newest timestamp). On a sufficiently large cluster with a 
> non-trivial schema, this could cause (literally) millions of migration tasks 
> to needlessly bounce across the cluster.






[jira] [Updated] (CASSANDRA-13441) Schema version changes for each upgraded node in a rolling upgrade, causing migration storms

2017-07-31 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13441:
--
Attachment: (was: screenshot-1.png)

> Schema version changes for each upgraded node in a rolling upgrade, causing 
> migration storms
> 
>
> Key: CASSANDRA-13441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13441
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 3.0.14, 3.11.0, 4.0
>
>
> In versions < 3.0, during a rolling upgrade (say 2.0 -> 2.1), the first node 
> to upgrade to 2.1 would add the new tables, setting the new 2.1 version ID, 
> and subsequently upgraded hosts would settle on that version.
> When a 3.0 node upgrades and writes its own new-in-3.0 system tables, it'll 
> write the same tables that exist in the schema with brand new timestamps. As 
> written, this will cause all nodes in the cluster to change schema (to the 
> version with the newest timestamp). On a sufficiently large cluster with a 
> non-trivial schema, this could cause (literally) millions of migration tasks 
> to needlessly bounce across the cluster.






[jira] [Issue Comment Deleted] (CASSANDRA-13441) Schema version changes for each upgraded node in a rolling upgrade, causing migration storms

2017-07-31 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13441:
--
Comment: was deleted

(was: It is unclear to me if the migration task timing out caused none of our 
tables to stream. I am 100% certain we have keyspaces on the seed node but none 
of it was streamed. From reading the various related/duplicate cases, it is 
unclear if the suggestion is to increase 
-Dcassandra.migration_task_wait_in_seconds or upgrade to 3.11 or both. This 
problem is intermittent and it happens even when we are bootstrapping the first 
node to form a cluster of 2. Is the idea that if we "wait for schema 
agreement...", we will eventually get the tables? It seems that is not the case 
for this scenario. Right now our code waits for nodetool netstats to report 
Mode: NORMAL, but in this case the CQL port is already open and we have already 
seen "bootstrap complete", so I am 99% sure the mode is NORMAL (no longer 
JOINING).

Also, I have seen a bootstrap scenario where just one "ERROR Migration task 
failed to complete" logged and then the bootstrap succeeded and streamed all 
our tables. )

> Schema version changes for each upgraded node in a rolling upgrade, causing 
> migration storms
> 
>
> Key: CASSANDRA-13441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13441
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 3.0.14, 3.11.0, 4.0
>
>
> In versions < 3.0, during a rolling upgrade (say 2.0 -> 2.1), the first node 
> to upgrade to 2.1 would add the new tables, setting the new 2.1 version ID, 
> and subsequently upgraded hosts would settle on that version.
> When a 3.0 node upgrades and writes its own new-in-3.0 system tables, it'll 
> write the same tables that exist in the schema with brand new timestamps. As 
> written, this will cause all nodes in the cluster to change schema (to the 
> version with the newest timestamp). On a sufficiently large cluster with a 
> non-trivial schema, this could cause (literally) millions of migration tasks 
> to needlessly bounce across the cluster.






[jira] [Commented] (CASSANDRA-13608) Connection closed/reopened during join with MVs causes Cassandra stream to close

2017-07-25 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100268#comment-16100268
 ] 

Tania S Engel commented on CASSANDRA-13608:
---

Thanks for looking into this. I have updated the description and title to 
clarify that we do use MVs.

> Connection closed/reopened during join with MVs causes Cassandra stream to 
> close
> 
>
> Key: CASSANDRA-13608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 3.10. Windows Server 2016, 32GB ram, 2TB hard 
> disk, RAID10 with 4 spindles, 8 Cores
>Reporter: Tania S Engel
>Assignee: Kurt Greaves
> Attachments: Cassandra 3.10 Join with lots GC collection leads to 
> socket closure and join hang.mht, Cassandra 3.10 Join with lots GC collection 
> leads to socket closure and join hang.pdf, Cassandra 3.10 Join with lots GC 
> collection leads to socket closure and join hang.txt
>
>
> We use MVs. We start a JOIN bootstrap. Primary seed node streams to the 
> replica. The replica requires some GC cleanup and experiences frequent pauses 
> including a 12 second old gen cleanup following a memTable flush. Both 
> replica and primary show _MessagingService IOException: An existing 
> connection was forcibly closed by the remote host_. The replica 
> MessagingService-Outgoing reestablishes the connection immediately but the 
> primary StreamKeepAliveExecutor throws a _java.RuntimeException: Outgoing 
> stream handler has been closed_. From that point forward, the replica stays 
> in JOIN mode, sending keep-alives to the primary. The primary receives the 
> keep-alives but does not send its own, and it repeatedly fails to send a hints 
> file to the replica. It seems this limping condition would continue 
> indefinitely; it stops only when we stop the replica Cassandra. If we restart the 
> replica Cassandra the JOIN picks up again but fails with 
> _java.io.IOException: Corrupt value length 355151036 encountered, as it 
> exceeds the maximum of 268435456, which is set via max_value_size_in_mb in 
> cassandra.yaml_. We have not increased this value as we do not have values 
> that large in our data so we presume it is indeed corrupt and moving past it 
> would not be a good idea. Please see the attachment for details.
> {code}
> ERROR [BatchlogTasks:1] 2017-06-13 20:24:13,953 CassandraDaemon.java:229 - 
> Exception in thread Thread[BatchlogTasks:1,5,main]
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> C:\Cassandra\data\data\system\batches-919a4bc57a333573b03e13fc3f68b465\mc-2-big-Data.db
>   at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:395)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:257)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133) 
> ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:69)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:67)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:26)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:755)
>  ~[apache-cassandra-3.10.jar

[jira] [Updated] (CASSANDRA-13608) Connection closed/reopened during join with MVs causes Cassandra stream to close

2017-07-25 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13608:
--
Summary: Connection closed/reopened during join with MVs causes Cassandra 
stream to close  (was: Connection closed/reopened during join causes Cassandra 
stream to close)

> Connection closed/reopened during join with MVs causes Cassandra stream to 
> close
> 
>
> Key: CASSANDRA-13608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 3.10. Windows Server 2016, 32GB ram, 2TB hard 
> disk, RAID10 with 4 spindles, 8 Cores
>Reporter: Tania S Engel
>Assignee: Kurt Greaves
> Attachments: Cassandra 3.10 Join with lots GC collection leads to 
> socket closure and join hang.mht, Cassandra 3.10 Join with lots GC collection 
> leads to socket closure and join hang.pdf, Cassandra 3.10 Join with lots GC 
> collection leads to socket closure and join hang.txt
>
>
> We use MVs. We start a JOIN bootstrap. Primary seed node streams to the 
> replica. The replica requires some GC cleanup and experiences frequent pauses 
> including a 12 second old gen cleanup following a memTable flush. Both 
> replica and primary show _MessagingService IOException: An existing 
> connection was forcibly closed by the remote host_. The replica 
> MessagingService-Outgoing reestablishes the connection immediately but the 
> primary StreamKeepAliveExecutor throws a _java.RuntimeException: Outgoing 
> stream handler has been closed_. From that point forward, the replica stays 
> in JOIN mode, sending keep-alives to the primary. The primary receives the 
> keep-alives but does not send its own, and it repeatedly fails to send a hints 
> file to the replica. It seems this limping condition would continue 
> indefinitely; it stops only when we stop the replica Cassandra. If we restart the 
> replica Cassandra the JOIN picks up again but fails with 
> _java.io.IOException: Corrupt value length 355151036 encountered, as it 
> exceeds the maximum of 268435456, which is set via max_value_size_in_mb in 
> cassandra.yaml_. We have not increased this value as we do not have values 
> that large in our data so we presume it is indeed corrupt and moving past it 
> would not be a good idea. Please see the attachment for details.
> {code}
> ERROR [BatchlogTasks:1] 2017-06-13 20:24:13,953 CassandraDaemon.java:229 - 
> Exception in thread Thread[BatchlogTasks:1,5,main]
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> C:\Cassandra\data\data\system\batches-919a4bc57a333573b03e13fc3f68b465\mc-2-big-Data.db
>   at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:395)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:257)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133) 
> ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:69)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:67)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:26)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
>  ~[apache-cassandra-3.10.jar:3.10]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:755)
>  ~[apache-cas

[jira] [Updated] (CASSANDRA-13608) Connection closed/reopened during join causes Cassandra stream to close

2017-07-25 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13608:
--
Description: 
We use MVs. We start a JOIN bootstrap. Primary seed node streams to the 
replica. The replica requires some GC cleanup and experiences frequent pauses 
including a 12 second old gen cleanup following a memTable flush. Both replica 
and primary show _MessagingService IOException: An existing connection was 
forcibly closed by the remote host_. The replica MessagingService-Outgoing 
reestablishes the connection immediately but the primary 
StreamKeepAliveExecutor throws a _java.RuntimeException: Outgoing stream 
handler has been closed_. From that point forward, the replica stays in JOIN 
mode, sending keep-alives to the primary. The primary receives the keep-alives 
but does not send its own, and it repeatedly fails to send a hints file 
to the replica. It seems this limping condition would continue indefinitely; 
it stops only when we stop the replica Cassandra. If we restart the replica Cassandra 
the JOIN picks up again but fails with _java.io.IOException: Corrupt value 
length 355151036 encountered, as it exceeds the maximum of 268435456, which is 
set via max_value_size_in_mb in cassandra.yaml_. We have not increased this 
value as we do not have values that large in our data so we presume it is 
indeed corrupt and moving past it would not be a good idea. Please see the 
attachment for details.

{code}
ERROR [BatchlogTasks:1] 2017-06-13 20:24:13,953 CassandraDaemon.java:229 - 
Exception in thread Thread[BatchlogTasks:1,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
C:\Cassandra\data\data\system\batches-919a4bc57a333573b03e13fc3f68b465\mc-2-big-Data.db
at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:395)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:257)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133) 
~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:69)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:67)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:26)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:755)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:446)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.cql3.UntypedResultSet$FromPager$1.computeNext(UntypedResultSet.java:193)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.cql3.UntypedResultSet$FromPager$1.computeNext(UntypedResultSet.java:179)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.batchlog.BatchlogManager.processBatchlogEntries(BatchlogManager.java:233)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.batchlog.BatchlogManager.replayFailedBatches(BatchlogManager.java:209)
 ~[apache-cassandra-3.10.jar:3.10]
at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
 ~[apache-cassandra-3.10.jar:3.10]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_66]
a

[jira] [Updated] (CASSANDRA-13608) Connection closed/reopened during join causes Cassandra stream to close

2017-07-24 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13608:
--
Fix Version/s: (was: 3.10)

> Connection closed/reopened during join causes Cassandra stream to close
> ---
>
> Key: CASSANDRA-13608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 3.10. Windows Server 2016, 32GB ram, 2TB hard 
> disk, RAID10 with 4 spindles, 8 Cores
>Reporter: Tania S Engel
> Attachments: Cassandra 3.10 Join with lots GC collection leads to 
> socket closure and join hang.mht, Cassandra 3.10 Join with lots GC collection 
> leads to socket closure and join hang.pdf, Cassandra 3.10 Join with lots GC 
> collection leads to socket closure and join hang.txt
>
>
> We start a JOIN bootstrap. Primary seed node streams to the replica. The 
> replica requires some GC cleanup and experiences frequent pauses including a 
> 12 second old gen cleanup following a memTable flush. Both replica and 
> primary show _MessagingService IOException: An existing connection was 
> forcibly closed by the remote host_. The replica MessagingService-Outgoing 
> reestablishes the connection immediately but the primary 
> StreamKeepAliveExecutor throws a _java.RuntimeException: Outgoing stream 
> handler has been closed_. From that point forward, the replica stays in JOIN 
> mode, sending keep-alives to the primary. The primary receives the keep-alives 
> but does not send its own, and it repeatedly fails to send a hints file 
> to the replica. It seems this limping condition would continue indefinitely; 
> it stops only when we stop the replica Cassandra. If we restart the replica 
> Cassandra the JOIN picks up again but fails with _java.io.IOException: 
> Corrupt value length 355151036 encountered, as it exceeds the maximum of 
> 268435456, which is set via max_value_size_in_mb in cassandra.yaml_. We have 
> not increased this value as we do not have values that large in our data so 
> we presume it is indeed corrupt and moving past it would not be a good idea. 
> Please see the attachment for details.






[jira] [Commented] (CASSANDRA-13441) Schema version changes for each upgraded node in a rolling upgrade, causing migration storms

2017-07-24 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099052#comment-16099052
 ] 

Tania S Engel commented on CASSANDRA-13441:
---

It is unclear to me whether the migration task timing out is what caused none of 
our tables to stream. I am 100% certain we have keyspaces on the seed node, but 
none of them were streamed. From reading the various related/duplicate cases, it 
is unclear whether the suggestion is to increase 
-Dcassandra.migration_task_wait_in_seconds, to upgrade to 3.11, or both. This 
problem is intermittent, and it happens even when we are bootstrapping the first 
node to form a cluster of 2. Is the idea that if we "wait for schema 
agreement...", we will eventually get the tables? That does not seem to be the 
case in this scenario. Right now our code waits for nodetool netstats to report 
Mode: NORMAL, but in this case the CQL port is already open and we have already 
seen "bootstrap complete", so I am 99% sure the mode is NORMAL (no longer 
JOINING).

Also, I have seen a bootstrap scenario where just one "ERROR Migration task 
failed to complete" was logged, and then the bootstrap succeeded and streamed 
all our tables. 
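For reference, the Mode: NORMAL wait described above can be sketched as follows. This is a minimal, hypothetical Python sketch (not the poster's actual code), assuming nodetool is on the PATH; as the comment notes, Mode: NORMAL alone does not guarantee that bootstrap actually streamed data, so the logs should be checked for stream sessions as well.

```python
import re
import subprocess
import time

def parse_mode(netstats_output: str):
    """Extract the node mode (e.g. JOINING, NORMAL) from `nodetool netstats` output."""
    m = re.search(r"Mode:\s*(\w+)", netstats_output)
    return m.group(1).upper() if m else None

def wait_for_normal_mode(timeout_s=3600, poll_s=30):
    """Poll `nodetool netstats` until the node reports Mode: NORMAL.

    Caveat: as discussed above, Mode: NORMAL does not prove that bootstrap
    streamed any tables; it only means the node is no longer JOINING.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        out = subprocess.run(["nodetool", "netstats"],
                             capture_output=True, text=True).stdout
        if parse_mode(out) == "NORMAL":
            return True
        time.sleep(poll_s)
    return False
```

A stronger readiness check would also grep debug.log for stream-session activity before declaring the join complete.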

> Schema version changes for each upgraded node in a rolling upgrade, causing 
> migration storms
> 
>
> Key: CASSANDRA-13441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13441
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: screenshot-1.png
>
>
> In versions < 3.0, during a rolling upgrade (say 2.0 -> 2.1), the first node 
> to upgrade to 2.1 would add the new tables, setting the new 2.1 version ID, 
> and subsequently upgraded hosts would settle on that version.
> When a 3.0 node upgrades and writes its own new-in-3.0 system tables, it'll 
> write the same tables that exist in the schema with brand new timestamps. As 
> written, this will cause all nodes in the cluster to change schema (to the 
> version with the newest timestamp). On a sufficiently large cluster with a 
> non-trivial schema, this could cause (literally) millions of migration tasks 
> to needlessly bounce across the cluster.






[jira] [Updated] (CASSANDRA-13441) Schema version changes for each upgraded node in a rolling upgrade, causing migration storms

2017-07-23 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13441:
--
Attachment: screenshot-1.png







[jira] [Commented] (CASSANDRA-13441) Schema version changes for each upgraded node in a rolling upgrade, causing migration storms

2017-07-23 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097820#comment-16097820
 ] 

Tania S Engel commented on CASSANDRA-13441:
---

We also see this running Cassandra 3.10, not during an upgrade but rather when 
a new node is joining. The join appears to complete, but there are no stream 
session logs and we end up failing with UnknownColumnFamily. 

!screenshot-1.png!








[jira] [Commented] (CASSANDRA-13565) Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception

2017-06-27 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064975#comment-16064975
 ] 

Tania S Engel commented on CASSANDRA-13565:
---

Fabulous. Thanks for the explanation; I understand enough to fix it. We 
actually switched away from using MVs for these particular tables due to their 
heavy inserts and the fact that we kept running into memory issues when joining.

What about my other question: when setting commitlog_segment_size_in_mb=2048 we 
get:
ERROR [COMMIT-LOG-ALLOCATOR] 2017-05-31 17:01:48,005 
JVMStabilityInspector.java:82 - Exiting due to error while processing commit 
log during initialization.
org.apache.cassandra.io.FSWriteError: java.io.IOException: An attempt was made 
to move the file pointer before the beginning of the file
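The "goes negative" suspicion in the report is consistent with a signed 32-bit overflow: 2048 MiB is exactly 2^31 bytes, one past the largest signed 32-bit value (2^31 - 1 = 2147483647), so a segment size or file offset held in a 32-bit int wraps negative. A small Python sketch of the arithmetic, emulating Java's int with ctypes (whether Cassandra actually holds this value in a 32-bit int is an assumption here, not verified against the source):

```python
import ctypes

def to_int32(n: int) -> int:
    """Wrap a Python int to a signed 32-bit value, as Java's `int` would."""
    return ctypes.c_int32(n).value

# commitlog_segment_size_in_mb = 2048 -> 2048 MiB in bytes is exactly 2**31,
# which does not fit in a signed 32-bit int and wraps negative:
segment_bytes = 2048 * 1024 * 1024
print(segment_bytes)            # 2147483648
print(to_int32(segment_bytes))  # -2147483648

# 2047 MiB still fits, which would explain why smaller settings work:
print(to_int32(2047 * 1024 * 1024))  # 2146435072
```

If this is the mechanism, any commitlog_segment_size_in_mb of 2048 or more would produce a negative offset and the "before the beginning of the file" error.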


> Materialized view usage of commit logs requires large mutation but 
> commitlog_segment_size_in_mb=2048 causes exception
> -
>
> Key: CASSANDRA-13565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration, Materialized Views, Streaming and 
> Messaging
> Environment: Cassandra 3.9.0, Windows 
>Reporter: Tania S Engel
> Attachments: CQLforTable.png
>
>
> We will be upgrading to 3.10 for CASSANDRA-11670. However, there is another 
> scenario (not applyunsafe during JOIN) which leads to :
>   java.lang.IllegalArgumentException: Mutation of 525.847MiB is too large 
> for the maximum size of 512.000MiB
>       at 
> org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Mutation.apply(Mutation.java:227) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.batchlog.BatchlogManager.store(BatchlogManager.java:147) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:797) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.view.ViewBuilder.buildKey(ViewBuilder.java:96) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.view.ViewBuilder.run(ViewBuilder.java:165) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1591)
>  [apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_66]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [na:1.8.0_66]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_66]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_66]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66] 
> Due to the relationship of max_mutation_size_in_kb and 
> commitlog_segment_size_in_mb, we increased commitlog_segment_size_in_mb and 
> left Cassandra to calculate max_mutation_size_in_kb as half the size 
> commitlog_segment_size_in_mb * 1024.
>  However, we have found that if we set commitlog_segment_size_in_mb=2048 we 
> get an exception upon starting Cassandra, when it is creating a new commit 
> log.
> ERROR [COMMIT-LOG-ALLOCATOR] 2017-05-31 17:01:48,005 
> JVMStabilityInspector.java:82 - Exiting due to error while processing commit 
> log during initialization.
> org.apache.cassandra.io.FSWriteError: java.io.IOException: An attempt was 
> made to move the file pointer before the beginning of the file
> Perhaps the index you are using is not big enough and it goes negative.
> Is the relationship between max_mutation_size_in_kb and 
> commitlog_segment_size_in_mb important to preserve? In our limited stress 
> test we are finding mutation size already over 512mb and we expect more data 
> in our sstables and associated materialized views.






[jira] [Commented] (CASSANDRA-13565) Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception

2017-06-26 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064032#comment-16064032
 ] 

Tania S Engel commented on CASSANDRA-13565:
---

Regarding what makes an unwieldy "wide" partition: is the problem that we have 
too many columns with potentially large varchar data? I am not sure how 
changing the key to LogDay or LogHour would change the "width" of the 
partition. 







[jira] [Commented] (CASSANDRA-13565) Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception

2017-06-26 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063647#comment-16063647
 ] 

Tania S Engel commented on CASSANDRA-13565:
---

All of our partition keys have small values (not varchar). Some are composite, 
with one or two partition keys and four or five clustering keys. Would the 
combination of partition and clustering keys (albeit with small individual 
values) contribute to a "wide" partition? Here is an example of the CQL:
!CQLforTable.png!







[jira] [Updated] (CASSANDRA-13565) Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception

2017-06-26 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13565:
--
Attachment: CQLforTable.png







[jira] [Commented] (CASSANDRA-13612) Hints file 608MB even though max_hints_file_size_in_mb=128

2017-06-19 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054654#comment-16054654
 ] 

Tania S Engel commented on CASSANDRA-13612:
---

Also, we left this ~600MB hints file in place, and the node continuously tried 
to send it to the one other clustered node (the seed node). It never succeeded, 
even after 3 days. So we finally deleted it, and hinted handoff was then able 
to hand off the many other hints files that were stuck "behind" this large one. 

> Hints file 608MB even though max_hints_file_size_in_mb=128
> --
>
> Key: CASSANDRA-13612
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13612
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
> Environment: Cassandra 3.10 following a JOIN on replica node 
>Reporter: Tania S Engel
>Priority: Trivial
> Attachments: Cassandra 3.10 bug hint log size.mht
>
>
> C:\Cassandra\data\hints has a file of size 608MB but my Cassandra.yaml has a 
> max_hints_file_size_in_mb=128. I have confirmed in the debug logs that the 
> setting is picked up.






[jira] [Updated] (CASSANDRA-13608) Connection closed/reopened during join causes Cassandra stream to close

2017-06-19 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13608:
--
Attachment: Cassandra 3.10 Join with lots GC collection leads to socket 
closure and join hang.txt

> Connection closed/reopened during join causes Cassandra stream to close
> ---
>
> Key: CASSANDRA-13608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 3.10. Windows Server 2016, 32GB ram, 2TB hard 
> disk, RAID10 with 4 spindles, 8 Cores
>Reporter: Tania S Engel
> Fix For: 3.10
>
> Attachments: Cassandra 3.10 Join with lots GC collection leads to 
> socket closure and join hang.mht, Cassandra 3.10 Join with lots GC collection 
> leads to socket closure and join hang.pdf, Cassandra 3.10 Join with lots GC 
> collection leads to socket closure and join hang.txt
>
>
> We start a JOIN bootstrap. Primary seed node streams to the replica. The 
> replica requires some GC cleanup and experiences frequent pauses including a 
> 12 second old gen cleanup following a memTable flush. Both replica and 
> primary show _MessagingService IOException: An existing connection was 
> forcibly closed by the remote host_. The replica MessagingService-Outgoing 
> reestablishes the connection immediately but the primary 
> StreamKeepAliveExecutor throws a _java.RuntimeException: Outgoing stream 
> handler has been closed_. From that point forward, the replica stays in JOIN 
> mode, sending keep-alives to the primary. The primary receives the keep 
> alive, but does not send its own and it repeatedly fails to send a hints file 
> to the replica. It seems this limping condition would continue indefinitely, 
> but stops as we stop the replica Cassandra. If we restart the replica 
> Cassandra the JOIN picks up again but fails with _java.io.IOException: 
> Corrupt value length 355151036 encountered, as it exceeds the maximum of 
> 268435456, which is set via max_value_size_in_mb in cassandra.yaml_. We have 
> not increased this value as we do not have values that large in our data so 
> we presume it is indeed corrupt and moving past it would not be a good idea. 
> Please see the attachment for details.






[jira] [Updated] (CASSANDRA-13608) Connection closed/reopened during join causes Cassandra stream to close

2017-06-19 Thread Tania S Engel (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tania S Engel updated CASSANDRA-13608:
--
Attachment: Cassandra 3.10 Join with lots GC collection leads to socket 
closure and join hang.pdf

An *.mht file can be opened with a web browser. A *.pdf version is attached as 
well.

> Connection closed/reopened during join causes Cassandra stream to close
> ---
>
> Key: CASSANDRA-13608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 3.10. Windows Server 2016, 32GB ram, 2TB hard 
> disk, RAID10 with 4 spindles, 8 Cores
>Reporter: Tania S Engel
> Fix For: 3.10
>
> Attachments: Cassandra 3.10 Join with lots GC collection leads to 
> socket closure and join hang.mht, Cassandra 3.10 Join with lots GC collection 
> leads to socket closure and join hang.pdf
>






[jira] [Commented] (CASSANDRA-13608) Connection closed/reopened during join causes Cassandra stream to close

2017-06-19 Thread Tania S Engel (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054186#comment-16054186
 ] 

Tania S Engel commented on CASSANDRA-13608:
---

Paulo, did you see the attachment? The source is the "primary seed node". The 
destination is the "replica". Will that not suffice?

> Connection closed/reopened during join causes Cassandra stream to close
> ---
>
> Key: CASSANDRA-13608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 3.10. Windows Server 2016, 32GB ram, 2TB hard 
> disk, RAID10 with 4 spindles, 8 Cores
>Reporter: Tania S Engel
> Fix For: 3.10
>
> Attachments: Cassandra 3.10 Join with lots GC collection leads to 
> socket closure and join hang.mht
>






[jira] [Created] (CASSANDRA-13612) Hints file 608MB even though max_hints_file_size_in_mb=128

2017-06-15 Thread Tania S Engel (JIRA)
Tania S Engel created CASSANDRA-13612:
-

 Summary: Hints file 608MB even though max_hints_file_size_in_mb=128
 Key: CASSANDRA-13612
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13612
 Project: Cassandra
  Issue Type: Bug
  Components: Configuration
 Environment: Cassandra 3.10 following a JOIN on replica node 
Reporter: Tania S Engel
Priority: Trivial
 Attachments: Cassandra 3.10 bug hint log size.mht

C:\Cassandra\data\hints has a file of size 608MB but my Cassandra.yaml has a 
max_hints_file_size_in_mb=128. I have confirmed in the debug logs that the 
setting is picked up.
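
A minimal sketch of the check performed here, assuming hint files sit in a single directory with a .hints extension; `oversized_hints` is a hypothetical helper, not a Cassandra tool:

```python
import os

MAX_HINTS_FILE_SIZE_IN_MB = 128  # value from cassandra.yaml in this report

def oversized_hints(hints_dir):
    """Return (name, size_in_mb) for hint files exceeding the configured cap."""
    limit_bytes = MAX_HINTS_FILE_SIZE_IN_MB * 1024 * 1024
    found = []
    for entry in os.scandir(hints_dir):
        if entry.is_file() and entry.name.endswith(".hints"):
            size = entry.stat().st_size
            if size > limit_bytes:
                found.append((entry.name, size // (1024 * 1024)))
    return found
```

Running this against `C:\Cassandra\data\hints` would flag the 608MB file reported above.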






[jira] [Created] (CASSANDRA-13608) Connection closed/reopened during join causes Cassandra stream to close

2017-06-15 Thread Tania S Engel (JIRA)
Tania S Engel created CASSANDRA-13608:
-

 Summary: Connection closed/reopened during join causes Cassandra 
stream to close
 Key: CASSANDRA-13608
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13608
 Project: Cassandra
  Issue Type: Bug
  Components: Streaming and Messaging
 Environment: Cassandra 3.10. Windows Server 2016, 32GB ram, 2TB hard 
disk, RAID10 with 4 spindles, 8 Cores
Reporter: Tania S Engel
 Fix For: 3.10
 Attachments: Cassandra 3.10 Join with lots GC collection leads to 
socket closure and join hang.mht

We start a JOIN bootstrap. Primary seed node streams to the replica. The 
replica requires some GC cleanup and experiences frequent pauses including a 12 
second old gen cleanup following a memTable flush. Both replica and primary 
show _MessagingService IOException: An existing connection was forcibly closed 
by the remote host_. The replica MessagingService-Outgoing reestablishes the 
connection immediately but the primary StreamKeepAliveExecutor throws a 
_java.RuntimeException: Outgoing stream handler has been closed_. From that 
point forward, the replica stays in JOIN mode, sending keep-alives to the 
primary. The primary receives the keep-alives but does not send its own, and it 
repeatedly fails to send a hints file to the replica. It seems this limping 
condition would continue indefinitely, but stops as we stop the replica 
Cassandra. If we restart the replica Cassandra the JOIN picks up again but 
fails with _java.io.IOException: Corrupt value length 355151036 encountered, as 
it exceeds the maximum of 268435456, which is set via max_value_size_in_mb in 
cassandra.yaml_. We have not increased this value as we do not have values that 
large in our data so we presume it is indeed corrupt and moving past it would 
not be a good idea. Please see the attachment for details.
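
The guard that produced this error can be sketched as follows (illustrative only, not Cassandra's deserializer): a declared value length larger than max_value_size_in_mb is treated as evidence of on-disk corruption rather than a legitimate value.

```python
MAX_VALUE_SIZE_IN_MB = 256  # cassandra.yaml default; 256 MiB == 268435456 bytes

def check_value_length(length_bytes):
    """Reject declared value lengths above the configured maximum."""
    limit = MAX_VALUE_SIZE_IN_MB * 1024 * 1024
    if length_bytes > limit:
        raise IOError(
            f"Corrupt value length {length_bytes} encountered, "
            f"as it exceeds the maximum of {limit}")
    return length_bytes
```

The reported length 355151036 (~339 MiB) trips this guard against the 268435456-byte limit, matching the exception text above.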






[jira] [Created] (CASSANDRA-13565) Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception

2017-05-31 Thread Tania S Engel (JIRA)
Tania S Engel created CASSANDRA-13565:
-

 Summary: Materialized view usage of commit logs requires large 
mutation but commitlog_segment_size_in_mb=2048 causes exception
 Key: CASSANDRA-13565
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13565
 Project: Cassandra
  Issue Type: Bug
  Components: Configuration, Materialized Views, Streaming and Messaging
 Environment: Cassandra 3.9.0, Windows 
Reporter: Tania S Engel


We will be upgrading to 3.10 for CASSANDRA-11670. However, there is another 
scenario (not applyUnsafe during JOIN) which leads to:
java.lang.IllegalArgumentException: Mutation of 525.847MiB is too large for the maximum size of 512.000MiB
    at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262) ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.batchlog.BatchlogManager.store(BatchlogManager.java:147) ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:797) ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.view.ViewBuilder.buildKey(ViewBuilder.java:96) ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.view.ViewBuilder.run(ViewBuilder.java:165) ~[apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1591) [apache-cassandra-3.9.0.jar:3.9.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]

Because max_mutation_size_in_kb is derived from commitlog_segment_size_in_mb, 
we increased commitlog_segment_size_in_mb and let Cassandra calculate 
max_mutation_size_in_kb as half the segment size, i.e. 
commitlog_segment_size_in_mb * 1024 / 2.
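
The derivation described above can be sketched as follows (a minimal sketch; the names follow cassandra.yaml, and the "half of one segment" rule is the default relationship this report relies on):

```python
# Default derivation: a single mutation may occupy at most half of one
# commit log segment, expressed in KB.
def default_max_mutation_size_in_kb(commitlog_segment_size_in_mb):
    return commitlog_segment_size_in_mb * 1024 // 2

# The 512.000MiB cap in the exception above corresponds to a 1024 MB segment:
assert default_max_mutation_size_in_kb(1024) * 1024 == 512 * 1024 * 1024
```

So a 525.847 MiB mutation exceeds the cap until the segment size is raised past 1024 MB, which is what motivated trying 2048.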

 However, we have found that if we set commitlog_segment_size_in_mb=2048 we get 
an exception upon starting Cassandra, when it is creating a new commit log.

ERROR [COMMIT-LOG-ALLOCATOR] 2017-05-31 17:01:48,005 
JVMStabilityInspector.java:82 - Exiting due to error while processing commit 
log during initialization.
org.apache.cassandra.io.FSWriteError: java.io.IOException: An attempt was made 
to move the file pointer before the beginning of the file

Perhaps the file offset or index involved is stored in a signed 32-bit integer 
that is not big enough at this segment size, so it goes negative.
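
Consistent with that guess: 2048 MiB is exactly 2^31 bytes, one past the largest signed 32-bit value, so a 32-bit offset into such a segment wraps negative. A quick sketch of the arithmetic (illustrative only, not Cassandra's code):

```python
INT32_MAX = 2**31 - 1                 # 2147483647

segment_bytes = 2048 * 1024 * 1024    # 2147483648 == 2**31
assert segment_bytes == INT32_MAX + 1

def to_int32(n):
    """Simulate wraparound of a value stored in a signed 32-bit integer."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

print(to_int32(segment_bytes))        # -2147483648: a negative file pointer
```

A negative offset would explain the "move the file pointer before the beginning of the file" IOException.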

Is the relationship between max_mutation_size_in_kb and 
commitlog_segment_size_in_mb important to preserve? In our limited stress test 
we are already seeing mutations over 512MB, and we expect more data in our 
sstables and associated materialized views.


