Re: Open File Handles for Deleted sstables

2016-09-28 Thread Anuj Wadehra
Restarting may be a temporary workaround but can't be a permanent solution. 
After some days, the problem will come back again.
Thanks
Anuj



Sent from Yahoo Mail on Android 
 
  On Thu, 29 Sep, 2016 at 12:54 AM, sai krishnam raju 
potturi wrote:   restarting the cassandra service helped 
get rid of those files in our situation.
thanks
Sai
On Wed, Sep 28, 2016 at 3:15 PM, Anuj Wadehra  wrote:

Hi,
We are facing an issue where Cassandra has open file handles for deleted 
sstable files. These open file handles keep increasing over time and 
eventually lead to a disk space crisis. This is visible via the lsof command. 
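For reference, this is roughly how we check it (the commands are just an example of what we run against the Cassandra PID):

lsof -p $(pgrep -f CassandraDaemon) | grep '(deleted)' | grep -c 'Data.db'   # count deleted-but-open data files
lsof -a +L1 -p $(pgrep -f CassandraDaemon)   # list open files whose on-disk link count is 0 (already unlinked)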
There are no Exceptions in the logs. We suspect a race condition where 
compactions/repairs and reads are done on the same sstable. I have gone through a few 
JIRAs but somehow am not able to correlate the issue with those tickets. 
We are using 2.0.14. OS is Red Hat Linux.
Any suggestions?

Thanks
Anuj




  


Re: New node block in autobootstrap

2016-09-28 Thread Alain RODRIGUEZ
>
> Forgot to set replication for new data center :(


I had a feeling it could be that :-). From the other thread:


> It should be run from the DC3 servers, after altering the keyspace to add
> replication for the new datacenter. Is this the way you're doing it?
>
>- Are all the nodes using the same version ('nodetool version')?
>- What does 'nodetool status keyspace_name1' output?
>- Are you sure you are using NetworkTopologyStrategy on
>  'keyspace_name1'? Have you modified this schema to add replication
>  on DC3?
>
> My guess is something could be wrong with the configuration.
>


I was starting to wonder about this one though, so thanks for letting us
know about it :-).
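For completeness, since the fix turned out to be the replication settings, the alter looks something like this (keyspace name and replication factors taken from your other thread, adjust to your topology), followed by the rebuild on each DC3 node:

echo "ALTER KEYSPACE keyspace_name1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC3': 3};" | cqlsh
nodetool rebuild 'DC1'   # run on every node of the new datacenter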

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-28 23:54 GMT+02:00 techpyaasa . :

> Forgot to set replication for new data center :(
>
> On Wed, Sep 28, 2016 at 11:33 PM, Jonathan Haddad 
> wrote:
>
>> What was the reason?
>>
>> On Wed, Sep 28, 2016 at 9:58 AM techpyaasa . 
>> wrote:
>>
>>> Very sorry...I got the reason for this issue..
>>> Please ignore.
>>>
>>>
>>> On Wed, Sep 28, 2016 at 10:14 PM, techpyaasa . 
>>> wrote:
>>>
 @Paulo

 We have done changes as you said
 net.ipv4.tcp_keepalive_time=60
 net.ipv4.tcp_keepalive_probes=3
 net.ipv4.tcp_keepalive_intvl=10

 and increased streaming_socket_timeout_in_ms to 48 hours ,
 "phi_convict_threshold : 9".

 And once again recommissioned the new data center (DC3), ran "nodetool
 rebuild 'DC1'", but this time NO data got streamed and 'nodetool rebuild'
 exited without any exception.

 Please check logs below

 *INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571
 StorageService.java (line 914) rebuild from dc: IDC*
 * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520
 StreamResultFuture.java (line 87) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for 
 Rebuild*
 * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521
 StreamResultFuture.java (line 91) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
 /xxx.xxx.198.75*
 * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
 StreamResultFuture.java (line 91) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
 /xxx.xxx.198.132*
 * INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522
 StreamSession.java (line 214) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
 /xxx.xxx.198.75*
 * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
 StreamResultFuture.java (line 91) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
 /xxx.xxx.198.133*
 * INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522
 StreamSession.java (line 214) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
 /xxx.xxx.198.132*
 * INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523
 StreamSession.java (line 214) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
 /xxx.xxx.198.133*
 * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523
 StreamResultFuture.java (line 91) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
 /xxx.xxx.198.167*
 * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
 StreamResultFuture.java (line 91) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
 /xxx.xxx.198.78*
 * INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524
 StreamSession.java (line 214) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
 /xxx.xxx.198.167*
 * INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525
 StreamSession.java (line 214) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
 /xxx.xxx.198.78*
 * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
 StreamResultFuture.java (line 91) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
 /xxx.xxx.198.126*
 * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525
 StreamResultFuture.java (line 91) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
 /xxx.xxx.198.191*
 * INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526
 StreamSession.java (line 214) [Stream
 #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
 /xxx.xxx.198.126*
 * INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526

Re: New node block in autobootstrap

2016-09-28 Thread techpyaasa .
Forgot to set replication for new data center :(

On Wed, Sep 28, 2016 at 11:33 PM, Jonathan Haddad  wrote:

> What was the reason?
>
> On Wed, Sep 28, 2016 at 9:58 AM techpyaasa .  wrote:
>
>> Very sorry...I got the reason for this issue..
>> Please ignore.
>>
>>
>> On Wed, Sep 28, 2016 at 10:14 PM, techpyaasa . 
>> wrote:
>>
>>> @Paulo
>>>
>>> We have done changes as you said
>>> net.ipv4.tcp_keepalive_time=60
>>> net.ipv4.tcp_keepalive_probes=3
>>> net.ipv4.tcp_keepalive_intvl=10
>>>
>>> and increased streaming_socket_timeout_in_ms to 48 hours ,
>>> "phi_convict_threshold : 9".
>>>
>>> And once again recommissioned the new data center (DC3), ran "nodetool
>>> rebuild 'DC1'", but this time NO data got streamed and 'nodetool rebuild'
>>> exited without any exception.
>>>
>>> Please check logs below
>>>
>>> *INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571
>>> StorageService.java (line 914) rebuild from dc: IDC*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520
>>> StreamResultFuture.java (line 87) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521
>>> StreamResultFuture.java (line 91) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>> /xxx.xxx.198.75*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
>>> StreamResultFuture.java (line 91) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>> /xxx.xxx.198.132*
>>> * INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522
>>> StreamSession.java (line 214) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>> /xxx.xxx.198.75*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
>>> StreamResultFuture.java (line 91) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>> /xxx.xxx.198.133*
>>> * INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522
>>> StreamSession.java (line 214) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>> /xxx.xxx.198.132*
>>> * INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523
>>> StreamSession.java (line 214) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>> /xxx.xxx.198.133*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523
>>> StreamResultFuture.java (line 91) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>> /xxx.xxx.198.167*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
>>> StreamResultFuture.java (line 91) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>> /xxx.xxx.198.78*
>>> * INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524
>>> StreamSession.java (line 214) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>> /xxx.xxx.198.167*
>>> * INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525
>>> StreamSession.java (line 214) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>> /xxx.xxx.198.78*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
>>> StreamResultFuture.java (line 91) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>> /xxx.xxx.198.126*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525
>>> StreamResultFuture.java (line 91) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>> /xxx.xxx.198.191*
>>> * INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526
>>> StreamSession.java (line 214) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>> /xxx.xxx.198.126*
>>> * INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526
>>> StreamSession.java (line 214) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>> /xxx.xxx.198.191*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526
>>> StreamResultFuture.java (line 91) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>> /xxx.xxx.198.168*
>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527
>>> StreamResultFuture.java (line 91) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>> /xxx.xxx.198.169*
>>> * INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527
>>> StreamSession.java (line 214) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>> /xxx.xxx.198.168*
>>> * INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528
>>> StreamSession.java (line 214) [Stream
>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>> /xxx.xxx.198.169*
>>> * INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713
>>> 

Re: Open File Handles for Deleted sstables

2016-09-28 Thread Jeff Jirsa
There has been a history of leaks where multiple repairs were run 
on the same node at the same time (e.g.: 
https://issues.apache.org/jira/browse/CASSANDRA-11215 ) 

 

You’re running a very old version of Cassandra. If you’re able to upgrade to 
the newest 2.1 or 2.2, it’s likely that at least SOME similar bugs are addressed.  

 

 

 

From: Anuj Wadehra 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, September 28, 2016 at 12:15 PM
To: User 
Subject: Open File Handles for Deleted sstables

 

Hi,

We are facing an issue where Cassandra has open file handles for deleted 
sstable files. These open file handles keep increasing over time and 
eventually lead to a disk space crisis. This is visible via the lsof command.

There are no Exceptions in the logs. We suspect a race condition where 
compactions/repairs and reads are done on the same sstable. I have gone through a few 
JIRAs but somehow not able to correlate the issue with those tickets.

We are using 2.0.14. OS is Red Hat Linux.

Any suggestions?

Thanks

Anuj

 






Re: Open File Handles for Deleted sstables

2016-09-28 Thread sai krishnam raju potturi
restarting the cassandra service helped get rid of those files in our
situation.

thanks
Sai

On Wed, Sep 28, 2016 at 3:15 PM, Anuj Wadehra 
wrote:

> Hi,
>
> We are facing an issue where Cassandra has open file handles for deleted
> sstable files. These open file handles keep increasing over time and
> eventually lead to a disk space crisis. This is visible via the lsof command.
>
> There are no Exceptions in the logs. We suspect a race condition where
> compactions/repairs and reads are done on the same sstable. I have gone through
> a few JIRAs but somehow not able to correlate the issue with those tickets.
>
> We are using 2.0.14. OS is Red Hat Linux.
>
> Any suggestions?
>
> Thanks
> Anuj
>
>
>


Open File Handles for Deleted sstables

2016-09-28 Thread Anuj Wadehra
Hi,
We are facing an issue where Cassandra has open file handles for deleted 
sstable files. These open file handles keep increasing over time and 
eventually lead to a disk space crisis. This is visible via the lsof command. 
There are no Exceptions in the logs. We suspect a race condition where 
compactions/repairs and reads are done on the same sstable. I have gone through a few 
JIRAs but somehow not able to correlate the issue with those tickets. 
We are using 2.0.14. OS is Red Hat Linux.
Any suggestions?

Thanks
Anuj




Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread George Sigletos
Even when I set a lower request-timeout in order to trigger a timeout,
there is still no WARN or ERROR in the logs.

On Wed, Sep 28, 2016 at 8:22 PM, George Sigletos 
wrote:

> Hi Joaquin,
>
> Unfortunately, neither WARN nor ERROR was found in the system logs across the
> cluster when executing truncate. Sometimes it executes immediately, other
> times it takes 25 seconds, given that I have connected with
> --request-timeout=30 seconds.
>
> The nodes are a bit busy compacting. On a freshly restarted cluster,
> truncate seems to work without problems.
>
> Some warnings that I see around that time but not exactly when executing
> truncate are:
> WARN  [CompactionExecutor:2] 2016-09-28 20:03:29,646
> SSTableWriter.java:241 - Compacting large partition
> system/hints:6f2c3b31-4975-470b-8f91-e706be89a83a (133819308 bytes
>
> Kind regards,
> George
>
> On Wed, Sep 28, 2016 at 7:54 PM, Joaquin Casares <
> joaq...@thelastpickle.com> wrote:
>
>> Hi George,
>>
>> Try grepping for WARN and ERROR on the system.logs across all nodes when
>> you run the command. Could you post any of the recent stacktraces that you
>> see?
>>
>> Cheers,
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Wed, Sep 28, 2016 at 12:43 PM, George Sigletos > > wrote:
>>
>>> Thanks a lot for your reply.
>>>
>>> I understand that truncate is an expensive operation. But throwing a
>>> timeout while truncating a table that is already empty?
>>>
>>> A workaround is to set a high --request-timeout when connecting. Even 20
>>> seconds is not always enough
>>>
>>> Kind regards,
>>> George
>>>
>>>
>>> On Wed, Sep 28, 2016 at 6:59 PM, Edward Capriolo 
>>> wrote:
>>>
 Truncate does a few things (based on version)
   truncate takes snapshots
   truncate causes a flush
   in very old versions truncate causes a schema migration.

 In newer versions like cassandra 3.4 you have this knob.

 # How long the coordinator should wait for truncates to complete
 # (This can be much longer, because unless auto_snapshot is disabled
 # we need to flush first so we can snapshot before removing the data.)
 truncate_request_timeout_in_ms: 60000


 In older versions you can not control when this call will timeout, it
 is fairly normal that it does!


 On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos <
 sigle...@textkernel.nl> wrote:

> Hello,
>
> I keep executing a TRUNCATE command on an empty table and it throws
> OperationTimedOut randomly:
>
> cassandra@cqlsh> truncate test.mytable;
> OperationTimedOut: errors={}, last_host=cassiebeta-01
> cassandra@cqlsh> truncate test.mytable;
> OperationTimedOut: errors={}, last_host=cassiebeta-01
>
> Having a 3 node cluster running 2.1.14. No connectivity problems. Has
> anybody come across the same error?
>
> Thanks,
> George
>
>

>>>
>>
>


Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread George Sigletos
Hi Joaquin,

Unfortunately, neither WARN nor ERROR was found in the system logs across the
cluster when executing truncate. Sometimes it executes immediately, other
times it takes 25 seconds, given that I have connected with
--request-timeout=30 seconds.

The nodes are a bit busy compacting. On a freshly restarted cluster,
truncate seems to work without problems.

Some warnings that I see around that time but not exactly when executing
truncate are:
WARN  [CompactionExecutor:2] 2016-09-28 20:03:29,646 SSTableWriter.java:241
- Compacting large partition
system/hints:6f2c3b31-4975-470b-8f91-e706be89a83a (133819308 bytes

Kind regards,
George

On Wed, Sep 28, 2016 at 7:54 PM, Joaquin Casares 
wrote:

> Hi George,
>
> Try grepping for WARN and ERROR on the system.logs across all nodes when
> you run the command. Could you post any of the recent stacktraces that you
> see?
>
> Cheers,
>
> Joaquin Casares
> Consultant
> Austin, TX
>
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Wed, Sep 28, 2016 at 12:43 PM, George Sigletos 
> wrote:
>
>> Thanks a lot for your reply.
>>
>> I understand that truncate is an expensive operation. But throwing a
>> timeout while truncating a table that is already empty?
>>
>> A workaround is to set a high --request-timeout when connecting. Even 20
>> seconds is not always enough
>>
>> Kind regards,
>> George
>>
>>
>> On Wed, Sep 28, 2016 at 6:59 PM, Edward Capriolo 
>> wrote:
>>
>>> Truncate does a few things (based on version)
>>>   truncate takes snapshots
>>>   truncate causes a flush
>>>   in very old versions truncate causes a schema migration.
>>>
>>> In newer versions like cassandra 3.4 you have this knob.
>>>
>>> # How long the coordinator should wait for truncates to complete
>>> # (This can be much longer, because unless auto_snapshot is disabled
>>> # we need to flush first so we can snapshot before removing the data.)
>>> truncate_request_timeout_in_ms: 60000
>>>
>>>
>>> In older versions you can not control when this call will timeout, it is
>>> fairly normal that it does!
>>>
>>>
>>> On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos <
>>> sigle...@textkernel.nl> wrote:
>>>
 Hello,

 I keep executing a TRUNCATE command on an empty table and it throws
 OperationTimedOut randomly:

 cassandra@cqlsh> truncate test.mytable;
 OperationTimedOut: errors={}, last_host=cassiebeta-01
 cassandra@cqlsh> truncate test.mytable;
 OperationTimedOut: errors={}, last_host=cassiebeta-01

 Having a 3 node cluster running 2.1.14. No connectivity problems. Has
 anybody come across the same error?

 Thanks,
 George


>>>
>>
>


Re: New node block in autobootstrap

2016-09-28 Thread Jonathan Haddad
What was the reason?

On Wed, Sep 28, 2016 at 9:58 AM techpyaasa .  wrote:

> Very sorry...I got the reason for this issue..
> Please ignore.
>
>
> On Wed, Sep 28, 2016 at 10:14 PM, techpyaasa . 
> wrote:
>
>> @Paulo
>>
>> We have done changes as you said
>> net.ipv4.tcp_keepalive_time=60
>> net.ipv4.tcp_keepalive_probes=3
>> net.ipv4.tcp_keepalive_intvl=10
>>
>> and increased streaming_socket_timeout_in_ms to 48 hours ,
>> "phi_convict_threshold : 9".
>>
>> And once again recommissioned the new data center (DC3), ran "nodetool
>> rebuild 'DC1'", but this time NO data got streamed and 'nodetool rebuild'
>> exited without any exception.
>>
>> Please check logs below
>>
>> *INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571
>> StorageService.java (line 914) rebuild from dc: IDC*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520
>> StreamResultFuture.java (line 87) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521
>> StreamResultFuture.java (line 91) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>> /xxx.xxx.198.75*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
>> StreamResultFuture.java (line 91) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>> /xxx.xxx.198.132*
>> * INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522
>> StreamSession.java (line 214) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>> /xxx.xxx.198.75*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
>> StreamResultFuture.java (line 91) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>> /xxx.xxx.198.133*
>> * INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522
>> StreamSession.java (line 214) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>> /xxx.xxx.198.132*
>> * INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523
>> StreamSession.java (line 214) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>> /xxx.xxx.198.133*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523
>> StreamResultFuture.java (line 91) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>> /xxx.xxx.198.167*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
>> StreamResultFuture.java (line 91) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>> /xxx.xxx.198.78*
>> * INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524
>> StreamSession.java (line 214) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>> /xxx.xxx.198.167*
>> * INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525
>> StreamSession.java (line 214) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>> /xxx.xxx.198.78*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
>> StreamResultFuture.java (line 91) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>> /xxx.xxx.198.126*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525
>> StreamResultFuture.java (line 91) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>> /xxx.xxx.198.191*
>> * INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526
>> StreamSession.java (line 214) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>> /xxx.xxx.198.126*
>> * INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526
>> StreamSession.java (line 214) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>> /xxx.xxx.198.191*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526
>> StreamResultFuture.java (line 91) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>> /xxx.xxx.198.168*
>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527
>> StreamResultFuture.java (line 91) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>> /xxx.xxx.198.169*
>> * INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527
>> StreamSession.java (line 214) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>> /xxx.xxx.198.168*
>> * INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528
>> StreamSession.java (line 214) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>> /xxx.xxx.198.169*
>> * INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713
>> StreamResultFuture.java (line 186) [Stream
>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is
>> complete*
>> * INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715
>> StreamResultFuture.java (line 186) [Stream
>> 

Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread Joaquin Casares
Hi George,

Try grepping for WARN and ERROR in the system.log across all nodes when
you run the command. Could you post any of the recent stacktraces that you
see?
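Something along these lines usually does it (the log path depends on your install; this is the usual package location):

grep -E 'WARN|ERROR' /var/log/cassandra/system.log | tail -n 50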

Cheers,

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com

On Wed, Sep 28, 2016 at 12:43 PM, George Sigletos 
wrote:

> Thanks a lot for your reply.
>
> I understand that truncate is an expensive operation. But throwing a
> timeout while truncating a table that is already empty?
>
> A workaround is to set a high --request-timeout when connecting. Even 20
> seconds is not always enough
>
> Kind regards,
> George
>
>
> On Wed, Sep 28, 2016 at 6:59 PM, Edward Capriolo 
> wrote:
>
>> Truncate does a few things (based on version)
>>   truncate takes snapshots
>>   truncate causes a flush
>>   in very old versions truncate causes a schema migration.
>>
>> In newer versions like cassandra 3.4 you have this knob.
>>
>> # How long the coordinator should wait for truncates to complete
>> # (This can be much longer, because unless auto_snapshot is disabled
>> # we need to flush first so we can snapshot before removing the data.)
>> truncate_request_timeout_in_ms: 60000
>>
>>
>> In older versions you can not control when this call will timeout, it is
>> fairly normal that it does!
>>
>>
>> On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos > > wrote:
>>
>>> Hello,
>>>
>>> I keep executing a TRUNCATE command on an empty table and it throws
>>> OperationTimedOut randomly:
>>>
>>> cassandra@cqlsh> truncate test.mytable;
>>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>>> cassandra@cqlsh> truncate test.mytable;
>>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>>>
>>> Having a 3 node cluster running 2.1.14. No connectivity problems. Has
>>> anybody come across the same error?
>>>
>>> Thanks,
>>> George
>>>
>>>
>>
>


Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread George Sigletos
Thanks a lot for your reply.

I understand that truncate is an expensive operation. But throwing a
timeout while truncating a table that is already empty?

A workaround is to set a high --request-timeout when connecting. Even 20
seconds is not always enough
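For example, this is how I connect now (the value is in seconds; the host name is from my test cluster):

cqlsh --request-timeout=60 cassiebeta-01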

Kind regards,
George


On Wed, Sep 28, 2016 at 6:59 PM, Edward Capriolo 
wrote:

> Truncate does a few things (based on version)
>   truncate takes snapshots
>   truncate causes a flush
>   in very old versions truncate causes a schema migration.
>
> In newer versions like cassandra 3.4 you have this knob.
>
> # How long the coordinator should wait for truncates to complete
> # (This can be much longer, because unless auto_snapshot is disabled
> # we need to flush first so we can snapshot before removing the data.)
> truncate_request_timeout_in_ms: 60000
>
>
> In older versions you can not control when this call will timeout, it is
> fairly normal that it does!
>
>
> On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos 
> wrote:
>
>> Hello,
>>
>> I keep executing a TRUNCATE command on an empty table and it throws
>> OperationTimedOut randomly:
>>
>> cassandra@cqlsh> truncate test.mytable;
>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>> cassandra@cqlsh> truncate test.mytable;
>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>>
>> Having a 3 node cluster running 2.1.14. No connectivity problems. Has
>> anybody come across the same error?
>>
>> Thanks,
>> George
>>
>>
>


WARN Writing large partition for materialized views

2016-09-28 Thread Robert Sicoie
Hi guys,

I run a cluster with 5 nodes, cassandra version 3.0.5.

I get this warning:
2016-09-28 17:22:18,480 BigTableWriter.java:171 - Writing large partition...

for some materialized views. Some have values over 500MB. How does this affect
performance? What can/should be done? I suppose it is a problem in the schema
design.
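In case it helps, this is roughly how I am looking at the partition sizes (keyspace and view names here are just placeholders):

nodetool tablehistograms my_keyspace my_materialized_view   # shows partition size percentiles
# (nodetool cfhistograms on older versions)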

Thanks,
Robert Sicoie


Contains-query leads to error when list in selected row is empty

2016-09-28 Thread Michael Mirwaldt
Hi Cassandra-users,
my name is Michael Mirwaldt and I work for financial.com.

I have encountered this problem with Cassandra 3.7 running 4 nodes:

Given the data model

CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '2'}  AND durable_writes = true;

CREATE TABLE mykeyspace.mytable (partitionkey text, mylist list<text>, PRIMARY 
KEY (partitionkey));

If I add the value

INSERT INTO mykeyspace.mytable(partitionkey,mylist) VALUES('A',['1']);

and query

select * from mykeyspace.mytable;

I get

 partitionkey | mylist
--------------+--------
            A |  ['1']

and If I query

select * from mykeyspace.mytable where partitionkey='A' and mylist contains '1' 
allow filtering;

I get

 partitionkey | mylist
--------------+--------
            A |  ['1']


But if I add

INSERT INTO mykeyspace.mytable(partitionkey) VALUES('B');

so that

select * from mykeyspace.mytable;

gives me

 partitionkey | mylist
--------------+--------
            B |   null
            A |  ['1']

then

select * from mykeyspace.mytable where partitionkey='B' and mylist contains '2' 
allow filtering;

leads to the error message

ReadFailure: code=1300 [Replica(s) failed to execute read] message="Operation 
failed - received 0 responses and 2 failures" info={'failures': 2, 
'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

with the log message

[...] AwareExecutorService$FutureTask.run|Uncaught exception on thread 
Thread[SharedPool-Worker-2,5,main]: java.lang.RuntimeException: 
java.lang.NullPointerException

on one other node.

Is that really logical and intended?
Would you not expect an empty result for the last query?

I am confused.
Can you help me?

Brgds,
Michael



financial.com AG

Munich Head Office/Hauptsitz München: Georg-Muche-Straße 3 | 80807 Munich | 
Germany | Tel. +49 89 318528-0 | Google Maps: http://goo.gl/maps/UHwj9
Frankfurt Branch Office/Niederlassung Frankfurt: Messeturm | 
Friedrich-Ebert-Anlage 49 | 60327 Frankfurt am Main | Germany | Google Maps: 
http://goo.gl/maps/oSGjR
Management Board/Vorstand: Dr. Steffen Boehnert | Dr. Alexis Eisenhofer | Dr. 
Yann Samson
Supervisory Board/Aufsichtsrat: Werner Engelhardt (Chairman/Vorsitzender), Eric 
Wasescha (Deputy Chairman/Stellv. Vorsitzender), Franz Baur
Register Court/Handelsregister: Munich - HRB 128972 | Sales Tax ID 
Number/St.Nr.: DE205370553


Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread Edward Capriolo
Truncate does a few things (based on version)
  truncate takes snapshots
  truncate causes a flush
  in very old versions truncate causes a schema migration.

In newer versions like cassandra 3.4 you have this knob.

# How long the coordinator should wait for truncates to complete
# (This can be much longer, because unless auto_snapshot is disabled
# we need to flush first so we can snapshot before removing the data.)
truncate_request_timeout_in_ms: 60000


In older versions you cannot control when this call will time out; it is
fairly normal that it does!


On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos 
wrote:

> Hello,
>
> I keep executing a TRUNCATE command on an empty table and it throws
> OperationTimedOut randomly:
>
> cassandra@cqlsh> truncate test.mytable;
> OperationTimedOut: errors={}, last_host=cassiebeta-01
> cassandra@cqlsh> truncate test.mytable;
> OperationTimedOut: errors={}, last_host=cassiebeta-01
>
> Having a 3 node cluster running 2.1.14. No connectivity problems. Has
> anybody come across the same error?
>
> Thanks,
> George
>
>


Re: New node block in autobootstrap

2016-09-28 Thread techpyaasa .
Very sorry...I got the reason for this issue..
Please ignore.


On Wed, Sep 28, 2016 at 10:14 PM, techpyaasa .  wrote:

> @Paulo
>
> We have done changes as you said
> net.ipv4.tcp_keepalive_time=60
> net.ipv4.tcp_keepalive_probes=3
> net.ipv4.tcp_keepalive_intvl=10
>
> and increased streaming_socket_timeout_in_ms to 48 hours ,
> "phi_convict_threshold : 9".
>
> And once again recommissioned the new data center (DC3), ran "nodetool
> rebuild 'DC1'", but this time NO data got streamed and 'nodetool rebuild'
> exited without any exception.
>
> Please check logs below
>
> *INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571
> StorageService.java (line 914) rebuild from dc: IDC*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520
> StreamResultFuture.java (line 87) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521
> StreamResultFuture.java (line 91) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
> /xxx.xxx.198.75*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
> StreamResultFuture.java (line 91) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
> /xxx.xxx.198.132*
> * INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522
> StreamSession.java (line 214) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
> /xxx.xxx.198.75*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
> StreamResultFuture.java (line 91) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
> /xxx.xxx.198.133*
> * INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522
> StreamSession.java (line 214) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
> /xxx.xxx.198.132*
> * INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523
> StreamSession.java (line 214) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
> /xxx.xxx.198.133*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523
> StreamResultFuture.java (line 91) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
> /xxx.xxx.198.167*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
> StreamResultFuture.java (line 91) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
> /xxx.xxx.198.78*
> * INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524
> StreamSession.java (line 214) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
> /xxx.xxx.198.167*
> * INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525
> StreamSession.java (line 214) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
> /xxx.xxx.198.78*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
> StreamResultFuture.java (line 91) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
> /xxx.xxx.198.126*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525
> StreamResultFuture.java (line 91) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
> /xxx.xxx.198.191*
> * INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526
> StreamSession.java (line 214) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
> /xxx.xxx.198.126*
> * INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526
> StreamSession.java (line 214) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
> /xxx.xxx.198.191*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526
> StreamResultFuture.java (line 91) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
> /xxx.xxx.198.168*
> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527
> StreamResultFuture.java (line 91) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
> /xxx.xxx.198.169*
> * INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527
> StreamSession.java (line 214) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
> /xxx.xxx.198.168*
> * INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528
> StreamSession.java (line 214) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
> /xxx.xxx.198.169*
> * INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713
> StreamResultFuture.java (line 186) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is
> complete*
> * INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715
> StreamResultFuture.java (line 186) [Stream
> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is
> complete*
> * INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716
> StreamResultFuture.java (line 186) [Stream
> 

Re: New node block in autobootstrap

2016-09-28 Thread techpyaasa .
@Paulo

We have done changes as you said
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10

and increased streaming_socket_timeout_in_ms to 48 hours and set
"phi_convict_threshold: 9".

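For the record, this is roughly how we applied the kernel settings (as root; adding them to /etc/sysctl.conf keeps them across reboots):

sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_probes=3
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_probes net.ipv4.tcp_keepalive_intvl   # verify
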
And once again recommissioned the new data center (DC3), ran "nodetool
rebuild 'DC1'", but this time NO data got streamed and 'nodetool rebuild'
exited without any exception.

Please check logs below

*INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571
StorageService.java (line 914) rebuild from dc: IDC*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520
StreamResultFuture.java (line 87) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.75*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.132*
* INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.75*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.133*
* INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.132*
* INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.133*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.167*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.78*
* INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.167*
* INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.78*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.126*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.191*
* INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.126*
* INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.191*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.168*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.169*
* INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.168*
* INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.169*
* INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.133 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.169] 2016-09-28 09:18:47,716
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.169 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.167] 2016-09-28 09:18:47,715
StreamResultFuture.java (line 186) 

TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread George Sigletos
Hello,

I keep executing a TRUNCATE command on an empty table and it throws
OperationTimedOut randomly:

cassandra@cqlsh> truncate test.mytable;
OperationTimedOut: errors={}, last_host=cassiebeta-01
cassandra@cqlsh> truncate test.mytable;
OperationTimedOut: errors={}, last_host=cassiebeta-01

Having a 3 node cluster running 2.1.14. No connectivity problems. Has
anybody come across the same error?

Thanks,
George


Re: nodetool rebuild streaming exception

2016-09-28 Thread Alain RODRIGUEZ
Hi techpyaasa,

That was one of my teammates, very sorry for it / the multiple threads.


No big deal :-).

*It looks like streams are failing right away when trying to rebuild?*
> No , after partial streaming of data (around 150 GB - we have around 600
> GB of data on each node) streaming is getting failed with the above
> exception stack trace.


Yes, I got confused. I meant to say that the specific session that fails,
fails fast; it doesn't look like a timeout issue, yet there is a '*Connection
timed out*'.

I am not sure to understand what is happening here.

Could you please share what 'nodetool status keyspace_name1' outputs (if
it's big just use gist or whatever)? If not make sure all the nodes are Up
with:

$ nodetool status | grep -v UN


> *It should be ran from DC3 servers, after altering keyspace to add
> keyspaces to the new datacenter. Is this the way you're doing it?*Yes,
> I'm running it from DC3 using " nodetool rebuild 'DC1' " command  , after
> altering keyspace with RF : DC1:3 , DC2:3 , DC3:3 and we using Network
> Topology Strategy.


The command looks fine and now I know it actually worked for a while. If
you have many keyspaces, some might work and at some point one of them could
fail. Keyspace 'keyspace_name1' looks like a test one. Are you sure of how
it is configured? If not, feel free to paste the keyspace configuration here
as well (no need for the whole schema with table details).

$ echo 'DESCRIBE KEYSPACE keyspace_name1;' | cqlsh 

As I said, 'streaming_socket_timeout_in_ms: 86400000' to 24 hours.
>

Also, have you done this on all the nodes and restarted them?

How long does the rebuild operation run before failing?

I have no real idea on what's happening there, just trying to give you some
clues.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-09-28 15:18 GMT+02:00 techpyaasa . :

> @Alain
> That was one of my teammates, very sorry for it / the multiple threads.
>
> *It looks like streams are failing right away when trying to rebuild.?*
> No , after partial streaming of data (around 150 GB - we have around 600
> GB of data on each node) streaming is getting failed with the above
> exception stack trace.
>
> *It should be ran from DC3 servers, after altering keyspace to add
> keyspaces to the new datacenter. Is this the way you're doing it?*
> Yes, I'm running it from DC3 using " nodetool rebuild 'DC1' " command  ,
> after altering keyspace with RF : DC1:3 , DC2:3 , DC3:3 and we using Network
> Topology Strategy.
>
> Yes , all nodes are running on same c*-2.0.17 version.
>
> As I said, 'streaming_socket_timeout_in_ms: 86400000' to 24 hours.
>
> As suggested by @Paulo and in some blogs, we are going to re-try with the following
> changes *on new nodes in DC3.*
>
>
>
>
> *net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_probes=3
> net.ipv4.tcp_keepalive_intvl=10*
> Hope these settings are enough on the new nodes from which we are going to
> initiate the rebuild/streaming, and NOT required on all the existing nodes from
> which we are getting the data streamed. Am I right?
>
> Have to see whether it works :( and btw, could you please throw some light on
> this if you have faced such an exception in the past.
>
> As I mentioned in my last mail, this is the exception we are getting in
> streaming AFTER STREAMING some data.
>
> *java.io.IOException: Connection timed out*
> *at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
> *at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
> *at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
> *at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
> *at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
> *at
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)*
> *at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)*
> *at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)*
> *at java.lang.Thread.run(Thread.java:745)*
> * INFO [STREAM-OUT-/xxx.xxx.198.191] 2016-09-27 00:28:10,347
> StreamResultFuture.java (line 186) [Stream
> #30852870-8472-11e6-b043-3f260c696828] Session with /xxx.xxx.198.191 is
> complete*
> *ERROR [STREAM-OUT-/xxx.xxx.198.191] 2016-09-27 00:28:10,347
> StreamSession.java (line 461) [Stream
> #30852870-8472-11e6-b043-3f260c696828] Streaming error occurred*
> *java.io.IOException: Broken pipe*
> *at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
> *at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
> *at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
> *at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
> *at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
> *

Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

2016-09-28 Thread Alexander Dejanovski
Robert,

You can restart them in any order, that doesn't make a difference afaik.

Cheers

On Wed, 28 Sep 2016 at 17:10, Robert Sicoie  wrote:

> Thanks Alexander,
>
> Yes, with tpstats I can see the hanging active repair(s) (output
> attached). For one there are 31 pending repairs. On others there are fewer
> pending repairs (min 12). Is there any recommendation for the restart order?
> The one with the fewest pending repairs first, perhaps?
>
> Thanks,
> Robert
>
> Robert Sicoie
>
> On Wed, Sep 28, 2016 at 5:35 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> They will show up in nodetool compactionstats :
>> https://issues.apache.org/jira/browse/CASSANDRA-9098
>>
>> Did you check nodetool tpstats to see if you didn't have any running
>> repair session ?
>> Just to make sure (and if you can actually do it), roll restart the
>> cluster and try again. Repair sessions can get sticky sometimes.
>>
>> On Wed, Sep 28, 2016 at 4:23 PM Robert Sicoie 
>> wrote:
>>
>>> I am using nodetool compactionstats to check for pending compactions and
>>> it shows me 0 pending on all nodes, seconds before running nodetool repair.
>>> I am also monitoring PendingCompactions on jmx.
>>>
>>> Is there another way I can find out if there is any anticompaction running
>>> on any node?
>>>
>>> Thanks a lot,
>>> Robert
>>>
>>> Robert Sicoie
>>>
>>> On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
>>>
 Robert,

 you need to make sure you have no repair session currently running on
 your cluster, and no anticompaction.
 I'd recommend doing a rolling restart in order to stop all running
 repair for sure, then start the process again, node by node, checking that
 no anticompaction is running before moving from one node to the other.

 Please do not use the -pr switch as it is both useless (token ranges
 are repaired only once with inc repair, whatever the replication factor)
 and harmful as all anticompactions won't be executed (you'll still have
 sstables marked as unrepaired even if the process has run entirely with no
 error).

 Let us know how that goes.

 Cheers,

 On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie 
 wrote:

> Thanks Alexander,
>
> Now I started to run the repair with -pr arg and with keyspace and
> table args.
> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
> RepairRunnable.java:246 - Repair session
> 89af4d10-856f-11e6-b28f-df99132d7979 for range
> [(8323429577695061526,8326640819362122791],
> ..., (4212695343340915405,4229348077081465596]]] Validation failed in /
> 10.45.113.88"
>
> for one of the tables. 10.45.113.88 is the ip of the machine I am
> running the nodetool on.
> I'm wondering if this is normal...
>
> Thanks,
> Robert
>
>
>
>
> Robert Sicoie
>
> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi,
>>
>> nodetool scrub won't help here, as what you're experiencing is most
>> likely that one SSTable is going through anticompaction, and then another
>> node is asking for a Merkle tree that involves it.
>> For understandable reasons, an SSTable cannot be anticompacted and
>> validation compacted at the same time.
>>
>> The solution here is to adjust the repair pressure on your cluster so
>> that anticompaction can end before you run repair on another node.
>> You may have a lot of anticompaction to do if you had high volumes of
>> unrepaired data, which can take a long time depending on several factors.
>>
>> You can tune your repair process to make sure no anticompaction is
>> running before launching a new session on another node or you can try my
>> Reaper fork that handles incremental repair :
>> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>> I may have to add a few checks in order to avoid all collisions
>> between anticompactions and new sessions, but it should be helpful if you
>> struggle with incremental repair.
>>
>> In any case, check if your nodes are still anticompacting before
>> trying to run a new repair session on a node.
>>
>> Cheers,
>>
>>
>> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <
>> robert.sic...@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I have a cluster of 5 nodes, cassandra 3.0.5.
>>> I was running nodetool repair last days, one node at a time, when I
>>> first encountered this exception
>>>
>>> *ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409
>>> CassandraDaemon.java:195 - Exception in thread
>>> Thread[ValidationExecutor:11,1,main]*
>>> *java.lang.RuntimeException: Cannot start multiple repair 

[RELEASE] Apache Cassandra 2.2.8 released

2016-09-28 Thread Michael Shuler
* NOTICE *
This is the first release signed with key 0xA278B781FE4B2BDA by Michael
Shuler. Debian users will need to add the key to `apt-key` and the
process has been updated on
https://wiki.apache.org/cassandra/DebianPackaging, and a patch has been created
for the source docs.

Either method will work:

curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -

or

sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key
0xA278B781FE4B2BDA

**

The Cassandra team is pleased to announce the release of Apache
Cassandra version 2.2.8.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/pvdo31
[2]: (NEWS.txt) https://goo.gl/PbDAPY
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

2016-09-28 Thread Robert Sicoie
Thanks Alexander,

Yes, with tpstats I can see the hanging active repair(s) (output attached).
For one there are 31 pending repairs. On others there are fewer pending
repairs (min 12). Is there any recommendation for the restart order? The one
with the fewest pending repairs first, perhaps?

Thanks,
Robert

Robert Sicoie

On Wed, Sep 28, 2016 at 5:35 PM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> They will show up in nodetool compactionstats : https://issues.apache.org/
> jira/browse/CASSANDRA-9098
>
> Did you check nodetool tpstats to see if you didn't have any running
> repair session ?
> Just to make sure (and if you can actually do it), roll restart the
> cluster and try again. Repair sessions can get sticky sometimes.
>
> On Wed, Sep 28, 2016 at 4:23 PM Robert Sicoie 
> wrote:
>
>> I am using nodetool compactionstats to check for pending compactions and
>> it shows me 0 pending on all nodes, seconds before running nodetool repair.
>> I am also monitoring PendingCompactions on jmx.
>>
>> Is there another way I can find out if there is any anticompaction running
>> on any node?
>>
>> Thanks a lot,
>> Robert
>>
>> Robert Sicoie
>>
>> On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> Robert,
>>>
>>> you need to make sure you have no repair session currently running on
>>> your cluster, and no anticompaction.
>>> I'd recommend doing a rolling restart in order to stop all running
>>> repair for sure, then start the process again, node by node, checking that
>>> no anticompaction is running before moving from one node to the other.
>>>
>>> Please do not use the -pr switch as it is both useless (token ranges are
>>> repaired only once with inc repair, whatever the replication factor) and
>>> harmful as all anticompactions won't be executed (you'll still have
>>> sstables marked as unrepaired even if the process has run entirely with no
>>> error).
>>>
>>> Let us know how that goes.
>>>
>>> Cheers,
>>>
>>> On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie 
>>> wrote:
>>>
 Thanks Alexander,

 Now I started to run the repair with -pr arg and with keyspace and
 table args.
 Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
 RepairRunnable.java:246 - Repair session 
 89af4d10-856f-11e6-b28f-df99132d7979
 for range [(8323429577695061526,8326640819362122791],
 ..., (4212695343340915405,4229348077081465596]]] Validation failed in /
 10.45.113.88"

 for one of the tables. 10.45.113.88 is the ip of the machine I am
 running the nodetool on.
 I'm wondering if this is normal...

 Thanks,
 Robert




 Robert Sicoie

 On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <
 a...@thelastpickle.com> wrote:

> Hi,
>
> nodetool scrub won't help here, as what you're experiencing is most
> likely that one SSTable is going through anticompaction, and then another
> node is asking for a Merkle tree that involves it.
> For understandable reasons, an SSTable cannot be anticompacted and
> validation compacted at the same time.
>
> The solution here is to adjust the repair pressure on your cluster so
> that anticompaction can end before you run repair on another node.
> You may have a lot of anticompaction to do if you had high volumes of
> unrepaired data, which can take a long time depending on several factors.
>
> You can tune your repair process to make sure no anticompaction is
> running before launching a new session on another node or you can try my
> Reaper fork that handles incremental repair : https://github.com/
> adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
> I may have to add a few checks in order to avoid all collisions
> between anticompactions and new sessions, but it should be helpful if you
> struggle with incremental repair.
>
> In any case, check if your nodes are still anticompacting before
> trying to run a new repair session on a node.
>
> Cheers,
>
>
> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <
> robert.sic...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I have a cluster of 5 nodes, cassandra 3.0.5.
>> I was running nodetool repair last days, one node at a time, when I
>> first encountered this exception
>>
>> *ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409
>> CassandraDaemon.java:195 - Exception in thread
>> Thread[ValidationExecutor:11,1,main]*
>> *java.lang.RuntimeException: Cannot start multiple repair sessions
>> over the same sstables*
>> * at
>> org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194)
>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>> * at
>> 

Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

2016-09-28 Thread Alexander Dejanovski
They will show up in nodetool compactionstats :
https://issues.apache.org/jira/browse/CASSANDRA-9098

Did you check nodetool tpstats to see if you didn't have any running repair
session ?
Just to make sure (and if you can actually do it), roll restart the cluster
and try again. Repair sessions can get sticky sometimes.
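
If it helps, what I usually check on each node before moving on is something
like this (plain nodetool, nothing exotic):

  # anticompaction shows up here as a regular compaction task (CASSANDRA-9098)
  nodetool compactionstats
  # repair-related thread pools should show 0 active / 0 pending
  nodetool tpstats | grep -E 'AntiEntropyStage|ValidationExecutor'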

On Wed, Sep 28, 2016 at 4:23 PM Robert Sicoie 
wrote:

> I am using nodetool compactionstats to check for pending compactions and
> it shows me 0 pending on all nodes, seconds before running nodetool repair.
> I am also monitoring PendingCompactions on jmx.
>
> Is there another way I can find out if there is any anticompaction running
> on any node?
>
> Thanks a lot,
> Robert
>
> Robert Sicoie
>
> On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Robert,
>>
>> you need to make sure you have no repair session currently running on
>> your cluster, and no anticompaction.
>> I'd recommend doing a rolling restart in order to stop all running repair
>> for sure, then start the process again, node by node, checking that no
>> anticompaction is running before moving from one node to the other.
>>
>> Please do not use the -pr switch as it is both useless (token ranges are
>> repaired only once with inc repair, whatever the replication factor) and
>> harmful as all anticompactions won't be executed (you'll still have
>> sstables marked as unrepaired even if the process has run entirely with no
>> error).
>>
>> Let us know how that goes.
>>
>> Cheers,
>>
>> On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie 
>> wrote:
>>
>>> Thanks Alexander,
>>>
>>> Now I started to run the repair with -pr arg and with keyspace and table
>>> args.
>>> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
>>> RepairRunnable.java:246 - Repair session
>>> 89af4d10-856f-11e6-b28f-df99132d7979 for range
>>> [(8323429577695061526,8326640819362122791],
>>> ..., (4212695343340915405,4229348077081465596]]] Validation failed in /
>>> 10.45.113.88"
>>>
>>> for one of the tables. 10.45.113.88 is the ip of the machine I am
>>> running the nodetool on.
>>> I'm wondering if this is normal...
>>>
>>> Thanks,
>>> Robert
>>>
>>>
>>>
>>>
>>> Robert Sicoie
>>>
>>> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
>>>
 Hi,

 nodetool scrub won't help here, as what you're experiencing is most
 likely that one SSTable is going through anticompaction, and then another
 node is asking for a Merkle tree that involves it.
 For understandable reasons, an SSTable cannot be anticompacted and
 validation compacted at the same time.

 The solution here is to adjust the repair pressure on your cluster so
 that anticompaction can end before you run repair on another node.
 You may have a lot of anticompaction to do if you had high volumes of
 unrepaired data, which can take a long time depending on several factors.

 You can tune your repair process to make sure no anticompaction is
 running before launching a new session on another node or you can try my
 Reaper fork that handles incremental repair :
 https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
 I may have to add a few checks in order to avoid all collisions between
 anticompactions and new sessions, but it should be helpful if you struggle
 with incremental repair.

 In any case, check if your nodes are still anticompacting before trying
 to run a new repair session on a node.

 Cheers,


 On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie 
 wrote:

> Hi guys,
>
> I have a cluster of 5 nodes, cassandra 3.0.5.
> I was running nodetool repair last days, one node at a time, when I
> first encountered this exception
>
> *ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409
> CassandraDaemon.java:195 - Exception in thread
> Thread[ValidationExecutor:11,1,main]*
> *java.lang.RuntimeException: Cannot start multiple repair sessions
> over the same sstables*
> * at
> org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_60]*
> * at
> 

Re: nodetool rebuild streaming exception

2016-09-28 Thread Alain RODRIGUEZ
Just saw a very similar question from Laxmikanth (laxmikanth...@gmail.com)
on another thread, with the same logs.

Would you mind avoiding splitting this across multiple threads, so we can
gather the information in one place and better help you from this mailing list?

C*heers,

2016-09-28 14:28 GMT+02:00 Alain RODRIGUEZ :

> Hi,
>
> It looks like streams are failing right away when trying to rebuild.
>
>
>- Could you please share with us the command you used?
>
>
> It should be run from the DC3 servers, after altering the keyspace to add
> replication for the new datacenter. Is this the way you're doing it?
>
>    - Are all the nodes using the same version ('nodetool version')?
>    - What does 'nodetool status keyspace_name1' output?
>    - Are you sure you are using NetworkTopologyStrategy on 'keyspace_name1'?
>      Have you modified this schema to add replication on DC3?
>
> My guess is something could be wrong with the configuration.
>
> I checked with our network operations team , they have confirmed network
>> is stable and no network hiccups.
>> I have set 'streaming_socket_timeout_in_ms: 8640' (24 hours) as
>> suggested in datastax blog  - https://support.datastax.com
>> /hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-
>> of-streaming-errors-or-failures and ran 'nodetool rebuild' one node at a
>> time but was of NO USE . Still we are getting above exception.
>>
>
> This looks correct to me, good that you added this information, thanks.
>
> Another thought: I believe you need all the nodes in the origin DC (the one
> you use for your 'nodetool rebuild ' command) to be up for those streams to
> work.
>
> This looks a bit weird, good luck.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> 2016-09-27 18:54 GMT+02:00 techpyaasa . :
>
>> Hi,
>>
>> I'm trying to add new data center - DC3 to existing c*-2.0.17 cluster
>> with 2 data centers DC1, DC2 with replication DC1:3 , DC2:3 , DC3:3.
>>
>>  I'm getting the following exception repeatedly on new nodes after I run
>> 'nodetool rebuild'.
>>
>>
>> *DEBUG [ScheduledTasks:1] 2016-09-27 04:24:00,416 GCInspector.java (line
>> 118) GC for ParNew: 20 ms for 1 collections, 9837479688 used; max is
>> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:03,417
>> GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections,
>> 9871193904 used; max is 16760438784DEBUG [ScheduledTasks:1] 2016-09-27
>> 04:24:06,418 GCInspector.java (line 118) GC for ParNew: 20 ms for 1
>> collections, 9950298136 used; max is 16760438784DEBUG [ScheduledTasks:1]
>> 2016-09-27 04:24:09,419 GCInspector.java (line 118) GC for ParNew: 19 ms
>> for 1 collections, 9941119568 used; max is 16760438784DEBUG
>> [ScheduledTasks:1] 2016-09-27 04:24:12,421 GCInspector.java (line 118) GC
>> for ParNew: 20 ms for 1 collections, 9864185024 used; max is
>> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:15,422
>> GCInspector.java (line 118) GC for ParNew: 60 ms for 2 collections,
>> 9730374352 used; max is 16760438784DEBUG [ScheduledTasks:1] 2016-09-27
>> 04:24:18,423 GCInspector.java (line 118) GC for ParNew: 18 ms for 1
>> collections, 9775448168 used; max is 16760438784DEBUG [ScheduledTasks:1]
>> 2016-09-27 04:24:21,424 GCInspector.java (line 118) GC for ParNew: 22 ms
>> for 1 collections, 9850794272 used; max is 16760438784DEBUG
>> [ScheduledTasks:1] 2016-09-27 04:24:24,425 GCInspector.java (line 118) GC
>> for ParNew: 20 ms for 1 collections, 9729992448 <9729992448> used; max is
>> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:27,426
>> GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections,
>> 9699783920 used; max is 16760438784DEBUG [ScheduledTasks:1] 2016-09-27
>> 04:24:30,427 GCInspector.java (line 118) GC for ParNew: 21 ms for 1
>> collections, 9696523920 used; max is 16760438784DEBUG [ScheduledTasks:1]
>> 2016-09-27 04:24:33,429 GCInspector.java (line 118) GC for ParNew: 20 ms
>> for 1 collections, 9560497904 used; max is 16760438784DEBUG
>> [ScheduledTasks:1] 2016-09-27 04:24:36,430 GCInspector.java (line 118) GC
>> for ParNew: 19 ms for 1 collections, 9568718352 <9568718352> used; max is
>> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:39,431
>> GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections,
>> 9496991384 <9496991384> used; max is 16760438784DEBUG [ScheduledTasks:1]
>> 2016-09-27 04:24:42,432 GCInspector.java (line 118) GC for ParNew: 19 ms
>> for 1 collections, 9486433840 used; max is 16760438784DEBUG
>> [ScheduledTasks:1] 2016-09-27 04:24:45,434 GCInspector.java (line 118) GC
>> for ParNew: 19 ms for 1 collections, 9442642688 used; max is
>> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:48,435
>> GCInspector.java (line 118) GC 

Re: nodetool rebuild streaming exception

2016-09-28 Thread Alain RODRIGUEZ
Hi,

It looks like streams are failing right away when trying to rebuild.


   - Could you please share with us the command you used?


It should be run from the DC3 servers, after altering the keyspace to add
replication for the new datacenter (see the example a bit further down). Is
this the way you're doing it?

   - Are all the nodes using the same version ('nodetool version')?
   - What does 'nodetool status keyspace_name1' output?
   - Are you sure you are using NetworkTopologyStrategy on 'keyspace_name1'?
     Have you modified this schema to add replication on DC3?

My guess is something could be wrong with the configuration.
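
To be explicit (reusing your keyspace and datacenter names, so adjust as
needed), the sequence would look something like:

  -- in cqlsh, before running any rebuild
  ALTER KEYSPACE keyspace_name1 WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC3': 3};

  # then on each DC3 node, one at a time
  nodetool rebuild 'DC1'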

> I checked with our network operations team, they have confirmed the network
> is stable and there are no network hiccups.
> I have set 'streaming_socket_timeout_in_ms: 8640' (24 hours) as
> suggested in datastax blog  - https://support.datastax.com/
> hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-
> of-streaming-errors-or-failures and ran 'nodetool rebuild' one node at a
> time but was of NO USE . Still we are getting above exception.
>

This looks correct to me, good that you added this information, thanks.

Another thought: I believe you need all the nodes in the origin DC (the one
you use for your 'nodetool rebuild ' command) to be up for those streams to
work.

This looks a bit weird, good luck.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2016-09-27 18:54 GMT+02:00 techpyaasa . :

> Hi,
>
> I'm trying to add new data center - DC3 to existing c*-2.0.17 cluster with
> 2 data centers DC1, DC2 with replication DC1:3 , DC2:3 , DC3:3.
>
>  I'm getting the following exception repeatedly on new nodes after I run
> 'nodetool rebuild'.
>
>
> *DEBUG [ScheduledTasks:1] 2016-09-27 04:24:00,416 GCInspector.java (line
> 118) GC for ParNew: 20 ms for 1 collections, 9837479688 used; max is
> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:03,417
> GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections,
> 9871193904 used; max is 16760438784DEBUG [ScheduledTasks:1] 2016-09-27
> 04:24:06,418 GCInspector.java (line 118) GC for ParNew: 20 ms for 1
> collections, 9950298136 used; max is 16760438784DEBUG [ScheduledTasks:1]
> 2016-09-27 04:24:09,419 GCInspector.java (line 118) GC for ParNew: 19 ms
> for 1 collections, 9941119568 used; max is 16760438784DEBUG
> [ScheduledTasks:1] 2016-09-27 04:24:12,421 GCInspector.java (line 118) GC
> for ParNew: 20 ms for 1 collections, 9864185024 used; max is
> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:15,422
> GCInspector.java (line 118) GC for ParNew: 60 ms for 2 collections,
> 9730374352 used; max is 16760438784DEBUG [ScheduledTasks:1] 2016-09-27
> 04:24:18,423 GCInspector.java (line 118) GC for ParNew: 18 ms for 1
> collections, 9775448168 used; max is 16760438784DEBUG [ScheduledTasks:1]
> 2016-09-27 04:24:21,424 GCInspector.java (line 118) GC for ParNew: 22 ms
> for 1 collections, 9850794272 used; max is 16760438784DEBUG
> [ScheduledTasks:1] 2016-09-27 04:24:24,425 GCInspector.java (line 118) GC
> for ParNew: 20 ms for 1 collections, 9729992448 <9729992448> used; max is
> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:27,426
> GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections,
> 9699783920 used; max is 16760438784DEBUG [ScheduledTasks:1] 2016-09-27
> 04:24:30,427 GCInspector.java (line 118) GC for ParNew: 21 ms for 1
> collections, 9696523920 used; max is 16760438784DEBUG [ScheduledTasks:1]
> 2016-09-27 04:24:33,429 GCInspector.java (line 118) GC for ParNew: 20 ms
> for 1 collections, 9560497904 used; max is 16760438784DEBUG
> [ScheduledTasks:1] 2016-09-27 04:24:36,430 GCInspector.java (line 118) GC
> for ParNew: 19 ms for 1 collections, 9568718352 <9568718352> used; max is
> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:39,431
> GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections,
> 9496991384 <9496991384> used; max is 16760438784DEBUG [ScheduledTasks:1]
> 2016-09-27 04:24:42,432 GCInspector.java (line 118) GC for ParNew: 19 ms
> for 1 collections, 9486433840 used; max is 16760438784DEBUG
> [ScheduledTasks:1] 2016-09-27 04:24:45,434 GCInspector.java (line 118) GC
> for ParNew: 19 ms for 1 collections, 9442642688 used; max is
> 16760438784DEBUG [ScheduledTasks:1] 2016-09-27 04:24:48,435
> GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections,
> 9548532008 <9548532008> used; max is 16760438784DEBUG
> [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,756 ConnectionHandler.java
> (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File
> (Header (cfId: bf446a90-71c5-3552-a2e5-b1b94dbf86e3, #0, version: jb,
> estimated keys: 252928, transfer size: 5496759656, compressed?: true),
> file:
> 

Re: Repairs at scale in Cassandra 2.1.13

2016-09-28 Thread Paulo Motta
There were a few streaming bugs fixed between 2.1.13 and 2.1.15 (see
CHANGES.txt for more details), so I'd recommend upgrading to 2.1.15 in order
to avoid hitting those.

2016-09-28 9:08 GMT-03:00 Alain RODRIGUEZ :

> Hi Anubhav,
>
>
>> I’m considering doing subrange repairs (https://github.com/BrianGalle
>> w/cassandra_range_repair/blob/master/src/range_repair.py)
>>
>
> I used this script a lot, and quite successfully.
>
> Another working option that people are using is:
>
> https://github.com/spotify/cassandra-reaper
>
> Alexander, a coworker integrated an existing UI and made it compatible
> with incremental repairs:
>
> Incremental repairs on Reaper: https://github.com/
> adejanovski/cassandra-reaper/tree/inc-repair-that-works
> UI integration with incremental repairs on Reaper: https://github.com/
> adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>
> as I’ve heard from folks that incremental repairs simply don’t work even
>> in 3.x (Yeah, that’s a strong statement but I heard that from multiple
>> folks at the Summit).
>>
>
> Alexander also did a talk about repairs at the Summit (including
> incremental repairs) and someone from Netflix also did a good one as well,
> not mentioning incremental repairs but with some benchmarks and tips to run
> repairs. You might want to check one of those (or both):
>
> https://www.youtube.com/playlist?list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk
>
> I believe they haven't been released by Datastax yet, they probably will
> sometime soon.
>
> Repair is something all companies with large setups are struggling with, I
> mean, Spotify made the Reaper and Netflix gave a talk about repairs
> presenting the range_repair.py script and much more stuff. But I know there
> is some work going on to improve things.
>
> Meanwhile, given the load per node (600 GB is big but not that huge) and
> the number of nodes (400 is quite a high number), I would say that the
> hardest part for you will be handling the scheduling, to avoid harming the
> cluster and to make sure all the nodes get repaired. I believe Reaper might
> be a better match in your case as it does that quite well from what I
> heard, though I am not really sure.
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-09-26 23:51 GMT+02:00 Anubhav Kale :
>
>> Hello,
>>
>>
>>
>> We run Cassandra 2.1.13 (don’t have plans to upgrade yet). What is the
>> best way to run repairs at scale (400 nodes, each holding ~600GB) that
>> actually works ?
>>
>>
>>
>> I’m considering doing subrange repairs (https://github.com/BrianGalle
>> w/cassandra_range_repair/blob/master/src/range_repair.py) as I’ve heard
>> from folks that incremental repairs simply don’t work even in 3.x (Yeah,
>> that’s a strong statement but I heard that from multiple folks at the
>> Summit).
>>
>>
>>
>> Any guidance would be greatly appreciated !
>>
>>
>>
>> Thanks,
>>
>> Anubhav
>>
>
>


Re: Repairs at scale in Cassandra 2.1.13

2016-09-28 Thread Alain RODRIGUEZ
Hi Anubhav,


> I’m considering doing subrange repairs (https://github.com/
> BrianGallew/cassandra_range_repair/blob/master/src/range_repair.py)
>

I used this script a lot, and quite successfully.
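
Under the hood it essentially splits each node's token ranges into small
chunks and repairs them one by one, which boils down to something like this
(2.1 nodetool flags; keyspace/table names here are just placeholders):

  nodetool repair -st <start_token> -et <end_token> my_keyspace my_table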

Another working option that people are using is:

https://github.com/spotify/cassandra-reaper

Alexander, a coworker integrated an existing UI and made it compatible with
incremental repairs:

Incremental repairs on Reaper:
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-that-works
UI integration with incremental repairs on Reaper:
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui

as I’ve heard from folks that incremental repairs simply don’t work even in
> 3.x (Yeah, that’s a strong statement but I heard that from multiple folks
> at the Summit).
>

Alexander also did a talk about repairs at the Summit (including
incremental repairs) and someone from Netflix also did a good one as well,
not mentioning incremental repairs but with some benchmarks and tips to run
repairs. You might want to check one of those (or both):

https://www.youtube.com/playlist?list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk

I believe they haven't been released by Datastax yet; they probably will be
sometime soon.

Repair is something all companies with large setups are struggling with, I
mean, Spotify made the Reaper and Netflix gave a talk about repairs
presenting the range_repair.py script and much more stuff. But I know there
is some work going on to improve things.

Meanwhile, given the load per node (600 GB is big but not that huge) and the
number of nodes (400 is quite a high number), I would say that the hardest
part for you will be handling the scheduling, to avoid harming the cluster
and to make sure all the nodes get repaired. I believe Reaper might be a
better match in your case as it does that quite well from what I heard,
though I am not really sure.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-26 23:51 GMT+02:00 Anubhav Kale :

> Hello,
>
>
>
> We run Cassandra 2.1.13 (don’t have plans to upgrade yet). What is the
> best way to run repairs at scale (400 nodes, each holding ~600GB) that
> actually works ?
>
>
>
> I’m considering doing subrange repairs (https://github.com/
> BrianGallew/cassandra_range_repair/blob/master/src/range_repair.py) as
> I’ve heard from folks that incremental repairs simply don’t work even in
> 3.x (Yeah, that’s a strong statement but I heard that from multiple folks
> at the Summit).
>
>
>
> Any guidance would be greatly appreciated !
>
>
>
> Thanks,
>
> Anubhav
>


Re: How long/how many days 'nodetool gossipinfo' will have decommissioned nodes info

2016-09-28 Thread Alain RODRIGUEZ
>
> I've read from some  that the gossip info will stay
> around for 72h before being removed.
>

I've read this one too :-). It is 3 days indeed.

This might be of some interest:
https://issues.apache.org/jira/browse/CASSANDRA-10371 (Fix Version/s:
2.1.14, 2.2.6, 3.0.4, 3.4)
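
If you just want to spot the lingering entries quickly, something like this
should do on any node (adjust the -B value to your gossipinfo output):

  nodetool gossipinfo | grep -B 3 LEFT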

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-26 21:17 GMT+02:00 laxmikanth sadula :

> Thank you @Jaoquin and @DuyHai
>
> On Mon, Sep 26, 2016 at 10:00 PM, DuyHai Doan 
> wrote:
>
>> I've read from some  that the gossip info will stay
>> around for 72h before being removed.
>>
>> On Mon, Sep 26, 2016 at 6:19 PM, Joaquin Casares <
>> joaq...@thelastpickle.com> wrote:
>>
>>> Hello Techpyassa,
>>>
>>> Sometimes old gossip information tends to echo around for quite a bit
>>> longer than intended. I'm unsure how long the LEFT messages are supposed to
>>> be echoed for, but if you want to force the removal of a removed node from
>>> gossip, you can use the Assassinate Endpoint JMX command. On larger
>>> clusters, running this command synchronously across all machines may be
>>> required. Instructions on Assassinate Endpoint can be found here:
>>>
>>> https://gist.github.com/justenwalker/8338334
>>>
>>> If you're planning on recommissioning the same node, upon bootstrapping
>>> the gossiped message should change to a JOINING message overwriting the
>>> LEFT message.
>>>
>>> I've personally never checked `nodetool gossipinfo` before
>>> recommissioning a node and typically only ensure the node does not appear
>>> in `nodetool status`.
>>>
>>> Hope that helps,
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Sun, Sep 25, 2016 at 2:17 PM, Laxmikanth S 
>>> wrote:
>>>
 Hi,

 Recently we have decommissioned nodes from the Cassandra cluster, but even
 after nearly 48 hours 'nodetool gossipinfo' still shows the removed nodes
 (as LEFT).

 I just wanted to recommission the same node again, so I just wanted to know:
 will it create a problem if I recommission the same node (same IP) again
 while its state is maintained as LEFT in 'nodetool gossipinfo'?


 Thanks,
 Techpyaasa

>>>
>>>
>>
>
>
> --
> Regards,
> Laxmikanth
> 99621 38051
>
>


Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

2016-09-28 Thread Alexander Dejanovski
Hi,

nodetool scrub won't help here, as what you're experiencing is most likely
that one SSTable is going through anticompaction, and then another node is
asking for a Merkle tree that involves it.
For understandable reasons, an SSTable cannot be anticompacted and
validation compacted at the same time.

The solution here is to adjust the repair pressure on your cluster so that
anticompaction can end before you run repair on another node.
You may have a lot of anticompaction to do if you had high volumes of
unrepaired data, which can take a long time depending on several factors.

You can tune your repair process to make sure no anticompaction is running
before launching a new session on another node or you can try my Reaper
fork that handles incremental repair :
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
I may have to add a few checks in order to avoid all collisions between
anticompactions and new sessions, but it should be helpful if you struggle
with incremental repair.

In any case, check if your nodes are still anticompacting before trying to
run a new repair session on a node.
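
A minimal way to do that from a single box, assuming remote JMX is reachable
(otherwise just run the nodetool part locally on each node, and the host list
below is obviously a placeholder):

  for host in node1 node2 node3 node4 node5; do
    echo "== $host =="
    # anticompaction / validation show up as regular compaction tasks
    nodetool -h "$host" compactionstats | grep -iE 'anticompaction|validation' \
      || echo "nothing running"
  done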

Cheers,


On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie 
wrote:

> Hi guys,
>
> I have a cluster of 5 nodes, cassandra 3.0.5.
> I was running nodetool repair last days, one node at a time, when I first
> encountered this exception
>
> *ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409
> CassandraDaemon.java:195 - Exception in thread
> Thread[ValidationExecutor:11,1,main]*
> *java.lang.RuntimeException: Cannot start multiple repair sessions over
> the same sstables*
> * at
> org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_60]*
> * at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ~[na:1.8.0_60]*
> * at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_60]*
> * at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]*
>
> On some of the other boxes I see this:
>
>
> *Caused by: org.apache.cassandra.exceptions.RepairException: [repair
> #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv,
> [(-7505573573695693981,-7495786486761919991],*
> **
> * (-8483612809930827919,-8480482504800860871]]] Validation failed in
> /10.45.113.67 *
> * at
> org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_60]*
> * at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_60]*
> * at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_60]*
> * at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_60]*
> * at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]*
> *ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195
> - Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]*
> *java.lang.AssertionError: java.lang.InterruptedException*
> * at
> org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56)
> ~[apache-cassandra-3.0.5.jar:3.0.5]*
> * at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ~[na:1.8.0_60]*
> * at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ~[na:1.8.0_60]*
> * at 

How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

2016-09-28 Thread Robert Sicoie
Hi guys,

I have a cluster of 5 nodes, cassandra 3.0.5.
I was running nodetool repair last days, one node at a time, when I first
encountered this exception

*ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409
CassandraDaemon.java:195 - Exception in thread
Thread[ValidationExecutor:11,1,main]*
*java.lang.RuntimeException: Cannot start multiple repair sessions over the
same sstables*
* at
org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[na:1.8.0_60]*
* at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_60]*
* at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_60]*
* at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]*

On some of the other boxes I see this:


*Caused by: org.apache.cassandra.exceptions.RepairException: [repair
#9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv,
[(-7505573573695693981,-7495786486761919991],*
**
* (-8483612809930827919,-8480482504800860871]]] Validation failed in
/10.45.113.67 *
* at
org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_60]*
* at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[na:1.8.0_60]*
* at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_60]*
* at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_60]*
* at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]*
*ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195 -
Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]*
*java.lang.AssertionError: java.lang.InterruptedException*
* at
org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_60]*
* at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
~[na:1.8.0_60]*
* at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]*
*Caused by: java.lang.InterruptedException: null*
* at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
~[na:1.8.0_60]*
* at
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
~[na:1.8.0_60]*
* at
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
~[na:1.8.0_60]*
* at
org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168)
~[apache-cassandra-3.0.5.jar:3.0.5]*
* ... 6 common frames omitted*


Now if I run nodetool repair I get the

*java.lang.RuntimeException: Cannot start multiple repair sessions over the
same sstables*

exception.
What do you suggest? Would nodetool scrub or sstablescrub help in this
case, or would it just make it worse?

Thanks,

Robert