unsubscribe

2017-07-17 Thread Amit Singh F


Thanks & Regards
Amit Singh



Migrating to LCS : Disk Size recommendation clashes

2017-04-13 Thread Amit Singh F
Hi All,

We are in the process of migrating from STCS to LCS and were doing some reading online. Below is an excerpt from the DataStax recommendation on data size:

Doc link : 
https://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningHardware.html

[inline screenshots of the documentation excerpt omitted]

There is also another recommendation which hints that disk size can be limited to 10 TB per node (worst case). Excerpt below:

Doc link : 
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

[inline screenshots of the blog excerpt omitted]

So are there any restrictions or scenarios due to which 600 GB per node is the preferred size with LCS?
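
For reference, switching a table's compaction strategy is a plain schema change; a minimal sketch (the keyspace/table names and the 160 MB sstable_size_in_mb value are placeholders, not from this thread):

    cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};"

Existing SSTables are then reorganised into levels by background compaction, which is exactly where the per-node data size recommendations start to matter.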

Thanks & Regards
Amit Singh



RE: Incremental Repair Migration

2017-01-09 Thread Amit Singh F
Hi Jonathan,

Really appreciate your response.

It will not be possible for us to move to Reaper as of now; we are in the process of migrating to incremental repair.

Also, running repair constantly would be a costly affair in our case. Migrating to incremental repair with a large dataset will take hours per node if we go ahead with the procedure shared by DataStax.

So is there any quick method to reduce that?
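
For context, a minimal sketch of the 2.1-era migration procedure, run node by node (keyspace name and the sstable list file are placeholders; check the Apache/DataStax docs for the authoritative steps and exact tool flags):

    nodetool disableautocompaction my_keyspace
    nodetool repair my_keyspace                  # one last full repair of this node's data
    # stop the Cassandra process on the node
    sstablerepairedset --really-set --is-repaired -f sstables.txt   # mark the just-repaired sstables
    # start the node again
    nodetool enableautocompaction my_keyspace

The main time sink is the full repair itself; marking sstables as repaired only avoids repeating anticompaction, so the hours-per-node figure is hard to compress much beyond possibly repairing nodes that do not share replicas in parallel.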

Regards
Amit Singh

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Tuesday, January 10, 2017 11:50 AM
To: user@cassandra.apache.org
Subject: Re: Incremental Repair Migration

Your best bet is to just run repair constantly. We maintain an updated fork of 
Spotify's reaper tool to help manage it: 
https://github.com/thelastpickle/cassandra-reaper
On Mon, Jan 9, 2017 at 10:04 PM Amit Singh F <amit.f.si...@ericsson.com> wrote:
Hi All,

We are thinking of migrating from primary range repair (-pr) to incremental 
repair.

Environment :


• Cassandra 2.1.16
• 25-node cluster
• RF 3
• Data size up to 450 GB per node

We found that running a full repair takes around 8 hrs per node, which means 200-odd hrs for migrating the entire cluster to incremental repair. Even though there is zero downtime, it is quite unreasonable to ask for a 200 hr maintenance window for migrating repairs.

Just want to know how Cassandra users in the community optimize the procedure to reduce migration time?

Thanks & Regards
Amit Singh


Incremental Repair Migration

2017-01-09 Thread Amit Singh F
Hi All,

We are thinking of migrating from primary range repair (-pr) to incremental 
repair.

Environment :


* Cassandra 2.1.16
* 25-node cluster
* RF 3
* Data size up to 450 GB per node

We found that running a full repair takes around 8 hrs per node, which means 200-odd hrs for migrating the entire cluster to incremental repair. Even though there is zero downtime, it is quite unreasonable to ask for a 200 hr maintenance window for migrating repairs.

Just want to know how Cassandra users in the community optimize the procedure to reduce migration time?

Thanks & Regards
Amit Singh


RE: Incremental repair for the first time

2017-01-08 Thread Amit Singh F
Hi,

Generally, upgradesstables is only recommended when you move between major versions, e.g. from 2.0 to 2.1 or from 2.1 to 2.2. Since you are doing a minor version upgrade, there is no need to run the upgradesstables utility.

This link from DataStax might be helpful:

https://support.datastax.com/hc/en-us/articles/208040036-Nodetool-upgradesstables-FAQ
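
If you want to double-check on disk, the SSTable format version is encoded in the file names. A quick way to list the versions present (assuming the post-2.2 naming scheme where the first token of the Data.db file name is the format version, and the default data path):

    find /var/lib/cassandra/data -name '*-Data.db' -printf '%f\n' | cut -d- -f1 | sort | uniq -c

This lists the format version prefixes present, which is a quick sanity check that nothing unexpected is lying around.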

From: Kathiresan S [mailto:kathiresanselva...@gmail.com]
Sent: Wednesday, January 04, 2017 12:22 AM
To: user@cassandra.apache.org
Subject: Re: Incremental repair for the first time

Thank you!

We are planning to upgrade to 3.0.10 for this issue.

From the NEWS.txt file (https://github.com/apache/cassandra/blob/trunk/NEWS.txt), it looks like there is no need for sstableupgrade when we upgrade from 3.0.4 to 3.0.10 (i.e. just installing Cassandra 3.0.10 would suffice, and it will work with the sstables created by 3.0.4?).

Could you please confirm (if I'm reading the upgrade instructions correctly)?

Thanks,
Kathir

On Tue, Dec 20, 2016 at 5:28 PM, kurt Greaves 
> wrote:
No workarounds, your best/only option is to upgrade (plus you get the benefit 
of loads of other bug fixes).

On 16 December 2016 at 21:58, Kathiresan S 
> wrote:
Thank you!

Is any work around available for this version?

Thanks,
Kathir


On Friday, December 16, 2016, Jake Luciani 
> wrote:
This was fixed post 3.0.4 please upgrade to latest 3.0 release

On Fri, Dec 16, 2016 at 4:49 PM, Kathiresan S 
> wrote:
Hi,

We have a brand new Cassandra cluster (version 3.0.4) and we set up nodetool 
repair scheduled for every day (without any options for repair). As per 
documentation, incremental repair is the default in this case.
Should we do a full repair for the very first time on each node once and then 
leave it to do incremental repair afterwards?

Problem we are facing:

On a random node, the repair process throws a validation failed error, pointing to some other node.

For example, Node A, where the repair is run (without any option), throws the below error:

Validation failed in /Node B

In Node B, when we check the logs, the below exception is seen at exactly the same time...

java.lang.RuntimeException: Cannot start multiple repair sessions over the same 
sstables
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1087)
 ~[apache-cassandra-3.0.4.jar:3.0.4]
at 
org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
 ~[apache-cassandra-3.0.4.jar:3.0.4]
at 
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:700)
 ~[apache-cassandra-3.0.4.jar:3.0.4]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_73]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_73]

Can you please help on how this can be fixed?

Thanks,
Kathir



--
http://twitter.com/tjake




RE: Trying to find cause of exception

2017-01-03 Thread Amit Singh F
5:03,132 
AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-2,5,main]: {}
java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2461)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_111]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
 [apache-cassandra-3.3.0.jar:3.3.0]
at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
[apache-cassandra-3.3.0.jar:3.3.0]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Caused by: java.lang.NullPointerException: null


RICHARD NEY
TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
+1 (978) 848.6640 WORK
+1 (916) 846.2353 MOBILE
UNITED STATES
richard@aspect.com
aspect.com



From: Amit Singh F <amit.f.si...@ericsson.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 2, 2017 at 4:34 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Trying to find cause of exception

Hello,

Few pointers :

a.) Can you check system.log for messages like "marking as down" on the node which gives the error? If yes, then please check for GC pauses. Heavy load is one of the reasons for this.
b.) Can you try connecting with cqlsh to that node once you get this kind of message? Are you able to connect?


Regards
Amit

From: Ney, Richard [mailto:richard@aspect.com]
Sent: Monday, January 02, 2017 3:30 PM
To: user@cassandra.apache.org
Subject: Trying to find cause of exception

My development team has been trying to track down the cause of the read timeout (30 seconds or more at times) exception below. We're running a 2 data center deployment with 3 nodes in each data center. Our tables are set up with replication factor = 2, and we have 16G dedicated to the heap with G1GC for garbage collection. Our systems are AWS m4.2xlarge with 8 CPUs and 32GB of RAM, and we have 2 general purpose EBS volumes of 500GB each on every node. Once we start getting these timeouts the cluster doesn't recover, and we are required to shut all Cassandra nodes down and restart. If anyone has any tips on where to look or what commands to run to help us diagnose this issue, we'd be eternally grateful.

2017-01-02 04:33:35.161 [ERROR] 
[report-compute.ffbec924-ce44-11e6-9e21-0adb9d2dd624] [reportCompute] 
[ahlworkerslave2.bos.manhattan.aspect-cloud.net:31312]
 [WorktypeMetrics] Persistence failure when replaying events for persistenceId 
[/fsms/pens/worktypes/bmwbpy.314]. Last known sequence number [0]
java.util.concurrent.ExecutionException: 
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout 
during read query at consistency ONE (1 responses were required but only 0 
replica responded)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
akka.persistence.cassandra.package$$anon$1$$anonfun$run$1.apply(package.scala:17)
at scala.util.Try$.apply(Try.scala:192)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra 
timeout during read query at consistency ONE (1 responses were required but 
only 0 replica responded)
at 
com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:115)
at com.datastax.driver.core.Responses$Error.asException(Responses.java:124)
at 
com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:477)
at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra 
timeout during read query at consistency ONE (1 responses were required but 
only 0 replica responded)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:62)
at com.datastax.dr

RE: Trying to find cause of exception

2017-01-02 Thread Amit Singh F
Hello,

Few pointers :


a.) Can you check system.log for messages like "marking as down" on the node which gives the error? If yes, then please check for GC pauses. Heavy load is one of the reasons for this.

b.) Can you try connecting with cqlsh to that node once you get this kind of message? Are you able to connect?
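
A minimal sketch of those two checks, assuming the default log location (paths differ per install, and <node_ip> is a placeholder for the affected node's address):

    grep -i "GCInspector" /var/log/cassandra/system.log | tail -n 20      # long GC pauses are logged here
    grep -i "marking as down" /var/log/cassandra/system.log | tail -n 20
    cqlsh <node_ip>                                                       # does the node still accept client connections?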


Regards
Amit

From: Ney, Richard [mailto:richard@aspect.com]
Sent: Monday, January 02, 2017 3:30 PM
To: user@cassandra.apache.org
Subject: Trying to find cause of exception

My development team has been trying to track down the cause of the read timeout (30 seconds or more at times) exception below. We're running a 2 data center deployment with 3 nodes in each data center. Our tables are set up with replication factor = 2, and we have 16G dedicated to the heap with G1GC for garbage collection. Our systems are AWS m4.2xlarge with 8 CPUs and 32GB of RAM, and we have 2 general purpose EBS volumes of 500GB each on every node. Once we start getting these timeouts the cluster doesn't recover, and we are required to shut all Cassandra nodes down and restart. If anyone has any tips on where to look or what commands to run to help us diagnose this issue, we'd be eternally grateful.

2017-01-02 04:33:35.161 [ERROR] 
[report-compute.ffbec924-ce44-11e6-9e21-0adb9d2dd624] [reportCompute] 
[ahlworkerslave2.bos.manhattan.aspect-cloud.net:31312] [WorktypeMetrics] 
Persistence failure when replaying events for persistenceId 
[/fsms/pens/worktypes/bmwbpy.314]. Last known sequence number [0]
java.util.concurrent.ExecutionException: 
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout 
during read query at consistency ONE (1 responses were required but only 0 
replica responded)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
akka.persistence.cassandra.package$$anon$1$$anonfun$run$1.apply(package.scala:17)
at scala.util.Try$.apply(Try.scala:192)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra 
timeout during read query at consistency ONE (1 responses were required but 
only 0 replica responded)
at 
com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:115)
at com.datastax.driver.core.Responses$Error.asException(Responses.java:124)
at 
com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:477)
at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra 
timeout during read query at consistency ONE (1 responses were required but 
only 0 replica responded)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:62)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:266)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:246)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)


RICHARD NEY
TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
+1 (978) 848.6640 WORK
+1 (916) 846.2353 MOBILE
UNITED STATES
richard@aspect.com
aspect.com



RE: Handling Leap second delay

2016-12-21 Thread Amit Singh F
Hi,

The attached conversation may be of some help to you.

Regards
Amit Singh

From: Sanjeev T [mailto:san...@gmail.com]
Sent: Wednesday, December 21, 2016 9:24 AM
To: user@cassandra.apache.org
Subject: Handling Leap second delay

Hi,

Can some of you share points on, the versions and handling leap second delay on 
Dec 31, 2016.

Regards
-Sanjeev

--- Begin Message ---
Based on what I've said previously, pretty much every way of avoiding your leap-second ordering issue is going to be a "hack", and there will be some amount of hope involved.


If the updates occur more than 300ms apart and you are confident your nodes 
have clocks that are within 150ms of each other, then I'd close my eyes and 
hope they all leap second at the same time within that 150ms.


If they are less than 300ms apart (I'm guessing you meant less than 300ms), then I would look to figure out what the smallest gap is between those two updates and make sure your nodes' clocks are close enough that the leap second will occur on all nodes within that gap.

If that's not good enough, you could just halt those scenarios for 2 seconds 
over the leap second and then resume them once you've confirmed all clocks have 
skipped.
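
One way to confirm the clocks, sketched assuming ntpd/ntpq is installed on each node (run it on every node and compare):

    ntpq -pn    # the 'offset' column is in milliseconds; it should be small and similar across nodes
    ntpstat     # reports whether the local clock is synchronised and to what accuracy

If you use chrony instead, 'chronyc tracking' gives the equivalent information.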


On Wed, 2 Nov 2016 at 18:13 Anuj Wadehra 
> wrote:


   Thanks Ben for taking out time for the detailed reply !!

   We don't need strict ordering for all operations, but we are looking at scenarios where 2 quick updates to the same column of the same row are possible. By quick updates, I mean >300 ms. Configuring NTP properly (as mentioned in some blogs in your link) should give fair relative accuracy between the Cassandra nodes. But the leap second takes the clock back by an ENTIRE second (huge), and the probability of an old write overwriting a new one increases drastically. So, we want to be proactive about things.

   I agree that you should avoid such scenarios with design (if possible).

   Good to know that you guys have set up your own NTP servers as per the recommendation. Curious... do you also do some monitoring around NTP?



   Thanks
   Anuj


  On Fri, 28 Oct, 2016 at 12:25 AM, Ben Bromhead

  > wrote:
  If you need guaranteed strict ordering in a distributed system, I would not use Cassandra; Cassandra does not provide this out of the box. I would look to a system that uses Lamport or vector clocks. Based on your description of how your system runs at the moment (and how close your updates are together), you have either already experienced out-of-order updates or there is a real possibility you will in the future.

  Sorry to be so dire, but if you do require causal consistency / strict 
ordering, you are not getting it at the moment. Distributed systems theory is 
really tricky, even for people that are "experts" on distributed systems over 
unreliable networks (I would certainly not put myself in that category). People 
have made a very good name for themselves by showing that the vast majority of 
distributed databases have had bugs when it comes to their various consistency 
models and the claims these databases make.

  So make sure you really do need guaranteed causal consistency/strict 
ordering or if you can design around it (e.g. using conflict free replicated 
data types) or choose a system that is designed to provide it.

  Having said that... here are some hacky things you could do in Cassandra 
to try and get this behaviour, which I in no way endorse doing :)

  * Cassandra counters do leverage a logical clock per shard, and you could hack something together with counters and lightweight transactions, but you would want to do your homework on counter accuracy before diving into it... as I don't know if the implementation is safe in the context of your question. Also, this would probably require a significant rework of your application plus a significant performance hit. I would invite a counter guru to jump in here...

  * You can leverage the fact that timestamps are monotonic if you isolate writes to a single node for a single shard... but you then lose Cassandra's availability guarantees, e.g. a keyspace with an RF of 1 and a CL of > ONE will get monotonic timestamps (if generated on the server side).

  * Continuing down the path of isolating writes to a single node for a given shard, you could also isolate writes to the primary replica using your client driver during the leap second (make it a minute either side of the leap), but again you lose out on availability and you are probably already experiencing out-of-order writes given how close your writes and updates are.



  A note on NTP: NTP is generally fine if you use it to keep the clocks 
synced between the Cassandra nodes. If you are interested in how we have 
implemented NTP at Instaclustr, see our blogpost on it 

JConsole Support for SSL in C* 2.0

2016-10-12 Thread Amit Singh F
Hi All,

I was looking through the security documentation for C* 2.0 and noticed that there is no mention of JConsole over SSL, whereas in the latest 3.x docs I can spot this:

http://docs.datastax.com/en/cassandra_win/3.0/cassandra/configuration/secureJconsoleSSL.html

So what I infer from this is that JConsole can be secured over SSL only in C* 3.x?
Also, in C* 2.0, can SSL only be used by clients, and not by nodetool/JConsole?

Please correct me if I am on the wrong track.
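
For what it's worth, the JMX SSL switches are standard JVM system properties rather than anything Cassandra-version specific, so they can be appended to cassandra-env.sh on 2.0 as well. A sketch (paths and passwords are placeholders; verify against your own keystores):

    # hypothetical additions to conf/cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=true"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl.need.client.auth=true"
    JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.keyStore=/path/to/keystore.jks"
    JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.keyStorePassword=changeit"
    JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.trustStore=/path/to/truststore.jks"
    JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.trustStorePassword=changeit"

JConsole then needs to be started with the matching truststore, e.g. jconsole -J-Djavax.net.ssl.trustStore=/path/to/truststore.jks.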

Regards
Amit Singh
Datastax Certified Developer


RE: C* files getting stuck

2016-09-30 Thread Amit Singh F
Hi All,

Please check if anybody has faced the below issue, and if yes, what can best be done to avoid it?
Thanks in advance.

Regards
Amit Singh

From: Amit Singh F [mailto:amit.f.si...@ericsson.com]
Sent: Wednesday, June 29, 2016 3:52 PM
To: user@cassandra.apache.org
Subject: C* files getting stuck


Hi All

We are running Cassandra 2.0.14 and disk usage is very high. On investigating further, we found that there are around 4-5 files (~150 GB) stuck in a deleted-but-still-open state.

Command Fired : lsof /var/lib/cassandra | grep -i deleted

Output :

java 12158 cassandra 308r REG 8,16 34396638044 12727268 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-16481-Data.db
 (deleted)
java 12158 cassandra 327r REG 8,16 101982374806 12715102 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-126861-Data.db
 (deleted)
java 12158 cassandra 339r REG 8,16 12966304784 12714010 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-213548-Data.db
 (deleted)
java 12158 cassandra 379r REG 8,16 15323318036 12714957 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-182936-Data.db
 (deleted)

We are not able to see these files in any directory. This is somewhat similar to https://issues.apache.org/jira/browse/CASSANDRA-6275, which is fixed, but the issue is still there on a later version. Also, no compaction-related errors are reported in the logs.

Could any one of you please provide a suggestion on how to counter this? Restarting Cassandra is one solution, but this issue keeps recurring, and restarting a production machine so frequently is not recommended.
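
A quick way to quantify how much space the deleted-but-open files are holding (column 7 of lsof is the file size in bytes; the path is the default data directory):

    lsof /var/lib/cassandra 2>/dev/null | awk '/deleted/ {sum += $7} END {printf "%.1f GiB held by deleted files\n", sum/1024/1024/1024}'

Short of a restart, the space is only released once the Cassandra process closes those file descriptors.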

Also, we know that this version is no longer supported, but there is a high probability that it can occur in later versions too.
Regards
Amit Singh


upgradesstables throws error when migrating from 2.0.14 to 2.1.13

2016-08-12 Thread Amit Singh F
Hi All,

We are in the process of migrating from 2.0.14 to 2.1.13, and we were able to successfully install the binaries and get Cassandra 2.1.13 up and running fine. But an issue comes up when we try to run nodetool upgradesstables: it finishes in only a few seconds, which means it does not find any old sstables that need to be upgraded, yet when I locate the sstables on disk I can see they are still in the old format.

Also, when I try running the sstableupgrade command, the below error is thrown:

org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in 
variable: [cassandra.config].  Please prefix the file with file:/// for local 
files or file:/// for remote files. Aborting. If you are executing this 
from an external tool, it needs to set Config.setClientMode(true) to avoid 
loading configuration.
at 
org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:73)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:84)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:161)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:136)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.tools.StandaloneUpgrader.main(StandaloneUpgrader.java:52) 
[apache-cassandra-2.1.13.jar:2.1.13]
Expecting URI in variable: [cassandra.config].  Please prefix the file with 
file:/// for local files or file:/// for remote files. Aborting. If you 
are executing this from an external tool, it needs to set 
Config.setClientMode(true) to avoid loading configuration.
Fatal configuration error; unable to start. See log for stacktrace.

Also, I debugged the code a little, and this error is due to an invalid path for cassandra.yaml, but I can skip this as my Cassandra node is in UN state.
So can anybody provide me some pointers to look into this?
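
One thing worth trying, hedged because wrapper scripts differ between packages: point the tool at the yaml explicitly via the cassandra.config system property (the path below is a placeholder, and this only takes effect if your sstableupgrade script passes JVM_OPTS/extra -D flags through to the java invocation; otherwise add the flag directly in the script):

    JVM_OPTS="-Dcassandra.config=file:///etc/cassandra/cassandra.yaml" sstableupgrade my_keyspace my_table

The "Expecting URI in variable: [cassandra.config]" message is the configuration loader complaining that it could not resolve cassandra.yaml as a file:// URI.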


Regards
Amit Chowdhery



RE: Exclude a host from the repair process

2016-07-20 Thread Amit Singh F
Hi Jean,

This option is available in C* version 2.1.x and above, where you can specify hosts in the nodetool repair command. For more detail, please visit the link below:

https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
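
A minimal sketch of the syntax (the IPs and keyspace are placeholders; check the linked page for the exact flag semantics and which ranges end up being repaired):

    nodetool repair -hosts 10.0.0.1,10.0.0.2,10.0.0.3 my_keyspace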

Regards
Amit Singh

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Wednesday, July 20, 2016 4:42 PM
To: user@cassandra.apache.org
Subject: Re: Exclude a host from the repair process

Hi Jean,

Not all the nodes are necessarily involved in a repair, depending on whether vnodes are enabled, on your topology, on the racks you are using, etc.

This being said, if a node was supposed to be part of a repair process, the 
repair of all the subranges including the down node will fail. That's what I 
have seen happening so far. @Stone Fang, not sure who is right on this (I might 
have missed some information about this topic), but there is a ticket about 
this topic: https://issues.apache.org/jira/browse/CASSANDRA-10446. You 
apparently can specify which nodes to repair, but a down node is not 
automatically ignored as far as I can tell.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-07-14 9:16 GMT+02:00 Stone Fang 
>:
I don't think it is necessary to remove the down node. The repair will continue comparing with the other up nodes and ignore the down node.

On Wed, Jul 13, 2016 at 9:44 PM, Jean Carlo 
> wrote:
If a node is down in my cluster:

Is it possible to exclude it from the repair process in order to continue with the repair?
If not, does the repair continue repairing the other replicas even if one is down?
Best regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay




RE: C* files getting stuck

2016-06-30 Thread Amit Singh F
Hi Josh,

On which version are you facing this issue? Is it the 2.0.x branch?

Regards
Amit
From: Josh Smith [mailto:josh.sm...@careerbuilder.com]
Sent: Thursday, June 30, 2016 7:39 PM
To: user@cassandra.apache.org
Subject: RE: C* files getting stuck

I have also faced this issue.  Rebooting the instance has been our fix so far.  
I am very interested if anyone else has a solution.  I was unable to get a 
definitive answer from Datastax during the last Cassandra Summit.

From: Amit Singh F [mailto:amit.f.si...@ericsson.com]
Sent: Thursday, June 30, 2016 7:02 AM
To: user@cassandra.apache.org
Subject: RE: C* files getting stuck

Hi All,

Please check if anybody has faced the below issue, and if yes, what can best be done to avoid it?
Thanks in advance.

Regards
Amit Singh

From: Amit Singh F [mailto:amit.f.si...@ericsson.com]
Sent: Wednesday, June 29, 2016 3:52 PM
To: user@cassandra.apache.org
Subject: C* files getting stuck


Hi All

We are running Cassandra 2.0.14 and disk usage is very high. On investigating further, we found that there are around 4-5 files (~150 GB) stuck in a deleted-but-still-open state.

Command Fired : lsof /var/lib/cassandra | grep -i deleted

Output :

java 12158 cassandra 308r REG 8,16 34396638044 12727268 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-16481-Data.db
 (deleted)
java 12158 cassandra 327r REG 8,16 101982374806 12715102 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-126861-Data.db
 (deleted)
java 12158 cassandra 339r REG 8,16 12966304784 12714010 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-213548-Data.db
 (deleted)
java 12158 cassandra 379r REG 8,16 15323318036 12714957 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-182936-Data.db
 (deleted)

We are not able to see these files in any directory. This is somewhat similar to https://issues.apache.org/jira/browse/CASSANDRA-6275, which is fixed, but the issue is still there on a later version. Also, no compaction-related errors are reported in the logs.

Could any one of you please provide a suggestion on how to counter this? Restarting Cassandra is one solution, but this issue keeps recurring, and restarting a production machine so frequently is not recommended.

Also, we know that this version is no longer supported, but there is a high probability that it can occur in later versions too.
Regards
Amit Singh


C* files getting stuck

2016-06-29 Thread Amit Singh F
Hi All

We are running Cassandra 2.0.14 and disk usage is very high. On investigating further, we found that there are around 4-5 files (~150 GB) stuck in a deleted-but-still-open state.

Command Fired : lsof /var/lib/cassandra | grep -i deleted

Output :

java 12158 cassandra 308r REG 8,16 34396638044 12727268 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-16481-Data.db
 (deleted)
java 12158 cassandra 327r REG 8,16 101982374806 12715102 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-126861-Data.db
 (deleted)
java 12158 cassandra 339r REG 8,16 12966304784 12714010 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-213548-Data.db
 (deleted)
java 12158 cassandra 379r REG 8,16 15323318036 12714957 
/var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-182936-Data.db
 (deleted)

We are not able to see these files in any directory. This is somewhat similar to https://issues.apache.org/jira/browse/CASSANDRA-6275, which is fixed, but the issue is still there on a later version. Also, no compaction-related errors are reported in the logs.

Could any one of you please provide a suggestion on how to counter this? Restarting Cassandra is one solution, but this issue keeps recurring, and restarting a production machine so frequently is not recommended.

Also, we know that this version is no longer supported, but there is a high probability that it can occur in later versions too.
Regards
Amit Singh


RE: *** How to bring up one of the Nodes which is down ***

2016-04-12 Thread Amit Singh F
Hi Lokesh,

Please check your Cassandra logs on the downed node too, and see whether any exception traces are there or not.
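
A minimal set of checks on the down node, assuming a packaged install with default paths (adjust the service name and log location to your setup):

    sudo service cassandra status                                      # is the process even running?
    grep -iE 'ERROR|Exception' /var/log/cassandra/system.log | tail -n 50
    sudo service cassandra start                                       # if it is simply stopped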

From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: Tuesday, April 12, 2016 3:59 PM
To: user@cassandra.apache.org
Subject: Re: *** How to bring up one of the Nodes which is down ***

Hi Lokesh,

This may sound a bit silly, but... what about starting the Cassandra process on that box?

Regards,

Carlos Alonso | Software Engineer | @calonso

On 11 April 2016 at 19:16, Lokesh Ceeba - Vendor 
> wrote:
Team,
  Can you help? How do we bring up one of the nodes below, which is DOWN?

[cassandra@rmtm-cassandra-db-103087499-2-111493402 ~]$ nodetool status
Datacenter: dev-cdc1

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens  OwnsHost ID   
Rack
UN  10.226.68.248  184.81 MB  256 ?   
3713213d-4a6f-4cea-b212-b3fec9dfaa92  AZ1
DN  10.226.68.252  186.15 MB  256 ?   
b3bf0781-cde4-466e-8725-3014023e6bf7  AZ1
UN  10.226.71.177  210.73 MB  256 ?   
d62cafa9-d21d-4683-bc71-eac55d6ecc30  AZ1


--
Lokesh



RE: Lot of GC on two nodes out of 7

2016-03-02 Thread Amit Singh F
Hi Anishek,

We too faced a similar problem in 2.0.14, and after doing some research we configured a few parameters in cassandra.yaml and were able to overcome the GC pauses. Those are (a sample excerpt follows the list):


· memtable_flush_writers: increased from 1 to 3. From the tpstats output we can see mutations being dropped, which means writes are getting blocked, so increasing this number helps absorb them.

· memtable_total_space_in_mb: defaults to 1/4 of the heap size; it can be lowered, because large long-lived objects create pressure on the heap, so it is better to trim this somewhat.

· concurrent_compactors: Alain rightly pointed this out, i.e. reduce it to 8. You need to try this.
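
A sample cassandra.yaml excerpt reflecting the advice above (the memtable_total_space_in_mb value is an illustrative placeholder, not a figure from this thread):

    memtable_flush_writers: 3
    memtable_total_space_in_mb: 1024
    concurrent_compactors: 8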

Also, please check whether you have mutations dropped on other nodes or not.

Hope this helps in your cluster too.

Regards
Amit Singh
From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Wednesday, March 02, 2016 9:33 PM
To: user@cassandra.apache.org
Subject: Re: Lot of GC on two nodes out of 7

Can you post a gist of the output of jstat -gccause (60 seconds worth)?  I 
think it's cool you're willing to experiment with alternative JVM settings but 
I've never seen anyone use max tenuring threshold of 50 either and I can't 
imagine it's helpful.  Keep in mind if your objects are actually reaching that 
threshold it means they've been copied 50x (really really slow) and also you're 
going to end up spilling your eden objects directly into your old gen if your 
survivor is full.  Considering the small amount of memory you're using for heap 
I'm really not surprised you're running into problems.
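
For anyone wanting to capture that, a sketch of the jstat invocation (assumes the JDK tools are on the PATH and that pgrep finds the Cassandra JVM):

    jstat -gccause $(pgrep -f CassandraDaemon) 1000 60    # one sample per second for 60 seconds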

I recommend G1GC + 12GB heap and just let it optimize itself for almost all 
cases with the latest JVM versions.

On Wed, Mar 2, 2016 at 6:08 AM Alain RODRIGUEZ 
> wrote:
It looks like you are doing good work with this cluster and know a lot about the JVM, that's good :-).

our machine configurations are : 2 X 800 GB SSD , 48 cores, 64 GB RAM

That's good hardware too.

With 64 GB of ram I would probably directly give a try to `MAX_HEAP_SIZE=8G` on 
one of the 2 bad nodes probably.

Also I would also probably try lowering `HEAP_NEWSIZE=2G.` and using 
`-XX:MaxTenuringThreshold=15`, still on the canary node to observe the effects. 
But that's just an idea of something I would try to see the impacts, I don't 
think it will solve your current issues or even make it worse for this node.

Using G1GC would allow you to use a bigger Heap size. Using C*2.1 would allow 
you to store the memtables off-heap. Those are 2 improvements reducing the heap 
pressure that you might be interested in.

I have spent time reading about all other options before including them and a 
similar configuration on our other prod cluster is showing good GC graphs via 
gcviewer.

So, let's look for an other reason.

there are MUTATION and READ messages dropped in high number on nodes in 
question and on other 5 nodes it varies between 1-3.

- Is Memory, CPU or disk a bottleneck? Is one of those running at the limits?

concurrent_compactors: 48

Reducing this to 8 would free some space for transactions (R requests). It is 
probably worth a try, even more when compaction is not keeping up and 
compaction throughput is not throttled.

Just found an issue about that: 
https://issues.apache.org/jira/browse/CASSANDRA-7139

Looks like `concurrent_compactors: 8` is the new default.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com






2016-03-02 12:27 GMT+01:00 Anishek Agarwal 
>:
Thanks a lot Alian for the details.
`HEAP_NEWSIZE=4G.` is probably far too high (try 1200M <-> 2G)
`MAX_HEAP_SIZE=6G` might be too low, how much memory is available (You might 
want to keep this as it or even reduce it if you have less than 16 GB of native 
memory. Go with 8 GB if you have a lot of memory.
`-XX:MaxTenuringThreshold=50` is the highest value I have seen in use so far. I 
had luck with values between 4 <--> 16 in the past. I would give  a try with 15.
`-XX:CMSInitiatingOccupancyFraction=70`--> Why not using default - 75 ? Using 
default and then tune from there to improve things is generally a good idea.


we have a lot of reads and writes onto the system so keeping the high new size 
to make sure enough is held in memory including caches / memtables etc --number 
of flush_writers : 4 for us. similarly keeping less in old generation to make 
sure we spend less time with CMS GC most of the data is transient in memory for 
us. Keeping high TenuringThreshold because we don't want objects going to old 
generation and just die in young generation given we have configured large 
survivor spaces.
using occupancyFraction as 70 since
given heap is 4G
survivor space is : 400 mb -- 2 survivor spaces
70 % of 2G (old generation) = 1.4G

so once we are just below 1.4G and we have to move 


RE: Gossip Protocol

2016-02-21 Thread Amit Singh F
Hello,

To get a detailed description of the gossip protocol architecture, please check the link below:

https://wiki.apache.org/cassandra/ArchitectureGossip

You can also try the nodetool gossipinfo command; its output shows the details saved by gossip for each endpoint (example below).
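
A quick way to look at it, with the grep target being a placeholder address:

    nodetool gossipinfo                          # full application state for every endpoint
    nodetool gossipinfo | grep -A10 '/10.0.0.1'

Typical per-endpoint fields include generation, heartbeat, STATUS, LOAD, SCHEMA, DC, RACK and RELEASE_VERSION.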

Regards
Amit Singh

From: Thouraya TH [mailto:thouray...@gmail.com]
Sent: Sunday, February 21, 2016 7:26 PM
To: user@cassandra.apache.org
Subject: Gossip Protocol

Hi all,

Please, where can I find the details saved by the gossip protocol?

Is it possible to add other information to the information exchanged between nodes using the gossip protocol?

Thank you so much.
Kind regards.


Upgrade from 2.0.x to 2.2.x documentation missing

2016-01-11 Thread Amit Singh F
Hi,

We are currently at Cassandra 2.0.14 in production, and since it is going to be EOL soon we are planning to upgrade to Cassandra 2.2.4 (http://cassandra.apache.org/download/), which is the current production-ready version. While doing some analysis, we found that there is no entry for the 2.2 branch in the DataStax documentation (http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeC_c.html) that guides on how to reach 2.2.x from 2.0.x.

Can somebody guide us on the upgrade path which needs to be followed while upgrading from 2.0.x to 2.2.x?
A quick response will be highly appreciated. Thanks in advance.
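
For what it's worth, the per-node rolling steps are the same regardless of the exact hop, and the NEWS.txt shipped with each release is the authoritative word on which hops are supported (the commonly followed path is 2.0.x -> latest 2.1.x -> 2.2.x). A sketch, with the service name as a placeholder for your install:

    nodetool drain                     # flush memtables and stop accepting writes
    sudo service cassandra stop
    # install the target version's binaries/package and merge config changes
    sudo service cassandra start
    nodetool upgradesstables           # rewrite sstables in the new format before the next hop

Repeat node by node, keeping the cluster mixed-version only for as long as the upgrade takes.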


Regards
Amit Singh


unit test failing when pull is taken

2015-06-05 Thread Amit Singh F
Hi All,

I have taken a pull from the Cassandra branch, and when I try to run the test cases they start failing (around 13 test suites are failing). Below are traces from some of the failing test suites:


[junit] Testsuite: org.apache.cassandra.db.compaction.BlacklistingCompactionsTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Testcase: org.apache.cassandra.db.compaction.BlacklistingCompactionsTest:testBlacklistingWithSizeTieredCompactionStrategy: Caused an ERROR
[junit] Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
[junit] junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.

[junit] Testsuite: org.apache.cassandra.db.compaction.CompactionsTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Testcase: org.apache.cassandra.db.compaction.CompactionsTest:testDontPurgeAccidentaly: Caused an ERROR
[junit] Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
[junit] junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.

The pull was taken and then ant test was fired, but after 32 mins the below error was encountered and it stopped executing. Interestingly, when I run each test suite individually, it passes.

BUILD FAILED
/home/cassandra/build.xml:1076: The following error occurred while executing 
this line:
/home/cassandra/build.xml:1035: Some unit test(s) failed.

Could anybody please provide a possible solution for this, or some pointers to get it resolved?
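
Two things worth trying, hedged because the ant property names should be verified against the build.xml in your checkout:

    ant test -Dtest.name=CompactionsTest      # run a single test class
    ant test -Dtest.timeout=240000            # raise the per-test timeout (ms) for the full run

If suites only time out under the full ant test run, machine load or the timeout setting is a more likely culprit than a real regression.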

Regards
Amit Singh