Re: Is it okay to use a small t2.micro instance for OpsCenter and use m3.medium instances for the actual Cassandra nodes?

2015-06-26 Thread arun sirimalla
Hi Sid,

I would recommend either c3 or m3 instances for OpsCenter. For the
Cassandra nodes it depends on your use case: you can go with either c3s or
i2s, but I would recommend running performance tests before selecting the
instance type. If your use case requires more CPU, I would recommend c3s.
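One hedged way to run such a performance test is cassandra-stress; this sketch uses the 2.1-style syntax, and the host address is a placeholder, so adjust both for your environment:

```shell
# Hypothetical smoke test of a candidate instance type: write a million
# rows, then read them back. STRESS_HOST is a placeholder address.
STRESS_HOST=10.0.0.10
cassandra-stress write n=1000000 -node "$STRESS_HOST"
cassandra-stress read n=1000000 -node "$STRESS_HOST"
```

Compare throughput and latency percentiles across the candidate instance types before committing to one.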

On Fri, Jun 26, 2015 at 1:20 PM, Sid Tantia sid.tan...@baseboxsoftware.com
wrote:

  Hello, I haven’t been able to find any documentation for best practices
 on this…is it okay to set up OpsCenter on a smaller node than the rest of
 the cluster?

 For instance, on AWS can I have 3 m3.medium nodes for Cassandra and 1
 t2.micro node for OpsCenter?




-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: Read Consistency

2015-06-23 Thread arun sirimalla
Scenario 1: Read query is fired for a key, data is found on one node and
not found on other two nodes who are responsible for the token
corresponding to key.

Your read query will fail, as it expects to receive data from 2 nodes with
RF=3


Scenario 2: Read query is fired and all 3 replicas have different data with
different timestamps.

The read query will return the data with the most recent timestamp and
trigger a read repair in the background.
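For completeness, a cqlsh sketch of reading at QUORUM (keyspace, table, and key are hypothetical, and CONSISTENCY is a cqlsh shell command rather than CQL proper):

```shell
# Read at CL=QUORUM: with RF=3, two replicas must respond.
# All names below are placeholders.
cqlsh -e "CONSISTENCY QUORUM;
          SELECT * FROM mykeyspace.mytable WHERE key = 'k1';"
```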

On Tue, Jun 23, 2015 at 10:57 AM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 Hi,

 Need to validate my understanding..

 RF=3 , Read CL = Quorum

 What would be returned to the client in following scenarios:

 Scenario 1: Read query is fired for a key, data is found on one node and
 not found on other two nodes who are responsible for the token
 corresponding to key.

 Options: no data is returned OR data from the only node having data is
 returned?

 Scenario 2: Read query is fired and all 3 replicas have different data
 with different timestamps.

 Options: data with latest timestamp is returned OR something else???

 Thanks
 Anuj

 Sent from Yahoo Mail on Android
 https://overview.mail.yahoo.com/mobile/?.src=Android




-- 
Arun


Re: Read Consistency

2015-06-23 Thread arun sirimalla
Thanks, good to know that.

On Tue, Jun 23, 2015 at 11:27 AM, Philip Thompson 
philip.thomp...@datastax.com wrote:

 Yes, that is what he means. CL is for how many nodes need to respond, not
 agree.

 On Tue, Jun 23, 2015 at 2:26 PM, arun sirimalla arunsi...@gmail.com
 wrote:

  So do you mean that with CL set to QUORUM, if the data is only on one
  node, the query still succeeds?

 On Tue, Jun 23, 2015 at 11:21 AM, Philip Thompson 
 philip.thomp...@datastax.com wrote:

 Anuj,

 In the first scenario, the data from the single node holding data is
 returned. The query will not fail if the consistency level is met, even if
 the read was inconsistent.

 On Tue, Jun 23, 2015 at 2:16 PM, Anuj Wadehra anujw_2...@yahoo.co.in
 wrote:

 Why would it fail, and with what Thrift error? What if the data didn't
 exist on any of the nodes..the query won't fail if it doesn't find data..

 Not convinced..

 Sent from Yahoo Mail on Android
 https://overview.mail.yahoo.com/mobile/?.src=Android
 --
   *From*:arun sirimalla arunsi...@gmail.com
 *Date*:Tue, 23 Jun, 2015 at 11:39 pm
 *Subject*:Re: Read Consistency

 Scenario 1: Read query is fired for a key, data is found on one node
 and not found on other two nodes who are responsible for the token
 corresponding to key.

  Your read query will fail, as it expects to receive data from 2 nodes
  with RF=3


 Scenario 2: Read query is fired and all 3 replicas have different data
 with different timestamps.

  The read query will return the data with the most recent timestamp and
  trigger a read repair in the background.

 On Tue, Jun 23, 2015 at 10:57 AM, Anuj Wadehra anujw_2...@yahoo.co.in
 wrote:

 Hi,

 Need to validate my understanding..

 RF=3 , Read CL = Quorum

 What would be returned to the client in following scenarios:

 Scenario 1: Read query is fired for a key, data is found on one node
 and not found on other two nodes who are responsible for the token
 corresponding to key.

 Options: no data is returned OR data from the only node having data is
 returned?

 Scenario 2: Read query is fired and all 3 replicas have different data
 with different timestamps.

 Options: data with latest timestamp is returned OR something else???

 Thanks
 Anuj

 Sent from Yahoo Mail on Android
 https://overview.mail.yahoo.com/mobile/?.src=Android




 --
 Arun





 --
 Arun
 Senior Hadoop/Cassandra Engineer
 Cloudwick


 2014 Data Impact Award Winner (Cloudera)

 http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html





-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: Read Consistency

2015-06-23 Thread arun sirimalla
So do you mean that with CL set to QUORUM, if the data is only on one node,
the query still succeeds?

On Tue, Jun 23, 2015 at 11:21 AM, Philip Thompson 
philip.thomp...@datastax.com wrote:

 Anuj,

 In the first scenario, the data from the single node holding data is
 returned. The query will not fail if the consistency level is met, even if
 the read was inconsistent.

 On Tue, Jun 23, 2015 at 2:16 PM, Anuj Wadehra anujw_2...@yahoo.co.in
 wrote:

 Why would it fail, and with what Thrift error? What if the data didn't
 exist on any of the nodes..the query won't fail if it doesn't find data..

 Not convinced..

 Sent from Yahoo Mail on Android
 https://overview.mail.yahoo.com/mobile/?.src=Android
 --
   *From*:arun sirimalla arunsi...@gmail.com
 *Date*:Tue, 23 Jun, 2015 at 11:39 pm
 *Subject*:Re: Read Consistency

 Scenario 1: Read query is fired for a key, data is found on one node and
 not found on other two nodes who are responsible for the token
 corresponding to key.

 Your read query will fail, as it expects to receive data from 2 nodes with
 RF=3


 Scenario 2: Read query is fired and all 3 replicas have different data
 with different timestamps.

 The read query will return the data with the most recent timestamp and
 trigger a read repair in the background.

 On Tue, Jun 23, 2015 at 10:57 AM, Anuj Wadehra anujw_2...@yahoo.co.in
 wrote:

 Hi,

 Need to validate my understanding..

 RF=3 , Read CL = Quorum

 What would be returned to the client in following scenarios:

 Scenario 1: Read query is fired for a key, data is found on one node and
 not found on other two nodes who are responsible for the token
 corresponding to key.

 Options: no data is returned OR data from the only node having data is
 returned?

 Scenario 2: Read query is fired and all 3 replicas have different data
 with different timestamps.

 Options: data with latest timestamp is returned OR something else???

 Thanks
 Anuj

 Sent from Yahoo Mail on Android
 https://overview.mail.yahoo.com/mobile/?.src=Android




 --
 Arun





-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: nodetool repair

2015-06-19 Thread arun sirimalla
Yes, compactions will remove tombstones (once gc_grace_seconds has elapsed).
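Tombstone removal is gated by each table's gc_grace_seconds; here is a sketch for checking that setting on a 2.x cluster (keyspace and table names are placeholders):

```shell
# gc_grace_seconds must elapse before a compaction may drop a tombstone.
# Names below are hypothetical; substitute your own keyspace and table.
cqlsh -e "SELECT gc_grace_seconds FROM system.schema_columnfamilies
          WHERE keyspace_name = 'mykeyspace'
            AND columnfamily_name = 'mytable';"
```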

On Thu, Jun 18, 2015 at 11:46 PM, Jean Tremblay 
jean.tremb...@zen-innovations.com wrote:

  Perfect thank you.
  So making a weekly "nodetool repair -pr" on all nodes one after the other
 will repair my cluster. That is great.

  If it does a compaction, does it mean that it would also clean up my
 tombstone from my LeveledCompactionStrategy tables at the same time?

  Thanks for your help.

  On 19 Jun 2015, at 07:56 , arun sirimalla arunsi...@gmail.com wrote:

  Hi Jean,

  Running nodetool repair on a node will repair only that node in the
 cluster. It is recommended to run nodetool repair on one node at a time.

   A few things to keep in mind while running repair:
    1. Repair will trigger compactions.
    2. CPU utilization will increase.


   Run nodetool repair with the -pr option, so that it repairs only the
  range that node is responsible for.

 On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay 
 jean.tremb...@zen-innovations.com wrote:

 Thanks Jonathan.

  But I need to know the following:

  If you issue a “nodetool repair” on one node will it repair all the
 nodes in the cluster or only the one on which we issue the command?

If it repairs only one node, do I have to wait that the nodetool
 repair ends, and only then issue another “nodetool repair” on the next node?

  Kind regards

  On 18 Jun 2015, at 19:19 , Jonathan Haddad j...@jonhaddad.com wrote:

  If you're using DSE, you can schedule it automatically using the repair
 service.  If you're open source, check out Spotify cassandra reaper, it'll
 manage it for you.

  https://github.com/spotify/cassandra-reaper



  On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay 
 jean.tremb...@zen-innovations.com wrote:

 Hi,

 I want to make on a regular base repairs on my cluster as suggested by
 the documentation.
 I want to do this in a way that the cluster is still responding to read
 requests.
 So I understand that I should not use the -par switch for that as it
 will do the repair in parallel and consume all available resources.

 If you issue a “nodetool repair” on one node will it repair all the
 nodes in the cluster or only the one on which we issue the command?

 If it repairs only one node, do I have to wait that the nodetool repair
 ends, and only then issue another “nodetool repair” on the next node?

 If we had down time periods I would issue a nodetool -par, but we don’t
 have down time periods.

 Sorry for the stupid questions.
 Thanks for your help.





  --
 Arun
 Senior Hadoop/Cassandra Engineer
 Cloudwick


  2014 Data Impact Award Winner (Cloudera)

 http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html





-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: nodetool repair

2015-06-18 Thread arun sirimalla
Hi Jean,

Running nodetool repair on a node will repair only that node in the
cluster. It is recommended to run nodetool repair on one node at a time.

A few things to keep in mind while running repair:
   1. Repair will trigger compactions.
   2. CPU utilization will increase.


Run nodetool repair with the -pr option, so that it repairs only the range
that node is responsible for.
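The one-node-at-a-time routine can be sketched as a small shell loop; the host list is a placeholder, and nodetool is assumed to be on the PATH of the machine running it:

```shell
# Repair each node's primary ranges in turn; each nodetool invocation
# blocks until that node's repair finishes, which serializes the work.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"   # placeholder addresses

repair_all() {
  for host in $NODES; do
    echo "repairing primary ranges on $host"
    # -pr repairs only the ranges this node owns, so one pass over all
    # nodes covers every range exactly once.
    nodetool -h "$host" repair -pr || return 1
  done
}
# repair_all   # uncomment on a host that can reach each node's JMX port
```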

On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay 
jean.tremb...@zen-innovations.com wrote:

  Thanks Jonathan.

  But I need to know the following:

  If you issue a “nodetool repair” on one node will it repair all the
 nodes in the cluster or only the one on which we issue the command?

If it repairs only one node, do I have to wait that the nodetool
 repair ends, and only then issue another “nodetool repair” on the next node?

  Kind regards

  On 18 Jun 2015, at 19:19 , Jonathan Haddad j...@jonhaddad.com wrote:

  If you're using DSE, you can schedule it automatically using the repair
 service.  If you're open source, check out Spotify cassandra reaper, it'll
 manage it for you.

  https://github.com/spotify/cassandra-reaper



  On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay 
 jean.tremb...@zen-innovations.com wrote:

 Hi,

 I want to make on a regular base repairs on my cluster as suggested by
 the documentation.
 I want to do this in a way that the cluster is still responding to read
 requests.
 So I understand that I should not use the -par switch for that as it will
 do the repair in parallel and consume all available resources.

 If you issue a “nodetool repair” on one node will it repair all the nodes
 in the cluster or only the one on which we issue the command?

 If it repairs only one node, do I have to wait that the nodetool repair
 ends, and only then issue another “nodetool repair” on the next node?

 If we had down time periods I would issue a nodetool -par, but we don’t
 have down time periods.

 Sorry for the stupid questions.
 Thanks for your help.





-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: EC2snitch in AWS

2015-05-27 Thread arun sirimalla
Hi Kaushal,

Here is the reference,
http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchEC2_t.html
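Per that page, the switch itself is a single cassandra.yaml setting (the datacenter and rack are then derived automatically from the EC2 region and availability zone, e.g. us-east with zones 1a and 1b); the nodes must be restarted for a snitch change to take effect:

```yaml
# cassandra.yaml -- single-region EC2 deployment
endpoint_snitch: Ec2Snitch
```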

On Wed, May 27, 2015 at 9:31 AM, Kaushal Shriyan kaushalshri...@gmail.com
wrote:

 Hi,

 Can somebody please share me details about setting up of EC2snitch in AWS
 single region which has availability zone 1a and 1b? I am using Cassandra
 version 1.2.19 in the setup.

 I would appreciate your help.

 Regards,

 Kaushal




-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick


2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: Unexpected behavior after adding successffully a new node

2015-05-12 Thread arun sirimalla
Analia,

Try running repair on node 3.

On Tue, May 12, 2015 at 7:39 AM, Analia Lorenzatto 
analialorenza...@gmail.com wrote:

 Hello guys,


 I have a cluster 2.1.0-2 comprised of 3 nodes.  The replication factor=2.
  We successfully added the third node last week.  After that, we ran
  cleanup on one node at a time.  Then we ran repairs on all the nodes, and
  finally compactions on all the CFs.

 Last night, I noticed the cluster started behaving in a weird way.  The
 last node (successfully added last week) were being reported up and down
 all the time.  I could see a lot of messages like this on logs:

 WARN  [SharedPool-Worker-33] 2015-05-11 21:31:45,125
 AbstractTracingAwareExecutorService.java:167 - Uncaught exception on thread
 Thread[SharedPool-Worker-33,5,main]: {}
 java.lang.RuntimeException: java.io.FileNotFoundException:
 /mnt/cassandra/data/matchings-85b4929048e211e4a949a3ed319cbedc/matchings-ka-3914-Data.db
 (No such file or directory)

 At the same time the consumption of heap used was on the top, up to the
 point the rest of the cluster saw this node as down.  After that, I just
 restarted the cassandra service with no problems on that node.

 Now, I can see the three nodes on the cluster Up and Normal, but this last
 node (which was rebooted) does not have data.  But it has all the structure
 of cassandra data.

 I can query against the new node and I get the same result as if do the
 query against the others nodes.  But, on this new node I do not have any
 SStables:



 root@prd-rtbkit-cassandra-03:/var/log/cassandra# nodetool status
 Datacenter: us-east
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address Load   Tokens  Owns (effective)  Host ID
 Rack
 UN  10.0.0.a  390.28 GB  256 66.7%
 eed9e9f5-f279-4b2f-b521-c056cbf65b52  1c
 UN  10.0.0.b  382.36 GB  256 68.3%
 19492c26-4458-4a0b-af04-72e0aab6598e  1c
 UN  10.0.0.c  40.61 MB   256 64.9%
 b8da952c-24b3-444a-a34e-7a1804eee6e6  1c

 What do you recommend to do? Leave this as if, remove it and try to join
 this or a new one?
 Thanks in advance!!

 --
 Saludos / Regards.

 Analía Lorenzatto.

 “It's possible to commit no errors and still lose. That is not weakness.
 That is life.  By Captain Jean-Luc Picard.




-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick

Champion of Big Data (Cloudera)
http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html

2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: Can a Cassandra node accept writes while being repaired

2015-05-07 Thread arun sirimalla
Yes, Cassandra nodes accept writes during repair. Repair also triggers
compactions, which can remove tombstones (once gc_grace_seconds has elapsed).

On Thu, May 7, 2015 at 9:31 AM, Khaja, Raziuddin (NIH/NLM/NCBI) [C] 
raziuddin.kh...@nih.gov wrote:

 I was not able to find a conclusive answer to this question on the
 internet so I am asking this question here.
 Is a Cassandra node able to accept insert or delete operations while the
 node is being repaired?
 Thanks
 -Razi




-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick

Champion of Big Data (Cloudera)
http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html

2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: calculation of disk size

2015-04-29 Thread arun sirimalla
Hi Rahul,

If you are expecting 15 GB of data per day, here is the calculation.

1 day = 15 GB, 1 month = 450 GB, 1 year = 5.4 TB. So your raw data size for
one year is 5.4 TB; with a replication factor of 3 it would be around
16.2 TB of data for one year.

Taking compaction into consideration, and since your use case is write
heavy, if you go with size-tiered compaction you would need up to twice the
space of your raw data.

So you would need around 32-34 TB of disk space.
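The arithmetic above can be spelled out as a quick shell sketch (all figures come from this thread's assumptions, in GB):

```shell
# Back-of-envelope capacity estimate using the numbers in this thread.
daily_gb=15
raw_year_gb=$((daily_gb * 365))        # 5475 GB, i.e. ~5.4 TB raw
replicated_gb=$((raw_year_gb * 3))     # RF=3 -> 16425 GB (~16.2-16.4 TB)
# Size-tiered compaction can transiently need ~2x the live data on disk:
provisioned_gb=$((replicated_gb * 2))  # 32850 GB, matching "32-34 TB"
echo "$provisioned_gb"
```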

Reference:
http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architecturePlanningDiskCapacity_t.html

Thanks

On Wed, Apr 29, 2015 at 9:20 PM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:

 Hi All,


 We are planning to set up a cluster of 5 nodes with RF 3 for write heavy
 project, our current database size is around 500 GB. And it is growing at
 rate of 15 GB every day. We learnt that cassandra consumes space for
 compaction processes, So how can we calculate the amount of disk space we
 would require.

 Kindly suggest.



 Regards:
 Rahul Bhardwaj


 Follow IndiaMART.com http://www.indiamart.com for latest updates on
 this and more: https://plus.google.com/+indiamart
 https://www.facebook.com/IndiaMART https://twitter.com/IndiaMART
 Mobile Channel:
 https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=668561641mt=8
 https://play.google.com/store/apps/details?id=com.indiamart.m
 http://m.indiamart.com/

 https://www.youtube.com/watch?v=DzORNbeSXN8list=PL2o4J51MqpL0mbue6kzDa6eymLVUXtlR1index=2
 Watch how IndiaMART Maximiser helped Mr. Khanna expand his business.
 kyunki Kaam Yahin Banta Hai https://www.youtube.com/watch?v=cy1jiNXrzxc
 !!!




-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick

Champion of Big Data (Cloudera)
http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html

2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: Best Practice to add a node in a Cluster

2015-04-27 Thread arun sirimalla
Hi Neha,


After you add the node to the cluster, run nodetool cleanup on the
previously existing nodes. Next, running repair on each node will replicate
the data. Make sure you run the repair on one node at a time, because
repair is an expensive process (it utilizes a lot of CPU).




On Mon, Apr 27, 2015 at 8:36 PM, Neha Trivedi nehajtriv...@gmail.com
wrote:

 Thanks Eric and Matt :) !!

 Yes the purpose is to improve reliability.
 Right now, from our driver we are querying using degradePolicy for
 reliability.



 *For changing the keyspace for RF=3, the procedure is as under:*
 1. Add a new node to the cluster (new node is not in seed list)

 2. ALTER KEYSPACE system_auth WITH REPLICATION =
   {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};


1. On each affected node, run nodetool repair

 http://docs.datastax.com/en/cassandra/1.2/cassandra/tools/toolsNodetool_r.html.

2. Wait until repair completes on a node, then move to the next node.
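The two steps above can be sketched as follows (the keyspace and DC names are taken from this thread; the host addresses are placeholders):

```shell
# 1. Bump the replication factor for the keyspace.
cqlsh -e "ALTER KEYSPACE system_auth WITH REPLICATION =
          {'class': 'NetworkTopologyStrategy', 'dc1': 3};"

# 2. Repair the affected keyspace one node at a time; each call blocks
#    until that node's repair completes.
for host in 10.0.0.1 10.0.0.2 10.0.0.3; do
  nodetool -h "$host" repair system_auth
done
```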


 Any other things to take care?

 Thanks
 Regards
 neha


 On Mon, Apr 27, 2015 at 9:45 PM, Eric Stevens migh...@gmail.com wrote:

 It depends on why you're adding a new node.  If you're running out of
 disk space or IO capacity in your 2 node cluster, then changing RF to 3
 will not improve either condition - you'd still be writing all data to all
 three nodes.

 However if you're looking to improve reliability, a 2 node RF=2 cluster
 cannot have either node offline without losing quorum, while a 3 node RF=3
 cluster can have one node offline and still be able to achieve quorum.
 RF=3 is a common replication factor because of this characteristic.

 Make sure your new node is not in its own seeds list, or it will not
 bootstrap (it will come online immediately and start serving requests).

 On Mon, Apr 27, 2015 at 8:46 AM, Neha Trivedi nehajtriv...@gmail.com
 wrote:

 Hi
 We have a 2 Cluster Node with RF=2. We are planing to add a new node.

 Should we change RF to 3 in the schema?
 OR Just added a new node with the same RF=2?

 Any other Best Practice that we need to take care?

 Thanks
 regards
 Neha






-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick

Champion of Big Data (Cloudera)
http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html

2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


High Compactions Pending

2014-09-22 Thread arun sirimalla
I have a 6-node (i2.2xlarge) cluster on AWS with DSE 4.5 running on it. I
notice high pending compactions on one of the nodes, around 35.
Compaction throughput is set to 64 MB/s and flush writers to 4. Any
suggestion is much appreciated.
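For anyone hitting the same symptom, a few diagnostics worth trying on the affected node (the 128 MB/s figure is only an example, and setcompactionthroughput changes do not survive a restart):

```shell
nodetool compactionstats              # confirm the pending-compaction backlog
nodetool getcompactionthroughput      # current cap (64 MB/s in this case)
nodetool setcompactionthroughput 128  # raise the cap temporarily; 0 = unthrottled
```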

-- 
Arun
Senior Hadoop Engineer
Cloudwick

Champion of Big Data
http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html