Re: Multinode Cassandra and sstableloader

2015-04-02 Thread Serega Sheypak
So, sstableloader streams the portion of data stored in the
/var/lib/cassandra/data/keyspace/table directory.
If we have 3 nodes and RF=3, each node's directory holds a full copy, so
streaming from a single node sends all of the data to the other cluster
(with RF=1, one node would only cover 1/3 of it).
Problem is solved.


2015-04-01 12:05 GMT+02:00 Alain RODRIGUEZ arodr...@gmail.com:

 From Michael Laing - posted on the wrong thread :

 We use Alain's solution as well to make major operational revisions.

 We have a red team and a blue team in each AWS region, so we just add
 and drop datacenters to get where we want to be.

 Pretty simple.

 2015-03-31 15:50 GMT+02:00 Alain RODRIGUEZ arodr...@gmail.com:

 IMHO, the most straightforward solution is to add cluster2 as a new DC
 for mykeyspace and then drop the old DC.

 That's how we migrated to VPC (AWS), and we love this approach since you
 don't have to mess with your existing cluster; plus, the sync happens
 automatically and you can then drop your old DC safely, once you are sure.

 I put the steps on this ML a long time ago:
 https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201406.mbox/%3cca+vsrlopop7th8nx20aoz3as75g2jrjm3ryx119deklynhq...@mail.gmail.com%3E
 Also Datastax docs:
 https://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
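
 For reference, a minimal sketch of that add-a-DC path, assuming the old DC
 is named DC1 and the new one DC2 (names and replication factors are
 illustrative):

   -- once the new DC's nodes are up, include it in the keyspace:
   ALTER KEYSPACE mykeyspace WITH replication =
     {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};

   # then, on each node of the new DC, stream the existing data:
   nodetool rebuild DC1

 Once the new DC is verified, alter the keyspace again to remove DC1 and
 decommission its nodes.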

 get data from cluster1,
 put it to cluster2
 wipe cluster1

 I would definitely use this method to do this (I actually did already,
 multiple times).

 Up to you; I heard once that there are almost as many ways of doing
 operations on Cassandra as there are operators :). You should go with the
 method you are most confident with. I can assure you the one I propose is
 quite safe.

 C*heers,

 Alain

 2015-03-31 15:32 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:

 I have to ask you if you considered doing an Alter keyspace, change RF
 The idea is dead simple:
 get data from cluster1,
 put it to cluster2,
 wipe cluster1

 I understand the drawbacks of the streaming sstableloader approach; right
 now I need something easy. Later we'll consider switching to Priam, since
 it does backup/restore in the right way.

 2015-03-31 14:45 GMT+02:00 Alain RODRIGUEZ arodr...@gmail.com:

 Hi,

 Despite understanding that it's not the best solution and that you need it
 for testing purposes, I have to ask whether you considered doing an ALTER
 KEYSPACE (changing the RF for mykeyspace on cluster2) and a nodetool
 rebuild to add a new DC (your cluster2)?

 In case you go your way (sstableloader), I also advise you to take a
 snapshot (instead of just flushing) to avoid failures due to compactions on
 your active cluster1.
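
 As a rough sketch, assuming default data paths and the 2.x snapshot layout
 (the snapshot tag and the table directory id are placeholders):

   nodetool snapshot -t migration mykeyspace
   sstableloader -d cluster2.nodeXXX.com \
     /var/lib/cassandra/data/mykeyspace/source_table-<id>/snapshots/migration/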

 To answer your question, sstableloader is supposed to distribute data
 correctly on the new cluster depending on your RF and topology.
 Basically, if you run sstableloader just on the sstables of c1.node1, my
 guess is that you will have all the data present on c1.node1 stored on the
 new c2 (each piece of data going to its corresponding node). So if you have
 RF=3 on c1, you should have all the data on c2 just by running
 sstableloader from c1.node1; if you are using RF=1 on c1, then you need to
 load data from each node of c1. I suppose that cluster2.nodeXXX doesn't
 matter and just acts as a coordinator.

 I never used the tool, but that's what would be logical imho. Wait
 for a confirmation, as I wouldn't want to lead you to a failure of any kind.
 Also, I don't know whether data is replicated directly by sstableloader
 or whether you need to repair c2 after loading the data.

 C*heers,

 Alain

 2015-03-31 13:21 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:

  Hi, I have a simple question and can't find related info in docs.

 I have cluster1 with 3 nodes and cluster2 with 5 nodes. I want to
 transfer the data of a whole keyspace named 'mykeyspace' from cluster1 to
 cluster2 using sstableloader. I understand that it's not the best solution;
 I need it for testing purposes.

 What I'm going to do:

   1. Recreate the keyspace schema on cluster2 using the schema from cluster1
   2. nodetool flush for mykeyspace.source_table being exported from
      cluster1 to cluster2
   3. Run sstableloader for each table on cluster1.node01:

      sstableloader -d cluster2.nodeXXX.com /var/lib/cassandra/data/mykeyspace/source_table-83f369e0d6e511e4b3a6010e8d2b68af/

 What should I get as a result on cluster2?

 *ALL* data from source_table?

 or

 Just the data stored in *this node's partition* of source_table?

 I'm confused. The doc says I can just run this command to export a table
 from cluster1 to cluster2, but I'm specifying the path to only one part of
 source_table's data, since the other parts of the table live on other nodes.








Exception while running cassandra stress client

2015-04-02 Thread ankit tyagi
Hi All,

While running the cassandra-stress tool shipped with Cassandra 2.0.4, I
am getting the following error:

  ./bin/cassandra-stress user profile=./bin/test.yaml
  Application does not allow arbitrary arguments: user, profile=./bin/test.yaml

I am stuck on this and not able to find out why this exception occurs.


Re: Exception while running cassandra stress client

2015-04-02 Thread Abhinav Ranjan
Hi,

We got the same error too. The user/profile mode is only supported by the
cassandra-stress shipped with Cassandra 2.1.x; use that version to run the
test like that.
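
For example, the same run with the 2.1 tool would look something like this
(host and yaml path are illustrative):

  tools/bin/cassandra-stress user profile=./test.yaml ops\(insert=1\) -node 127.0.0.1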

Regards
Abhinav
On 02-Apr-2015 11:44 am, ankit tyagi ankittyagi.mn...@gmail.com wrote:

 Hi All,

 While running the cassandra-stress tool shipped with Cassandra 2.0.4, I
 am getting the following error:

   ./bin/cassandra-stress user profile=./bin/test.yaml
   Application does not allow arbitrary arguments: user, profile=./bin/test.yaml

 I am stuck on this and not able to find out why this exception occurs.




log all the query statement

2015-04-02 Thread 鄢来琼
Hi all,

Cassandra 2.1.2 is used in my project, but some nodes go down after
executing certain query statements.
Could I configure Cassandra to log all the executed statements?
I hope the log file can be used to identify the problem.
Thanks.
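
One possible approach, as a sketch: request tracing is not full statement
logging, but it does record executed statements (with timings) in the
system_traces keyspace:

  nodetool settraceprobability 0.1   # trace 10% of requests; 1.0 traces
                                     # everything, at significant overhead

Whether that is enough depends on how reproducible the problem is.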

Peter



Re: SSTable structure

2015-04-02 Thread Serega Sheypak
Thank you, great to know that.

2015-04-01 23:14 GMT+02:00 Bharatendra Boddu bharatend...@gmail.com:

 Hi Serega,

 Most of the content in the blog article is still relevant. After 1.2.5
 (ic), there are only three new versions (ja, jb, ka) of the SSTable format.
 The following are the changes in these versions.

 // ja (2.0.0): super columns are serialized as composites (note that there is no real format change,
 //             this is mostly a marker to know if we should expect super columns or not. We do need
 //             a major version bump however, because we should not allow streaming of super columns
 //             into this new format)
 //             tracks max local deletiontime in sstable metadata
 //             records bloom_filter_fp_chance in metadata component
 //             remove data size and column count from data file (CASSANDRA-4180)
 //             tracks max/min column values (according to comparator)
 // jb (2.0.1): switch from crc32 to adler32 for compression checksums
 //             checksum the compressed data
 // ka (2.1.0): new Statistics.db file format
 //             index summaries can be downsampled and the sampling level is persisted
 //             switch uncompressed checksums to adler32
 //             tracks presense of legacy (local and remote) counter shards

 - bharat

 On Wed, Apr 1, 2015 at 12:02 AM, Serega Sheypak serega.shey...@gmail.com
 wrote:

 Hi bharat,
 you are talking about Cassandra 1.2.5. Does it still apply to Cassandra 2.1?
 Were there any significant changes to SSTable format and layout?
 Thank you, article is interesting.

 Hi jacob jacob.rho...@me.com,
 HBase does it for example.
 http://hbase.apache.org/book.html#_hfile_format_2
 It would be great to give general ideas. It could help to understand
 schema design problems. You start to understand better how Cassandra scans
 data and how you can utilize its power.

 2015-04-01 5:39 GMT+02:00 Bharatendra Boddu bharatend...@gmail.com:

 Some time back I created a blog article about the SSTable storage format
 with some code references.

 Cassandra: SSTable Storage Format
 http://distributeddatastore.blogspot.com/2013/08/cassandra-sstable-storage-format.html

 - bharat

 On Mon, Mar 30, 2015 at 5:24 PM, Jacob Rhoden jacob.rho...@me.com
 wrote:

 Yes, updating code and documentation can sometimes be annoying; you
 would only ever maintain both if it were important. It comes down to: is
 having the format of the data files documented for everyone to understand
 an important thing?

 __
 Sent from iPhone

 On 31 Mar 2015, at 11:07 am, daemeon reiydelle daeme...@gmail.com
 wrote:

 Why? Then there are two places to maintain, or to get JIRA'ed for a
 discrepancy.
 On Mar 30, 2015 4:46 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Mar 30, 2015 at 1:38 AM, Pierre pierredev...@gmail.com
 wrote:

 Does anyone know if there is a more complete and up to date
 documentation about the sstable files structure (data, index, stats etc.)
 than this one : http://wiki.apache.org/cassandra/ArchitectureSSTable


 No, there isn't. Unfortunately you will have to read the source.


 I'm looking for a full specification, with schema of the structure if
 possible.


 It would be nice if such fundamental things were documented, wouldn't
 it?

 =Rob








Re: COMMERCIAL:Re: Cross-datacenter requests taking a very long time.

2015-04-02 Thread Andrew Vant
On Mar 31, 2015, at 4:59 PM, daemeon reiydelle daeme...@gmail.com wrote:
 What is your replication factor?

NetworkTopologyStrategy with replfactor: 2 in each DC. 

Someone else asked about the endpoint snitch I'm using; it's set to 
GossipingPropertyFileSnitch.

 Any idea how much data has to be processed under the query?

It does not matter what query I use, or what size; the problem occurs even just 
selecting a single user from the users table.

 While running the query against both DC's, you can take a look at netstats
 to get a really quick-and-dirty idea of network traffic.

I'll try that. I should add that one of the other teams here has a similar 
setup (3 nodes in 3 DCs) that is working correctly. We're going to go through 
the config files and see if we can figure out what's different. 

-- 

Andrew

Re: COMMERCIAL:Re: Cross-datacenter requests taking a very long time.

2015-04-02 Thread daemeon reiydelle
You might want to see what quorum is configured? I meant to ask that.
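
For context: with RF 2 per DC, a plain QUORUM must ack a majority of all
replicas across every DC (with two DCs that is 3 of 4), so it always waits
on a remote replica, while LOCAL_QUORUM is satisfied by the 2 replicas in
the local DC. A quick way to compare from cqlsh (the SELECT is illustrative):

  CONSISTENCY LOCAL_QUORUM;
  SELECT * FROM users LIMIT 1;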



“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming ‘Wow! What a Ride!’” - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Apr 2, 2015 at 12:39 PM, Andrew Vant andrew.v...@rackspace.com
wrote:

 On Mar 31, 2015, at 4:59 PM, daemeon reiydelle daeme...@gmail.com wrote:
  What is your replication factor?

 NetworkTopologyStrategy with replfactor: 2 in each DC.

 Someone else asked about the endpoint snitch I'm using; it's set to
 GossipingPropertyFileSnitch.

  Any idea how much data has to be processed under the query?

 It does not matter what query I use, or what size; the problem occurs even
 just selecting a single user from the users table.

  While running the query against both DC's, you can take a look at
 netstats
  to get a really quick-and-dirty idea of network traffic.

 I'll try that. I should add that one of the other teams here has a similar
 setup (3 nodes in 3 DCs) that is working correctly. We're going to go
 through the config files and see if we can figure out what's different.

 --

 Andrew


RE: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper

2015-04-02 Thread Tiwari, Tarun
Sorry I was unable to reply for a couple of days.
I checked the error again and can’t see any other initial cause. Here is the
full error that is coming:

Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
        at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.mapper.ColumnMapper
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)



From: Dave Brosius [mailto:dbros...@mebigfatguy.com]
Sent: Tuesday, March 31, 2015 8:46 PM
To: user@cassandra.apache.org
Subject: Re: Getting NoClassDefFoundError for 
com/datastax/spark/connector/mapper/ColumnMapper




Is there an 'initial cause' listed under that exception you gave?
NoClassDefFoundError is not exactly the same as ClassNotFoundException:
it means that ColumnMapper couldn't run its static initializer. It could
be because some other class couldn't be found, or it could be some other
non-classloader-related error.



On 2015-03-31 10:42, Tiwari, Tarun wrote:
Hi Experts,

I am getting java.lang.NoClassDefFoundError:
com/datastax/spark/connector/mapper/ColumnMapper while running an app that
loads data into a Cassandra table using the DataStax Spark connector.

Is there something else I need to import in the program or its dependencies?

RUNTIME ERROR: Exception in thread "main" java.lang.NoClassDefFoundError:
com/datastax/spark/connector/mapper/ColumnMapper
        at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Below is my scala program

/*** ld_Cassandra_Table.scala ***/
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector
import com.datastax.spark.connector._

object ldCassandraTable {
  def main(args: Array[String]) {
    val fileName = args(0)
    val tblName = args(1)
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "<MASTER HOST>")
      .setMaster("<MASTER URL>")
      .setAppName("LoadCassandraTableApp")
    val sc = new SparkContext(conf)
    sc.addJar("/home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar")
    val normalfill = sc.textFile(fileName).map(line => line.split('|'))
    normalfill.map(line => (line(0), line(1), line(2), line(3), line(4),
      line(5), line(6), line(7), line(8), line(9), line(10), line(11),
      line(12), line(13), line(14), line(15), line(16), line(17), line(18),
      line(19), line(20), line(21))).saveToCassandra("keyspace", tblName,
      SomeColumns("wfctotalid", "timesheetitemid", "employeeid",
        "durationsecsqty", "wageamt", "moneyamt", "applydtm", "laboracctid",
        "paycodeid", "startdtm", "stimezoneid", "adjstartdtm", "adjapplydtm",
        "enddtm", "homeaccountsw", "notpaidsw", "wfcjoborgid", "unapprovedsw",
        "durationdaysqty", "updatedtm", "totaledversion", "acctapprovalnum"))
    println("Records Loaded to %s".format(tblName))
    Thread.sleep(500)
    sc.stop()
  }
}

Below is the sbt file:

name := "POC"

version := "0.0.1"

scalaVersion := "2.10.4"

// additional libraries
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.1.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1" % "provided"
)
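
Worth noting: all three dependencies above are marked "provided", so the
connector classes are never packaged into the job jar, which would explain
the NoClassDefFoundError on the driver side (sc.addJar only helps the
executors). A sketch of one fix, passing the assembly jar to spark-submit
explicitly (the job jar name is a guess from the sbt settings):

  spark-submit --class ldCassandraTable \
    --jars /home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar \
    target/scala-2.10/poc_2.10-0.0.1.jar <fileName> <tblName>

Alternatively, drop "provided" from the connector line and build a fat jar.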

Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India
M: +91 9540 28 27 77 | Tel: +91 120 4015200
Kronos | Time & Attendance • Scheduling • Absence Management • HR & Payroll • Hiring • Labor Analytics
Join Kronos on: kronos.com (http://www.kronos.com/) | Facebook (http://www.kronos.com/facebook) | Twitter (http://www.kronos.com/twitter) | LinkedIn (http://www.kronos.com/linkedin) | YouTube (http://www.kronos.com/youtube)



does DC_LOCAL require manually truncating system.paxos on failover?

2015-04-02 Thread Sean Bridges
We are using lightweight transactions, two datacenters and DC_LOCAL
consistency level.

There is a comment in CASSANDRA-5797,

This would require manually truncating system.paxos when failing over.

Is that required?  I don't see it documented anywhere else.

Thanks,

Sean

https://issues.apache.org/jira/browse/CASSANDRA-5797


Re: Cluster status instability

2015-04-02 Thread Michal Michalski
Hey Marcin,

Are they actually going up and down repeatedly (flapping), or do they just
go down and never come back?
There might be different reasons for flapping nodes, but to list what I
have at the top of my head right now:

1. Network issues. I don't think it's your case, but you can read about the
issues some people are having when deploying C* on AWS EC2 (keyword to look
for: phi_convict_threshold)

2. Heavy load. Node is under heavy load because of massive number of reads
/ writes / bulkloads or e.g. unthrottled compaction etc., which may result
in extensive GC.

Could any of these be a problem in your case? I'd start by investigating the
GC logs, e.g. to see how long the stop-the-world full GCs take (GC
logs should be on by default from what I can see [1])

[1] https://issues.apache.org/jira/browse/CASSANDRA-5319
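
For point 1, the knob lives in cassandra.yaml; for the GC check, Cassandra
already logs long pauses itself. A quick sketch (the threshold value is a
common suggestion for flaky networks, not a prescription):

  # cassandra.yaml -- failure detector sensitivity (default 8):
  phi_convict_threshold: 12

  # long stop-the-world pauses are reported by GCInspector:
  grep GCInspector /var/log/cassandra/system.log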

Michał


Kind regards,
Michał Michalski,
michal.michal...@boxever.com

On 2 April 2015 at 11:05, Marcin Pietraszek mpietras...@opera.com wrote:

 Hi!

 We have a 56-node cluster with C* 2.0.13 + the CASSANDRA-9036 patch
 installed. Assume we have nodes A, B, C, D, E. On some irregular basis,
 one of those nodes starts to report that a subset of the other nodes is in
 the DN state, although the C* daemon on all nodes is running:

 A$ nodetool status
 UN B
 DN C
 DN D
 UN E

 B$ nodetool status
 UN A
 UN C
 UN D
 UN E

 C$ nodetool status
 DN A
 UN B
 UN D
 UN E

 After a restart of node A, C and D report that A is in UN, and A also
 claims that the whole cluster is in the UN state. Right now I don't have
 any clear steps to reproduce that situation; do you guys have any idea
 what could be causing such behaviour? How could this be prevented?

 It seems like when node A is the coordinator and gets a request for some
 data replicated on C and D, it responds with an Unavailable
 exception; after restarting A the problem disappears.

 --
 mp



Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Ian Rose
Hi all -

We currently have a single cassandra cluster that is dedicated to a
relatively narrow purpose, with just 2 tables.  Soon we will need cassandra
for another, unrelated, system, and my debate is whether to just add the
new tables to our existing cassandra cluster or whether to spin up an
entirely new, separate cluster for this new system.

Does anyone have pros/cons to share on this?  It appears from watching
talks and such online that the big users (e.g. Netflix, Spotify) tend to
favor multiple, single-purpose clusters, and thus that was my initial
preference.  But we are (for now) nowhere close to them in traffic, so I'm
wondering if running an entirely separate cluster would be a premature
optimization that wouldn't pay for the (nontrivial) overhead in
configuration management and ops.  While we are still small it might be
much smarter to reuse our existing cluster so that I can get it done
faster...

Thanks!
- Ian


Cluster status instability

2015-04-02 Thread Marcin Pietraszek
Hi!

We have a 56-node cluster with C* 2.0.13 + the CASSANDRA-9036 patch
installed. Assume we have nodes A, B, C, D, E. On some irregular basis,
one of those nodes starts to report that a subset of the other nodes is in
the DN state, although the C* daemon on all nodes is running:

A$ nodetool status
UN B
DN C
DN D
UN E

B$ nodetool status
UN A
UN C
UN D
UN E

C$ nodetool status
DN A
UN B
UN D
UN E

After a restart of node A, C and D report that A is in UN, and A also
claims that the whole cluster is in the UN state. Right now I don't have any
clear steps to reproduce that situation; do you guys have any idea
what could be causing such behaviour? How could this be prevented?

It seems like when node A is the coordinator and gets a request for some
data replicated on C and D, it responds with an Unavailable
exception; after restarting A the problem disappears.

-- 
mp


Re: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper

2015-04-02 Thread Dave Brosius

This is what I meant by 'initial cause':

Caused by: java.lang.ClassNotFoundException: 
com.datastax.spark.connector.mapper.ColumnMapper


So it is in fact a classpath problem

Here is the class in question 
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/mapper/ColumnMapper.scala


Maybe it would be worthwhile to put this at the top of your main method

System.out.println(System.getProperty("java.class.path"));

and show what that prints.

What version of Cassandra and what version of the Cassandra-Spark
connector are you using, btw?






On 04/02/2015 11:16 PM, Tiwari, Tarun wrote:


Sorry I was unable to reply for a couple of days.

I checked the error again and can’t see any other initial cause. Here
is the full error that is coming:


Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
        at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.mapper.ColumnMapper
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

*From:*Dave Brosius [mailto:dbros...@mebigfatguy.com]
*Sent:* Tuesday, March 31, 2015 8:46 PM
*To:* user@cassandra.apache.org
*Subject:* Re: Getting NoClassDefFoundError for 
com/datastax/spark/connector/mapper/ColumnMapper


Is there an 'initial cause' listed under that exception you gave?
NoClassDefFoundError is not exactly the same as ClassNotFoundException:
it means that ColumnMapper couldn't run its static initializer. It could
be because some other class couldn't be found, or it could be some other
non-classloader-related error.



On 2015-03-31 10:42, Tiwari, Tarun wrote:

Hi Experts,

I am getting java.lang.NoClassDefFoundError:
com/datastax/spark/connector/mapper/ColumnMapper while running an
app that loads data into a Cassandra table using the DataStax Spark
connector.

Is there something else I need to import in the program or its
dependencies?

*RUNTIME ERROR:* Exception in thread "main" java.lang.NoClassDefFoundError:
com/datastax/spark/connector/mapper/ColumnMapper
        at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

*Below is my scala program*

/*** ld_Cassandra_Table.scala ***/
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector
import com.datastax.spark.connector._

object ldCassandraTable {
  def main(args: Array[String]) {
    val fileName = args(0)
    val tblName = args(1)
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "<MASTER HOST>")
      .setMaster("<MASTER URL>")
      .setAppName("LoadCassandraTableApp")
    val sc = new SparkContext(conf)
    sc.addJar("/home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar")
    val normalfill = sc.textFile(fileName).map(line => line.split('|'))
    normalfill.map(line => (line(0), line(1), line(2), line(3), line(4),
      line(5), line(6), line(7), line(8), line(9), line(10), line(11),
      line(12), line(13), line(14), line(15), line(16), line(17), line(18),
      line(19), line(20), line(21))).saveToCassandra("keyspace", tblName,
      SomeColumns("wfctotalid", "timesheetitemid", "employeeid",
        "durationsecsqty", "wageamt", "moneyamt", "applydtm", "laboracctid",
        "paycodeid", "startdtm", "stimezoneid", "adjstartdtm", "adjapplydtm",
        "enddtm", "homeaccountsw", "notpaidsw", "wfcjoborgid", "unapprovedsw",
        "durationdaysqty", "updatedtm", "totaledversion", "acctapprovalnum"))
    println("Records Loaded to %s".format(tblName))
    Thread.sleep(500)
    sc.stop()
  }
}

*Below is the sbt file:*

name := "POC"

version := "0.0.1"

scalaVersion := "2.10.4"

// additional libraries
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.1.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1" % "provided"
)

Cassandra - Storm

2015-04-02 Thread Vanessa Gligor
Hi all,

Did anybody use Cassandra for tuple storage in Storm? I have this
scenario: I have a spout (getting messages from RabbitMQ) and I want to
save all these messages in Cassandra using a bolt. What is the best choice
regarding the connection to the DB? I have read about the Hector API. I used
it, but so far I wasn't able to add a new row to a column family.
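
For what it's worth, the canonical Hector insert is tiny; a sketch assuming
Thrift on port 9160, with all names (cluster, keyspace, column family) made
up for illustration:

  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.Cluster;
  import me.prettyprint.hector.api.Keyspace;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.mutation.Mutator;

  public class HectorInsertSketch {
      public static void main(String[] args) {
          // connect, pick a keyspace, then write one column of one row
          Cluster cluster = HFactory.getOrCreateCluster("storm-cluster", "localhost:9160");
          Keyspace keyspace = HFactory.createKeyspace("mykeyspace", cluster);
          Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
          mutator.insert("rowKey1", "messages",
                  HFactory.createStringColumn("body", "hello from storm"));
      }
  }

If inserts appear to succeed but no row shows up, the usual suspects are a
wrong column family name or a serializer mismatch on the row key.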

Any help would be appreciated.

Regards,
Vanessa.


Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Carlos Rolo
Adding a new keyspace should be perfectly fine, unless you have completely
distinct workloads for the different keyspaces. Even so, you can balance
some stuff at the keyspace/table level. But I would go with a new keyspace,
not a new cluster, given the small size you say you have.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
http://linkedin.com/in/carlosjuzarterolo*
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Thu, Apr 2, 2015 at 3:06 PM, Ian Rose ianr...@fullstory.com wrote:

 Hi all -

 We currently have a single cassandra cluster that is dedicated to a
 relatively narrow purpose, with just 2 tables.  Soon we will need cassandra
 for another, unrelated, system, and my debate is whether to just add the
 new tables to our existing cassandra cluster or whether to spin up an
 entirely new, separate cluster for this new system.

 Does anyone have pros/cons to share on this?  It appears from watching
 talks and such online that the big users (e.g. Netflix, Spotify) tend to
 favor multiple, single-purpose clusters, and thus that was my initial
 preference.  But we are (for now) nowhere close to them in traffic so I'm
 wondering if running an entirely separate cluster would be a premature
 optimization which wouldn't pay for the (nontrivial) overhead in
 configuration management and ops.  While we are still small it might be
 much smarter to reuse our existing clusters so that I can get it done
 faster...

 Thanks!
 - Ian




Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Ian Rose
Thanks for the input, folks!

As a startup, we don't really have different dev teams / apps - everything
is in service of the product, so given these responses, I think putting
both into the same cluster is the best idea.  And if we want to split them
out in the future we are still small enough that it would be a pain but not
the end of the world...

Cheers,
Ian


On Thu, Apr 2, 2015 at 9:57 AM, Carlos Rolo r...@pythian.com wrote:

 Adding a new keyspace should be perfectly fine, unless you have completely
 distinct workloads for the different keyspaces. Even so, you can balance
 some stuff at the keyspace/table level. But I would go with a new keyspace,
 not a new cluster, given the small size you say you have.

 Regards,

 Carlos Juzarte Rolo
 Cassandra Consultant

 Pythian - Love your data

 rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
 http://linkedin.com/in/carlosjuzarterolo*
 Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
 www.pythian.com

 On Thu, Apr 2, 2015 at 3:06 PM, Ian Rose ianr...@fullstory.com wrote:

 Hi all -

 We currently have a single cassandra cluster that is dedicated to a
 relatively narrow purpose, with just 2 tables.  Soon we will need cassandra
 for another, unrelated, system, and my debate is whether to just add the
 new tables to our existing cassandra cluster or whether to spin up an
 entirely new, separate cluster for this new system.

 Does anyone have pros/cons to share on this?  It appears from watching
 talks and such online that the big users (e.g. Netflix, Spotify) tend to
 favor multiple, single-purpose clusters, and thus that was my initial
 preference.  But we are (for now) nowhere close to them in traffic so I'm
 wondering if running an entirely separate cluster would be a premature
 optimization which wouldn't pay for the (nontrivial) overhead in
 configuration management and ops.  While we are still small it might be
 much smarter to reuse our existing clusters so that I can get it done
 faster...

 Thanks!
 - Ian




Re: Cluster status instability

2015-04-02 Thread Jan
Marcin,

are all your nodes within the same Region? If not in the same region,
what is the Snitch type that you are using?

Jan


 On Thursday, April 2, 2015 3:28 AM, Michal Michalski
michal.michal...@boxever.com wrote:

 Hey Marcin,

Are they actually going up and down repeatedly (flapping), or do they just
go down and never come back? There might be different reasons for flapping
nodes, but to list what I have at the top of my head right now:

1. Network issues. I don't think it's your case, but you can read about the
issues some people are having when deploying C* on AWS EC2 (keyword to look
for: phi_convict_threshold)

2. Heavy load. Node is under heavy load because of massive number of reads /
writes / bulkloads or e.g. unthrottled compaction etc., which may result in
extensive GC.

Could any of these be a problem in your case? I'd start by investigating the
GC logs, e.g. to see how long the stop-the-world full GCs take (GC logs
should be on by default from what I can see [1])

[1] https://issues.apache.org/jira/browse/CASSANDRA-5319

Michał

Kind regards,
Michał Michalski,
michal.michal...@boxever.com
On 2 April 2015 at 11:05, Marcin Pietraszek mpietras...@opera.com wrote:

Hi!

We have a 56-node cluster with C* 2.0.13 + the CASSANDRA-9036 patch
installed. Assume we have nodes A, B, C, D, E. On some irregular basis,
one of those nodes starts to report that a subset of the other nodes is in
the DN state, although the C* daemon on all nodes is running:

A$ nodetool status
UN B
DN C
DN D
UN E

B$ nodetool status
UN A
UN C
UN D
UN E

C$ nodetool status
DN A
UN B
UN D
UN E

After a restart of node A, C and D report that A is in UN, and A also
claims that the whole cluster is in the UN state. Right now I don't have any
clear steps to reproduce that situation; do you guys have any idea
what could be causing such behaviour? How could this be prevented?

It seems like when node A is the coordinator and gets a request for some
data replicated on C and D, it responds with an Unavailable
exception; after restarting A the problem disappears.

--
mp





Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Jack Krupansky
There is an old saying in the software industry: The structure of a system
follows from the structure of the organization that created it (Conway's
Law). Seriously, the main, first question for your end is who owns the
applications in terms of executive management, such that if management
makes a decision that dramatically affects the app's impact on the cluster,
is it likely that they will have done so with the concurrence of management
who owns the other app. Trust me, you do not want to be in the middle when
two managers are in dispute over whose app is more important. IOW, if one
manager owns both apps, you are probably safe, but if two different
managers might have differing views of each other's priorities, tread with
caution.

In any case, be prepared to move one of the apps to a different cluster if
and when usage patterns cause them to conflict.

There is also the concept of devOps, where the app developers also own
operations. You really can't have two separate development teams administer
operations for one set of hardware.

If you are dedicated to operations for both app teams and the teams seem to
be reasonably compatible, then it could be fine.

In short, sure, technically a single cluster can support any number of
keyspaces, but mostly it will come down to whether there might be an excess
of contention for load and operations of the cluster in production.

And then little things like software upgrades - one app might really need a
disruptive or risky upgrade or need to bounce the entire cluster, but then
the other app may be impacted even though it had no need for the upgrade or
be bounced.

Are the apps synergistic in some way, such that there is an architectural
benefit from running on the same hardware?

In the end, the simplest solution is typically the better solution, unless
any of these other factors loom too large.


-- Jack Krupansky

On Thu, Apr 2, 2015 at 9:06 AM, Ian Rose ianr...@fullstory.com wrote:

 Hi all -

 We currently have a single cassandra cluster that is dedicated to a
 relatively narrow purpose, with just 2 tables.  Soon we will need cassandra
 for another, unrelated, system, and my debate is whether to just add the
 new tables to our existing cassandra cluster or whether to spin up an
 entirely new, separate cluster for this new system.

 Does anyone have pros/cons to share on this?  It appears from watching
 talks and such online that the big users (e.g. Netflix, Spotify) tend to
 favor multiple, single-purpose clusters, and thus that was my initial
 preference.  But we are (for now) nowhere close to them in traffic so I'm
 wondering if running an entirely separate cluster would be a premature
 optimization which wouldn't pay for the (nontrivial) overhead in
 configuration management and ops.  While we are still small it might be
 much smarter to reuse our existing clusters so that I can get it done
 faster...

 Thanks!
 - Ian




Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread daemeon reiydelle
Jack did a superb job of explaining all of your issues, and his last
sentence seems to fit your needs (and my experience) very well. The only
other point I would add is to ascertain whether the usage patterns commend
microservices to abstract away data locality, even if the initial
deployment is a noop to a single cluster. This depends on whether you see a
rapid stream of special-purpose business functions. A second question is
about data access ... does Pig support your data access response times?
Many clients find Hadoop ideally suited to a sophisticated ECTL (extract,
cleanup, transformation, and load) model feeding fast, schema-oriented
repositories like e.g. MySQL. All depends on the use case, growth &
fragmentation expectations for your business model(s), etc.

Good luck.

PS: Jack, thanks for your succinct comment.




On Thu, Apr 2, 2015 at 6:33 AM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 There is an old saying in the software industry: The structure of a system
 follows from the structure of the organization that created it (Conway's
 Law). Seriously, the main, first question for your end is who owns the
 applications in terms of executive management, such that if management
 makes a decision that dramatically affects the app's impact on the cluster,
 is it likely that they will have done so with the concurrence of management
 who owns the other app. Trust me, you do not want to be in the middle when
 two managers are in dispute over whose app is more important. IOW, if one
 manager owns both apps, you are probably safe, but if two different
 managers might have differing views of each other's priorities, tread with
 caution.

 In any case, be prepared to move one of the apps to a different cluster if
 and when usage patterns cause them to conflict.

 There is also the concept of devOps, where the app developers also own
 operations. You really can't have two separate development teams administer
 operations for one set of hardware.

 If you are dedicated to operations for both app teams and the teams seem
 to be reasonably compatible, then it could be fine.

 In short, sure, technically a single cluster can support any number of
 keyspaces, but mostly it will come down to whether there might be an
 excess of contention for load and operations of the cluster in production.

 And then little things like software upgrades - one app might really need
 a disruptive or risky upgrade or need to bounce the entire cluster, but
 then the other app may be impacted even though it had no need for the
 upgrade or be bounced.

 Are the apps synergistic in some way, such that there is an architectural
 benefit from running on the same hardware?

 In the end, the simplest solution is typically the better solution, unless
 any of these other factors loom too large.


 -- Jack Krupansky

 On Thu, Apr 2, 2015 at 9:06 AM, Ian Rose ianr...@fullstory.com wrote:

 Hi all -

 We currently have a single cassandra cluster that is dedicated to a
 relatively narrow purpose, with just 2 tables.  Soon we will need cassandra
 for another, unrelated, system, and my debate is whether to just add the
 new tables to our existing cassandra cluster or whether to spin up an
 entirely new, separate cluster for this new system.

 Does anyone have pros/cons to share on this?  It appears from watching
 talks and such online that the big users (e.g. Netflix, Spotify) tend to
 favor multiple, single-purpose clusters, and thus that was my initial
 preference.  But we are (for now) nowhere close to them in traffic so I'm
 wondering if running an entirely separate cluster would be a premature
 optimization which wouldn't pay for the (nontrivial) overhead in
 configuration management and ops.  While we are still small it might be
 much smarter to reuse our existing clusters so that I can get it done
 faster...

 Thanks!
 - Ian





Re: Cluster status instability

2015-04-02 Thread daemeon reiydelle
Do you happen to be using a tool like Nagios or Ganglia that can
report utilization (CPU, load, disk IO, network)? There are plugins for
both that will also notify you (depending on whether you enabled the
intermediate GC logging) about what is happening.



On Thu, Apr 2, 2015 at 8:35 AM, Jan cne...@yahoo.com wrote:

 Marcin,

 are all your nodes within the same Region?
 If not in the same region, what is the Snitch type that you are using?

 Jan



   On Thursday, April 2, 2015 3:28 AM, Michal Michalski 
 michal.michal...@boxever.com wrote:


 Hey Marcin,

 Are they actually going up and down repeatedly (flapping), or do they just
 go down and never come back?
 There might be different reasons for flapping nodes, but to list what I
 have at the top of my head right now:

 1. Network issues. I don't think it's your case, but you can read about
 the issues some people are having when deploying C* on AWS EC2 (keyword to
 look for: phi_convict_threshold)

 2. Heavy load. Node is under heavy load because of massive number of reads
 / writes / bulkloads or e.g. unthrottled compaction etc., which may result
 in extensive GC.

 Could any of these be a problem in your case? I'd start by investigating
 the GC logs, e.g. to see how long the stop-the-world full GCs take (GC
 logs should be on by default from what I can see [1])

 [1] https://issues.apache.org/jira/browse/CASSANDRA-5319

 Michał


 Kind regards,
 Michał Michalski,
 michal.michal...@boxever.com

 On 2 April 2015 at 11:05, Marcin Pietraszek mpietras...@opera.com wrote:

 Hi!

 We have a 56-node cluster with C* 2.0.13 + the CASSANDRA-9036 patch
 installed. Assume we have nodes A, B, C, D, E. On some irregular basis,
 one of those nodes starts to report that a subset of the other nodes is in
 the DN state, although the C* daemon on all nodes is running:

 A$ nodetool status
 UN B
 DN C
 DN D
 UN E

 B$ nodetool status
 UN A
 UN C
 UN D
 UN E

 C$ nodetool status
 DN A
 UN B
 UN D
 UN E

 After a restart of node A, C and D report that A is in UN, and A also
 claims that the whole cluster is in the UN state. Right now I don't have
 any clear steps to reproduce that situation; do you guys have any idea
 what could be causing such behaviour? How could this be prevented?

 It seems like when node A is the coordinator and gets a request for some
 data replicated on C and D, it responds with an Unavailable
 exception; after restarting A the problem disappears.

 --
 mp







Re: Frequent timeout issues

2015-04-02 Thread daemeon reiydelle
May not be relevant, but what is the heap size you have deployed?
It should be no more than 16 GB (and be aware of the impact of GC at that
large size); I'd suggest no smaller than 8-12 GB.



On Wed, Apr 1, 2015 at 11:28 AM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 Are you writing to multiple CFs at the same time?
 Please run nodetool tpstats to make sure that FlushWriter etc. doesn't have
 high "All time blocked" counts. A blocked memtable FlushWriter may
 block/drop writes. If that's the case you may need to increase the memtable
 flush writers; if you have many secondary indexes in a CF, make sure that
 the memtable flush queue size is set at least equal to the number of
 indexes.

 Monitoring iostat and the GC logs may help.
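
 Concretely, something like this (yaml values are illustrative; tune them to
 your hardware and index count):

   nodetool tpstats   # check the "All time blocked" column for FlushWriter

   # cassandra.yaml:
   memtable_flush_writers: 2
   memtable_flush_queue_size: 8   # >= number of secondary indexes on the busiest CF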

 Thanks
 Anuj Wadehra
 --
   *From*:Amlan Roy amlan@cleartrip.com
 *Date*:Wed, 1 Apr, 2015 at 9:27 pm
 *Subject*:Re: Frequent timeout issues

 Did not see any exception in cassandra.log and system.log. Monitored using
 JConsole. Did not see anything wrong. Do I need to see any specific info?
 Doing almost 1000 writes/sec.

 HBase and Cassandra are running on different clusters. For cassandra I
 have 6 nodes with 64GB RAM(Heap is at default setting) and 32 cores.

 On 01-Apr-2015, at 8:43 pm, Eric R Medley emed...@xylocore.com wrote:




Re: Column value not getting updated

2015-04-02 Thread daemeon reiydelle
Interesting that you are finding excessive drift from public time servers.
I only once saw that problem, with AWS' time servers. To be conservative I
sometimes recommend that clients spool up their own time server, but
realize it will also drift if the public time servers do! Somewhat
different if in your own DC, but the same time-server drift issues apply.

Google has resorted to putting tier-one time servers (cesium clocks or
whatever) in every data center due to the public drift issues. Does anyone
know if AWS' time server is now stratum-1 backed?

However, it is better to have two (at least) in AWS; make sure their
private IPs are not in the same /24 CIDR subnet!

Of course this can get troublesome if load sharing between e.g. AWS East
and West.



“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming ‘Wow! What a Ride!’” - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Tue, Mar 31, 2015 at 10:49 PM, Saurabh Sethi saurabh_se...@symantec.com
wrote:

 Thanks Mark. A great post indeed and saved me a lot of trouble.

 - Saurabh
 From: Mark Greene green...@gmail.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Tuesday, March 31, 2015 at 10:15 PM
 To: user@cassandra.apache.org user@cassandra.apache.org

 Subject: Re: Column value not getting updated

 Hey Saurabh,

 We're actually preparing for this ourselves and spinning up our own NTP
 server pool. The public NTP pools have a lot of drift and should not be
 relied upon for cluster technology that is sensitive to time skew like C*.

 The folks at Logentries did a great write up about this which we used as a
 guide.



 - https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
 - https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2-solutions/
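
 On each Cassandra node that boils down to pointing ntpd at the internal
 pool instead of the public one, e.g. (hostnames hypothetical):

   # /etc/ntp.conf
   server ntp1.internal.example.com iburst
   server ntp2.internal.example.com iburst
   server ntp3.internal.example.com iburst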


 -Mark

 On Tue, Mar 31, 2015 at 5:59 PM, Saurabh Sethi saurabh_se...@symantec.com
  wrote:

 That's what I found out: the clocks were not in sync.

 But I have set up NTP on all 3 nodes and would expect the clocks to be in
 sync.

 From: Nate McCall n...@thelastpickle.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Tuesday, March 31, 2015 at 2:50 PM
 To: Cassandra Users user@cassandra.apache.org
 Subject: Re: Column value not getting updated

 You would see that if the servers' clocks were out of sync.

 Make sure the time on the servers is in sync or set the client timestamps
 explicitly.
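
 With the QueryBuilder API that would look roughly like this (a sketch
 assuming Java driver 2.x, with keyspace/table/column names invented and a
 Session already built):

   import static com.datastax.driver.core.querybuilder.QueryBuilder.*;

   // supply the write timestamp (microseconds) from the client, so replica
   // clock skew cannot make this update sort before the earlier insert
   long ts = System.currentTimeMillis() * 1000;
   session.execute(update("myks", "mytable")
       .with(set("col", "newValue"))
       .where(eq("id", "row1"))
       .using(timestamp(ts)));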

 On Tue, Mar 31, 2015 at 3:23 PM, Saurabh Sethi 
 saurabh_se...@symantec.com wrote:

 I have written a unit test that creates a column family, inserts a row
 in that column family and then updates the value of one of the columns.

 After updating, the unit test immediately tries to read the updated value
 for that column, but Cassandra returns the old value.

   - I am using the QueryBuilder API and not CQL directly.
   - I am using the consistency level of QUORUM for everything: insert,
     update and read.
   - Cassandra is running as a 3-node cluster with a replication factor of 3.


 Anyone has any idea what is going on here?

 Thanks,
 Saurabh




 --
 -
 Nate McCall
 Austin, TX
 @zznate

 Co-Founder  Sr. Technical Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com





Re: Frequent timeout issues

2015-04-02 Thread Jonathan Haddad
@Daemeon you may want to read through
https://issues.apache.org/jira/browse/CASSANDRA-8150, there are perfectly
valid cases for heap > 16 GB.

On Thu, Apr 2, 2015 at 10:07 AM daemeon reiydelle daeme...@gmail.com
wrote:

 May not be relevant, but what is the heap size you have deployed?
 It should be no more than 16 GB (and be aware of the impact of GC at that
 large size); I'd suggest no smaller than 8-12 GB.



 On Wed, Apr 1, 2015 at 11:28 AM, Anuj Wadehra anujw_2...@yahoo.co.in
 wrote:

 Are you writing to multiple CFs at the same time?
 Please run nodetool tpstats to make sure that FlushWriter etc. doesn't have
 high "All time blocked" counts. A blocked memtable FlushWriter may
 block/drop writes. If that's the case you may need to increase the memtable
 flush writers; if you have many secondary indexes in a CF, make sure that
 the memtable flush queue size is set at least equal to the number of
 indexes.

 Monitoring iostat and the GC logs may help.

 Thanks
 Anuj Wadehra
 --
   *From*:Amlan Roy amlan@cleartrip.com
 *Date*:Wed, 1 Apr, 2015 at 9:27 pm
 *Subject*:Re: Frequent timeout issues

 Did not see any exception in cassandra.log and system.log. Monitored
 using JConsole. Did not see anything wrong. Do I need to see any specific
 info? Doing almost 1000 writes/sec.

 HBase and Cassandra are running on different clusters. For cassandra I
 have 6 nodes with 64GB RAM(Heap is at default setting) and 32 cores.

 On 01-Apr-2015, at 8:43 pm, Eric R Medley emed...@xylocore.com wrote:





Re: Frequent timeout issues

2015-04-02 Thread daemeon reiydelle
To the poster, I am sorry to have taken this off topic. Looking forward to
your reply regarding your default heap size, frequency of hard garbage
collection, etc. In any case I am not convinced that heap size/garbage
collection is a root cause of your issue, but it has been so frequently a
problem that I tend to ask that question early on.

Jon, thank you for pointing out, to those who are 100% convinced large
heaps are an anti-pattern, that this is not necessarily an anti-pattern ...
I am well aware of that interesting thread, and find it provides clear
guidance that in most cases large heaps are an anti-pattern ... except in
fairly rare use cases, only after extensive analysis, and several
iterations of tuning. FYI, I have (both in Hadoop and Cassandra) created
specialized clusters with carefully monitored row sizes and schemas to
leverage the read-mostly options of large heaps.

My experiences may be a corner case, as I tend to work with clusters that
have been up for a while, and sort of grew sideways from the original
expectations.

The analysis is clear that, under certain specific conditions, with
extensive tuning, it just might be possible to run with very large heaps.
But thanks for pointing this out as there is a LOT of information included
there that can help us to deal with certain corner cases where it IS
possible to productively run larger heaps, and the implied anti-patterns.






On Thu, Apr 2, 2015 at 10:16 AM, Jonathan Haddad j...@jonhaddad.com wrote:

 @Daemeon you may want to read through
 https://issues.apache.org/jira/browse/CASSANDRA-8150, there are perfectly
 valid cases for heap > 16 GB.

 On Thu, Apr 2, 2015 at 10:07 AM daemeon reiydelle daeme...@gmail.com
 wrote:

 May not be relevant, but what is the heap size you have deployed?
 It should be no more than 16 GB (and be aware of the impact of GC at that
 large size); I'd suggest no smaller than 8-12 GB.



 On Wed, Apr 1, 2015 at 11:28 AM, Anuj Wadehra anujw_2...@yahoo.co.in
 wrote:

 Are you writing to multiple CFs at the same time?
 Please run nodetool tpstats to make sure that FlushWriter etc. doesn't
 have high "All time blocked" counts. A blocked memtable FlushWriter may
 block/drop writes. If that's the case you may need to increase the
 memtable flush writers; if you have many secondary indexes in a CF, make
 sure that the memtable flush queue size is set at least equal to the
 number of indexes.

 Monitoring iostat and the GC logs may help.

 Thanks
 Anuj Wadehra
 --
   *From*:Amlan Roy amlan@cleartrip.com
 *Date*:Wed, 1 Apr, 2015 at 9:27 pm
 *Subject*:Re: Frequent timeout issues

 Did not see any exception in cassandra.log and system.log. Monitored
 using JConsole. Did not see anything wrong. Do I need to see any specific
 info? Doing almost 1000 writes/sec.

 HBase and Cassandra are running on different clusters. For cassandra I
 have 6 nodes with 64GB RAM(Heap is at default setting) and 32 cores.

 On 01-Apr-2015, at 8:43 pm, Eric R Medley emed...@xylocore.com wrote: