Re: Cassandra Multi DC (Active-Active) Setup - Measuring latency & throughput performance

2016-02-27 Thread chandrasekar.krc
Thanks Bryan for the inputs. One of the tests I'm trying to do is fire 
write requests in one DC and simultaneously issue read requests from the other DC 
using cassandra-stress (custom schema option). While reading from the other DC, 
I expect cassandra-stress to throw errors for the no-data-found scenario, 
at least for the first few milliseconds until cross-data-center replication 
completes.
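One way to quantify that visibility window outside of cassandra-stress is to write in one DC and poll the other until the row appears. A minimal sketch (the `read_fn` callable is a hypothetical wrapper around a SELECT against the remote DC, not part of any driver API):

```python
import time

def measure_visibility_lag(read_fn, timeout_s=5.0, poll_interval_s=0.001):
    """Poll read_fn until it returns a truthy row, returning the elapsed
    seconds; read_fn would wrap a SELECT against the remote DC (e.g. at
    LOCAL_ONE). Returns None if the row never appears within timeout_s."""
    start = time.monotonic()
    deadline = start + timeout_s
    while time.monotonic() < deadline:
        if read_fn():
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    return None  # replication never caught up within the timeout
```

The measured lag includes the polling granularity and client round-trip time, so treat it as an upper bound on the replication delay.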


Likewise, for writes across both DCs, does Cassandra have any function similar 
to Oracle SYSDATE/SYSTIMESTAMP that can be used to measure the time difference 
of records across data centers?
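Cassandra does record a server-side write timestamp per cell, exposed via WRITETIME() in CQL (microseconds since the Unix epoch). Note that this timestamp is assigned at write coordination and replicates with the data, so it is identical in both DCs; to estimate replication delay you would compare it against the wall-clock time at which the row first became readable in the remote DC (subject to clock skew). A rough sketch of that arithmetic:

```python
from datetime import datetime, timezone

def writetime_to_datetime(wt_micros: int) -> datetime:
    """Convert a CQL WRITETIME() value (microseconds since the Unix epoch)
    into a timezone-aware datetime."""
    return datetime.fromtimestamp(wt_micros / 1_000_000, tz=timezone.utc)

def replication_lag_ms(write_micros: int, first_seen_micros: int) -> float:
    """Lag between the write timestamp and the moment the row was first
    readable in the remote DC (both in microseconds since the epoch)."""
    return (first_seen_micros - write_micros) / 1000.0
```

The WRITETIME value would come from something like `SELECT WRITETIME(col) FROM ks.tbl WHERE pk = ...` run against either DC.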


Thanks & Regards, Chandra Sekar KR

From: Bryan Cheng 
Sent: Saturday, February 27, 2016 05:01
To: user@cassandra.apache.org
Subject: Re: Cassandra Multi DC (Active-Active) Setup - Measuring latency & 
throughput performance

Hi Chandra,

For write latency, etc., the tools are still largely the same set you'd use 
for a single DC: tracing, cfhistograms, and cassandra-stress come to mind. 
The exact results will differ based on your consistency tuning 
(can you get away with LOCAL_QUORUM vs. QUORUM?) and read/write patterns.

What other data are you looking to gather?

On Fri, Feb 26, 2016 at 5:53 AM, 
> wrote:

Hi,


Are there any links/resources that describe performance measurement (latency & 
throughput) for a Cassandra Multi-DC Active-Active setup across a WAN 
(20 Gbps bandwidth) with 5 nodes in each DC?


Basically, I would like to know how to measure the latency of writes when data is 
replicated across DCs (local/remote) in an active-active cluster setup.


Regards, Chandra KR

The information contained in this electronic message and any attachments to 
this message are intended for the exclusive use of the addressee(s) and may 
contain proprietary, confidential or privileged information. If you are not the 
intended recipient, you should not disseminate, distribute or copy this e-mail. 
Please notify the sender immediately and destroy all copies of this message and 
any attachments. WARNING: Computer viruses can be transmitted via email. The 
recipient should check this email and any attachments for the presence of 
viruses. The company accepts no liability for any damage caused by any virus 
transmitted by this email. www.wipro.com



Cassandra Multi DC (Active-Active) Setup - Measuring latency & throughput performance

2016-02-26 Thread chandrasekar.krc
Hi,


Are there any links/resources that describe performance measurement (latency & 
throughput) for a Cassandra Multi-DC Active-Active setup across a WAN 
(20 Gbps bandwidth) with 5 nodes in each DC?


Basically, I would like to know how to measure the latency of writes when data is 
replicated across DCs (local/remote) in an active-active cluster setup.


Regards, Chandra KR



Re: Compatibility, performance & portability of Cassandra data types (MAP, UDT & JSON) in DSE Search & Analytics

2016-02-19 Thread chandrasekar.krc
Please find below the graph plotted from the cassandra-stress test output log. 
While the columnar data took 36 minutes to insert 20M records, the JSON-format 
data was loaded in under 10 minutes. The tests were run on a bare-metal 4-node 
cluster with 16-core CPUs and 120 GB of memory (8 GB heap) backed by SSDs.

[inline image: graph of cassandra-stress insert throughput, columnar vs. JSON]
Regards, Chandra Sekar KR

From: daemeon reiydelle 
Sent: Friday, February 19, 2016 12:57
To: user@cassandra.apache.org
Subject: Re: Compatibility, performance & portability of Cassandra data types 
(MAP, UDT & JSON) in DSE Search & Analytics

Given you only have 16 columns vs. over 200, I would expect a substantial 
improvement in writes, but not 5x. Ditto for reads. I would be interested to 
understand where that 5x comes from.


...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Feb 18, 2016 at 8:20 PM, Chandra Sekar KR 
> wrote:

Hi,


I'm looking for help in arriving at the pros & cons of using the MAP, UDT & JSON 
(TEXT) data types in Cassandra, and their ease of use/impact across other DSE 
products - Spark & Solr. We are migrating an OLTP database from an RDBMS to 
Cassandra; it has 200+ columns and an average daily volume of 25 million 
records/day. The access pattern is quite simple, and in OLTP the access is always 
by primary key. For OLAP, there are other access patterns with combinations of 
columns, where we are planning to use Spark & Solr for search & analytical 
capabilities (in a separate DC).


The average size of each record is ~2 KB and the application workload is 
INSERT-only (no updates/deletes). We conducted performance tests on two types 
of data models:

1) A table with 200+ columns similar to RDBMS

2) A table with 15 columns, where only critical business fields are maintained 
as key/value pairs and the remaining fields are stored in a single column of 
type TEXT as a JSON object.
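A minimal sketch of how the second model shapes a row (the column names here are invented for illustration, not the actual schema):

```python
import json

# Hypothetical critical business fields kept as real columns; everything
# else from the 200+ source columns is folded into one TEXT column.
CRITICAL_FIELDS = {"txn_id", "account_id", "event_date", "amount"}

def shape_row(source_row: dict) -> dict:
    """Split a wide RDBMS row into critical columns plus a JSON payload."""
    critical = {k: v for k, v in source_row.items() if k in CRITICAL_FIELDS}
    payload = {k: v for k, v in source_row.items() if k not in CRITICAL_FIELDS}
    critical["payload_json"] = json.dumps(payload, sort_keys=True)
    return critical
```

The trade-off is fewer cells per row on the write path, at the cost of the payload being opaque to CQL queries and secondary indexing unless Solr field transformers (or similar) unpack it.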


In the results, we noticed a significant advantage for the JSON model, where 
performance was 5x better than the columnar data model. We are also in the 
process of evaluating performance for other data types - MAP & UDT - instead of 
using TEXT for storing the JSON object. Sample data model structures for the 
columnar, JSON, MAP & UDT types are given below:


[inline image: sample data model structures for columnar, JSON, MAP & UDT types]


I would like to know the performance, transformation, compatibility & 
portability impacts & ease-of-use of each of these data types from a Search & 
Analytics perspective (Spark & Solr). I'm aware that we will have to use field 
transformers in Solr to index JSON fields; I'm not sure about MAP & UDT. Any 
help comparing these data types in Spark & Solr is highly appreciated.


Regards, KR



Cassandra Data Model with Narrow partition

2015-10-30 Thread chandrasekar.krc
Hi,

Could you please suggest whether a narrow partition is a good choice for the 
below use case.


1)  Write-heavy event log table with 50M inserts per day and a peak load 
of 20K transactions per sec. There aren't any updates/deletes to inserted 
records. Records are inserted with a TTL of 60 days (retention period).

2)  The table has a single primary key, a 27-digit sequence number 
generated by the source application.

3)  There are only two access patterns - one using the sequence number, 
the other using sequence number + event date (range scans are also possible).

4)  My target data model in Cassandra is partitioned by the sequence number 
as the partition key, with event date as a clustering column to enable range 
scans on date.

5)  The table has 120+ columns and the average row size is close to 
32 KB.

6)  Reads are very infrequent, accounting for <5% of operations, while 
inserts are close to 95%.

7)  From a functional standpoint, I do not see any other columns that could 
be part of the primary key to keep the partition reasonable (<100 MB).

Questions:

1)  Is a narrow partition an ideal choice for the above use case?

2)  Is artificial bucketing an alternative to keep partitions to a 
reasonable size?

3)  We are using varint as the data type for the 27-digit sequence 
number. Would DECIMAL be a more appropriate data type?

4)  Any suggestions on performance impacts during compaction?
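On question 2, the back-of-the-envelope sizing from the figures in this thread can be sketched as follows (the bucket count is purely illustrative; with a unique sequence number as the sole partition key each partition holds one logical record, so bucketing would only matter if rows were instead grouped, e.g. by event date):

```python
# Partition sizing from the thread's own numbers: ~32 KB per row and a
# ~100 MB soft ceiling per partition.
ROW_BYTES = 32 * 1024
PARTITION_BUDGET_BYTES = 100 * 1024 * 1024

max_rows_per_partition = PARTITION_BUDGET_BYTES // ROW_BYTES  # 3200 rows

def bucket_for(seq_number: int, buckets: int = 16) -> int:
    """Hypothetical artificial bucket derived from the sequence number,
    to be appended to the partition key if rows were grouped by date."""
    return seq_number % buckets
```

At 50M inserts/day, a date-partitioned model would need roughly 50_000_000 / 3200, i.e. around 16K buckets per day, which is why the per-sequence-number narrow partition is attractive here.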

Regards, Chandra Sekar KR



RE: Oracle TIMESTAMP(9) equivalent in Cassandra

2015-10-29 Thread chandrasekar.krc
Hi Doan,

Is the timeBased() method available in the Java driver similar to the now() 
function in cqlsh? Do both provide identical results?

Also, the preference is to generate values during record insertion on the 
database side, rather than the client side - something similar to SYSTIMESTAMP 
in Oracle.

Regards, Chandra Sekar KR
From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: 29/10/2015 5:13 PM
To: user@cassandra.apache.org
Subject: Re: Oracle TIMESTAMP(9) equivalent in Cassandra

You can use the TimeUUID data type and provide the value yourself from the client side.

The Java driver offers a utility class com.datastax.driver.core.utils.UUIDs 
with the method timeBased() to generate the TimeUUID.

The precision is only guaranteed up to 100 nanoseconds, so you can have at most 
10k distinct values per millisecond. For your requirement of 20K per sec, it 
should be enough.
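The same idea sketched with Python's standard library (uuid.uuid1() produces a version-1, time-based UUID, analogous to the Java driver's UUIDs.timeBased(); per RFC 4122 the embedded timestamp counts 100-ns intervals since the Gregorian epoch):

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUIDs count 100-ns intervals since 1582-10-15 (RFC 4122).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the embedded timestamp from a version-1 (time-based) UUID."""
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

a = uuid.uuid1()
b = uuid.uuid1()
# Within one process, the embedded timestamps are non-decreasing, so
# TimeUUIDs can order insertions even when several land in the same
# millisecond.
```

Across multiple client processes or hosts, ordering additionally depends on clock synchronization (and the node/clock-sequence fields only break ties, they don't fix skew), which is worth keeping in mind at 20K inserts/sec.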

On Thu, Oct 29, 2015 at 12:10 PM, 
> wrote:
Hi,

Oracle's TIMESTAMP data type supports fractional seconds (up to 9 digits; 6 is 
the default). What is the Cassandra equivalent of Oracle TIMESTAMP with 
nanosecond precision?

This is required for determining the order of insertion of records where the 
number of records inserted per sec is close to 20K. Is TIMEUUID an alternative 
that can determine the order of record insertion in Cassandra?

Regards, Chandra Sekar KR


Oracle TIMESTAMP(9) equivalent in Cassandra

2015-10-29 Thread chandrasekar.krc
Hi,

Oracle's TIMESTAMP data type supports fractional seconds (up to 9 digits; 6 is 
the default). What is the Cassandra equivalent of Oracle TIMESTAMP with 
nanosecond precision?

This is required for determining the order of insertion of records where the 
number of records inserted per sec is close to 20K. Is TIMEUUID an alternative 
that can determine the order of record insertion in Cassandra?

Regards, Chandra Sekar KR