RE: [Cassandra] Initial Setup - VMs for Research

2013-09-25 Thread Kanwar Sangha
What help are you looking for ? http://www.datastax.com/docs/datastax_enterprise3.1/install/install_deb_pkg -Original Message- From: shath...@e-z.net [mailto:shath...@e-z.net] Sent: 25 September 2013 15:27 To: user@cassandra.apache.org Subject: [Cassandra] Initial Setup - VMs for

nodetool tpstats

2013-09-18 Thread Kanwar Sangha
Hi - During a write heavy load, the tpstats show the following - Message type Dropped RANGE_SLICE 0 READ_REPAIR 0 BINARY 0 READ 0 MUTATION 65570 _TRACE 0
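The flattened dropped-message table quoted above can be read back into name/count pairs. A minimal Python sketch, assuming the text is exactly as quoted in the snippet; the helper name `parse_dropped` is made up for illustration and is not part of nodetool:

```python
def parse_dropped(text):
    """Parse 'Message type Dropped NAME COUNT NAME COUNT ...' into a dict."""
    header = "Message type Dropped"
    # Everything after the header is a flat run of name/count tokens.
    body = text.split(header, 1)[1].split()
    # Tokens alternate between message type and dropped count.
    return {name: int(count) for name, count in zip(body[0::2], body[1::2])}


snippet = (
    "Message type Dropped RANGE_SLICE 0 READ_REPAIR 0 BINARY 0 "
    "READ 0 MUTATION 65570 _TRACE 0"
)
dropped = parse_dropped(snippet)

# Show only the message types that actually dropped something.
print({name: count for name, count in dropped.items() if count > 0})
```

In the quoted output only MUTATION has a non-zero dropped count, which matches the write-heavy load described in the message.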

RE: Secondary Index Question

2013-08-21 Thread Kanwar Sangha
) Later, Dean From: Kanwar Sangha kan...@mavenir.com Reply-To: user@cassandra.apache.org Date: Tuesday, August 20, 2013 6:57 PM To: user@cassandra.apache.org

Secondary Index Question

2013-08-20 Thread Kanwar Sangha
Hi - I was reading some blogs on implementation of secondary indexes in Cassandra and they say that the read requests are sent sequentially to all the nodes ? So if I have a query to fetch ALL records with the secondary index filter, will the co-ordinator node send the requests to nodes one by

RE: Cassandra HANGS after some writes

2013-08-13 Thread Kanwar Sangha
Cassandra on windows ? Please install Linux ! From: Romain HARDOUIN [mailto:romain.hardo...@urssaf.fr] Sent: 13 August 2013 10:17 To: user@cassandra.apache.org Subject: Re: Cassandra HANGS after some writes Naresh, My two cents is that you should run Cassandra on a Linux VM. Issues are more

Cassandra Counter Family

2013-08-01 Thread Kanwar Sangha
Hi - We are struggling to understand how the counter family maintains consistency in Cassandra. Say Counter1 value is 1 and it is read by 2 clients at the same time who want to update the value. After both write, it will become 3 ?

RE: maximum storage per node

2013-07-25 Thread Kanwar Sangha
Issues with large data nodes would be - * Nodetool repair will be impossible to run * Your read i/o will suffer since you will almost always go to disk (each read will take 3 IOPS worst case) * Bootstrapping the node in case of failures will take days/weeks From:

CPU Bound Writes

2013-07-19 Thread Kanwar Sangha
"Insert-heavy workloads will actually be CPU-bound in Cassandra before being memory-bound." Can someone explain the internals of why writes are CPU bound ?

MailBox Impl

2013-07-18 Thread Kanwar Sangha
Hi - We are planning on using Cassandra for an IMAP based implementation. There are some questions that we are stuck with - 1) Each user will have a pre-defined mailbox size (say 10 MB). We need to maintain a field to check if the mail-box size exceeds the predefined size. Will using

RE: is there a key to sstable index file?

2013-07-17 Thread Kanwar Sangha
Yes. Multiple SSTables can have the same key, and only after compaction are the keys merged to reflect the latest value. From: S Ahmed [mailto:sahmed1...@gmail.com] Sent: 17 July 2013 15:54 To: cassandra-u...@incubator.apache.org Subject: is there a key to sstable index file? Since SSTables are

block size

2013-06-20 Thread Kanwar Sangha
Hi - What is the block size for Cassandra ? is it taken from the OS defaults ?

RE: block size

2013-06-20 Thread Kanwar Sangha
Subject: Re: block size Have you seen this? http://www.datastax.com/dev/blog/cassandra-file-system-design Regards, Shahab On Thu, Jun 20, 2013 at 3:17 PM, Kanwar Sangha kan...@mavenir.com wrote: Hi - What is the block size for Cassandra ? is it taken from the OS defaults ?

slice query

2013-05-30 Thread Kanwar Sangha
Hi - We have a dynamic CF which has a key and multiple columns which get added dynamically. For example - Key_1 , Column1, Column2, Column3,... Key_2 , Column1, Column2, Column3,... Now I want to get all columns after Column3...how do we query that ? The ColumnSliceIterator in hector

RE: Replica info

2013-05-09 Thread Kanwar Sangha
info http://www.datastax.com/docs/1.1/references/nodetool#nodetool-getendpoints This tells you where a key lives. (you need to hex encode the key) On Wed, May 8, 2013 at 5:14 PM, Hiller, Dean dean.hil...@nrel.gov wrote: nodetool describering {keyspace} From: Kanwar
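The reply above notes that the key has to be hex encoded before it is passed to nodetool getendpoints. A minimal Python sketch of that encoding step, assuming the row key is plain UTF-8 text; the key `Key_1` and the helper name `hex_encode_key` are illustrative only:

```python
def hex_encode_key(key: str) -> str:
    """Return the hex representation of a text row key (assumes UTF-8)."""
    return key.encode("utf-8").hex()


# "Key_1" is a made-up example key, not one taken from a real cluster.
encoded = hex_encode_key("Key_1")
print(encoded)
```

The resulting string is what would be supplied as the key argument, alongside the keyspace and column family, when asking nodetool which nodes hold the replicas.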

Replica info

2013-05-08 Thread Kanwar Sangha
Is there a way in Cassandra that we can know which node has the replica for the data ? if we have 4 nodes and RF = 2, is there a way we can find which 2 nodes have the same data ? Thanks, Kanwar

RE: HintedHandoff

2013-05-08 Thread Kanwar Sangha
Is this correct guys ? From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 07 May 2013 14:07 To: user@cassandra.apache.org Subject: HintedHandoff Hi -I had a question on hinted-handoff. We have 2 DCs configured with overall RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total - 8 nodes

HintedHandoff

2013-05-07 Thread Kanwar Sangha
Hi -I had a question on hinted-handoff. We have 2 DCs configured with overall RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total - 8 nodes across 2 DCs) Now we do a write with CL = ONE and Hinted Handoff enabled. *If node 'X ' in DC1 which is a 'replica' node is down and a write

backup strategy

2013-05-07 Thread Kanwar Sangha
Hi - If we have a RF=2 in a 4 node cluster, how do we ensure that the backup taken is only for 1 copy of the data ? in other words, is it possible for us to take back-up only from 2 nodes and not all 4 and still have at least 1 copy of the data ? Thanks, Kanwar

RE: local_quorum

2013-05-05 Thread Kanwar Sangha
Anyone ? From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 03 May 2013 08:59 To: user@cassandra.apache.org Subject: local_quorum Hi - I have 2 data centres (DC1 and DC2) and I have local_quorum set as the CL for reads. Say there is a RF factor = 2. (so 2 copies each in DC). If both nodes

local_quorum

2013-05-03 Thread Kanwar Sangha
Hi - I have 2 data centres (DC1 and DC2) and I have local_quorum set as the CL for reads. Say there is a RF factor = 2. (so 2 copies each in DC). If both nodes which own the data in DC1 are down and I do a read with CL as local_quorum , will I get an error back to the application ? or will

RE: Networking

2013-04-24 Thread Kanwar Sangha
: 192.168.1.1 Or perhaps this machine has a second NIC with ip 10.140.179.1 and so you split the traffic for the intra-cluster network traffic from the thrift traffic for better performance: rpc_address: 10.140.179.1 From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 24 April 2013 10:11 To: user

RE: Networking

2013-04-24 Thread Kanwar Sangha
I mean across 2 Data centres. -Original Message- From: Robert Coli [mailto:rc...@eventbrite.com] Sent: 24 April 2013 14:56 To: user@cassandra.apache.org Subject: Re: Networking On Wed, Apr 24, 2013 at 8:11 AM, Kanwar Sangha kan...@mavenir.com wrote: What about a geo-link ? Can

Re: index filter

2013-04-19 Thread Kanwar Sangha
Let me rephrase. I am talking about the index file on disk created per sstable. Does that contain all key indexes? Sent from Samsung mobile Robert Coli rc...@eventbrite.com wrote: On Fri, Apr 19, 2013 at 10:38 AM, Kanwar Sangha kan...@mavenir.com wrote: Guys – Quick question. The index

Client lib

2013-04-18 Thread Kanwar Sangha
Hi - We are planning to develop a custom client using the Thrift API for Cassandra. Are these available from the JMX ? - Can cassandra provide info about node status? - DC Failover detection (data center down, vs some nodes are down) - How to get load info from each node? Thanks, Kanwar

RE: How to make compaction run faster?

2013-04-18 Thread Kanwar Sangha
Use the community edition and try it out. Compaction has nothing to do with the CPU. It's all on raw disk speed. What kind of disks do you have ? 7.2k, 10k, 15k RPM ? Are your keys unique or you are doing updates ? if unique writes, I would not worry about compaction too much and let it run

Timeseries data

2013-03-27 Thread Kanwar Sangha
Hi - I have a query on Read with Cassandra. We are planning to have dynamic column family and each column would be based on a timeseries. Inserting data - key = ‘xxx’, {column_name = TimeUUID(now), :column_value = ‘value’ }, {column_name = TimeUUID(now), :column_value = ‘value’ },..

cfhistograms

2013-03-25 Thread Kanwar Sangha
Can someone explain how to read the cfhistograms o/p ? [root@db4 ~]# nodetool cfhistograms usertable data usertable/data histograms Offset SSTables Write Latency Read Latency Row Size Column Count 12857444 4051 0

Hinted Handoff

2013-03-25 Thread Kanwar Sangha
Hi - Quick question. Do hints contain the actual data or the data is read from the SStables and then sent to the other node when it comes up ? Thanks, Kanwar

RE: High disk I/O during reads

2013-03-22 Thread Kanwar Sangha
Are your keys spread across all SSTables ? That will cause every SSTable to be read, which will increase the I/O. What compaction are you using ? From: zod...@fifth-aeon.net [mailto:zod...@fifth-aeon.net] On Behalf Of Jon Scarborough Sent: 21 March 2013 23:00 To: user@cassandra.apache.org Subject:

RE: High disk I/O during reads

2013-03-22 Thread Kanwar Sangha
nodes as well though these posts causes read to check authorization and such of our system. Dean From: Kanwar Sangha kan...@mavenir.com Reply-To: user@cassandra.apache.org Date

RE: High disk I/O during reads

2013-03-22 Thread Kanwar Sangha
that which was nice. We only have 300 data point posts / second so not an extreme write load on 6 nodes as well though these posts causes read to check authorization and such of our system. Dean From: Kanwar Sangha kan...@mavenir.com

chunk length

2013-03-09 Thread Kanwar Sangha
Hi - Can someone help explain this parameter ? chunk_length_kb If we increase it from default 64k to 128k does it mean that the sstable will be compressed in blocks of 128k ? Does that mean if we are reading and writing data of 128k , it will give a better read/write performance ? Thanks,

leveled compaction

2013-03-08 Thread Kanwar Sangha
Hi - Can someone explain the meaning for the levelled compaction in cfstats - SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0] SSTables in each level: [61/4, 9, 92, 945, 8146, 0, 0, 0] SSTables in each level: [34/4, 1000/10, 100, 953, 8184, 0, 0, 0] Thanks, Kanwar

RE: leveled compaction

2013-03-08 Thread Kanwar Sangha
] So you have 40 SSTables in L0, 442 in L1, 97 in L2 and so forth. '40/4' and '442/10' have numbers after slash, those are expected maximum number of SSTables in that level and only displayed when you have more than that threshold. On Friday, March 8, 2013 at 3:24 PM, Kanwar Sangha wrote: Hi
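The reading given in the reply above (count per level, with the number after the slash being the expected maximum for that level) can be sketched in Python. This is only an illustration of how to interpret the line, assuming the format is exactly as quoted in the thread; `parse_levels` is a made-up helper name, not a Cassandra tool:

```python
def parse_levels(line):
    """Turn an 'SSTables in each level: [...]' line into (level, count, limit) tuples.

    limit is None when no '/max' part is shown for that level.
    """
    inner = line[line.index("[") + 1 : line.index("]")]
    levels = []
    for level, entry in enumerate(inner.split(",")):
        # '40/4' -> count 40, expected maximum 4; '97' -> count 97, no maximum shown.
        count, _, limit = entry.strip().partition("/")
        levels.append((level, int(count), int(limit) if limit else None))
    return levels


line = "SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0]"
levels = parse_levels(line)

for level, count, limit in levels:
    if limit is not None:
        print(f"L{level}: {count} SSTables, above the expected maximum of {limit}")
```

For the first line quoted in the question, this flags L0 (40 against 4) and L1 (442 against 10), the two levels that show a slash.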

RE: Hinted handoff

2013-03-07 Thread Kanwar Sangha
- Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 6/03/2013, at 1:22 PM, Kanwar Sangha kan...@mavenir.com wrote: Is this correct ? I have Raid 0 setup for 16 TB across 8 disks. Each disk is 7.2kRPM

VNodes and nodetool repair

2013-03-07 Thread Kanwar Sangha
Hi Guys - I have a question on Vnodes and nodetool repair. If I have configured the nodes as vnodes, say for example 2 nodes with Rf=2. Questions - *There are some columns set with TTL as X. After X Cassandra will mark them as tombstones. Is there still a probability of running into

Hinted handoff

2013-03-06 Thread Kanwar Sangha
Hi - Is there a way to increase the hinted handoff throughput ? I am seeing around 8Mb/s (bits). Thanks, Kanwar

RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
Got the param. thanks From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 06 March 2013 13:50 To: user@cassandra.apache.org Subject: Hinted handoff Hi - Is there a way to increase the hinted handoff throughput ? I am seeing around 8Mb/s (bits). Thanks, Kanwar

RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
After trying to bump up the hinted_handoff_throttle_in_kb to 1G/b per sec, It still does not go above 25Mb/s. Is there a limitation ? From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 06 March 2013 14:41 To: user@cassandra.apache.org Subject: RE: Hinted handoff Got the param. thanks

RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
Is this correct ? I have Raid 0 setup for 16 TB across 8 disks. Each disk is 7.2kRPM with IOPS of 80 per disk. Data is ~9.5 TB So 4K * 80 * 9.5 = 3040 KB ~ 23.75 Mb/s. So basically I am limited at the disk rather than the n/w From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 06 March
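The arithmetic in the message above can be checked step by step. The Python sketch below only reproduces the numbers as written (4K block, 80 IOPS, and the factor 9.5 taken from the data size); it assumes 1 Mb = 1024 Kb for the unit conversion and does not settle whether multiplying by 9.5 is the right model for the disks:

```python
block_kb = 4          # 4K block size, from the message
iops_per_disk = 80    # IOPS per 7.2k RPM disk, from the message
factor = 9.5          # taken as-is from the message (data size in TB)

# Product as written in the message: 4K * 80 * 9.5
kb_per_sec = block_kb * iops_per_disk * factor

# Convert kilobytes/s to megabits/s (8 bits per byte, 1024 Kb per Mb assumed).
mbit_per_sec = kb_per_sec * 8 / 1024

print(f"{kb_per_sec:.0f} KB/s is about {mbit_per_sec:.2f} Mb/s")
```

The result lines up with the roughly 25 Mb/s hinted handoff ceiling reported earlier in the same thread.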

Storage question

2013-03-04 Thread Kanwar Sangha
Hi - Can someone suggest the optimal way to store files / images ? We are planning to use cassandra for meta-data for these files. HDFS is not good for small file size .. can we look at something else ? Thanks, Kanwar

RE: Storage question

2013-03-04 Thread Kanwar Sangha
could check that out. Out of curiosity, why is HDFS not good for a small file size? For reading, it should be the bomb with RF=3 since you can read from multiple nodes and such. Writes might be a little slower but still shouldn't be too bad. Later, Dean From: Kanwar Sangha kan

Replication Question

2013-03-04 Thread Kanwar Sangha
Hi - If I configure a RF across 2 Data centres as below and assuming 3 nodes per Data centre. DC1: 2, DC2:2 I do a write with consistency level - local_quorum which ensures that there is no inter DC latency. Now say 2 nodes in DC1 crash and I am doing a read with CL = One. Will it return

RE: Replication Question

2013-03-04 Thread Kanwar Sangha
for Reads also ? From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 04 March 2013 14:54 To: user@cassandra.apache.org Subject: Replication Question Hi - If I configure a RF across 2 Data centres as below and assuming 3 nodes per Data centre. DC1: 2, DC2:2 I do a write with consistency level

NetworkTopology

2013-02-28 Thread Kanwar Sangha
Hi - Quick question. When specifying the replication across 2 DCs, can we have 1 replication factor across 2 Data centres ? Does the below mean that there will be 2 copies of the data , 1 in DC1 and 1 in DC2 ? [default@unknown] CREATE KEYSPACE test WITH placement_strategy =

RE: Read Perf

2013-02-26 Thread Kanwar Sangha
on data size but not sure what that is. I know the column limit on a row is in the millions, somewhere lower than 10 million). Later, Dean From: Kanwar Sangha kan...@mavenir.com Reply-To: user@cassandra.apache.org

Read Perf

2013-02-25 Thread Kanwar Sangha
Hi - I am doing a performance run using modified YCSB client and was able to populate 8TB on a node and then ran some read workloads. I am seeing an average TPS of 930 ops/sec for random reads. There is no key cache/row cache. Question - Will the read TPS degrade if the data size increases to

key cache size

2013-02-21 Thread Kanwar Sangha
Hi - What is the approximate overhead of the key cache ? Say each key is 50 bytes. What would be the overhead for this key in the key cache ? Thanks, Kanwar

RE: Read IO

2013-02-21 Thread Kanwar Sangha
Ok.. Cassandra default block size is 256k ? Now say my data in the column is 4 MB. And the disk is giving me 4k block size random reads @ 100 IOPS. I can read max 400k in one seek ? does that mean I would need multiple seeks to get the complete data ? -Original Message- From:

RE: SSTable Num

2013-02-21 Thread Kanwar Sangha
Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 21/02/2013, at 3:47 AM, Kanwar Sangha kan...@mavenir.com wrote: Hi - I have around 6TB of data on 1 node and the cfstats show 32 sstables. There is no compaction job running in the background

RE: cassandra vs. mongodb quick question(good additional info)

2013-02-21 Thread Kanwar Sangha
“The limiting factors are the time it takes to repair, the time it takes to replace a node, the memory considerations for 100's of millions of rows. If the performance of those operations is acceptable to you, then go crazy” If I have a node which is attached to a RAID and the node crashes

Cassandra with SAN

2013-02-21 Thread Kanwar Sangha
Hi - Is it a good idea to use Cassandra with SAN ? Say a SAN which provides me 8 Petabytes of storage. Would I not be I/O bound irrespective of the no of Cassandra machines and scaling by adding machines won't help ? Thanks Kanwar

RE: Cassandra with SAN

2013-02-21 Thread Kanwar Sangha
to have a large expensive SAN. Don't be tempted by the shiny expensive SAN. :) If money is no object instead throw SSD's in your nodes and run 10G between racks From: Kanwar Sangha kan...@mavenir.com Reply-To: user@cassandra.apache.org

SSTable Num

2013-02-20 Thread Kanwar Sangha
Hi - I have around 6TB of data on 1 node and the cfstats show 32 sstables. There is no compaction job running in the background. Is there a limit on the size per sstable ? Or will the sstable compaction continue and eventually we will have 1 file ? Thanks, Kanwar

File Store

2013-02-20 Thread Kanwar Sangha
Hi - I am looking for some inputs on the file storage in Cassandra. Each file size can range from 200kb - 3MB. I don't see any limitation on the column size. But would it be a good idea to store these files as binary in the columns ? Thanks, Kanwar

Read IO

2013-02-20 Thread Kanwar Sangha
Hi - Can someone explain the worst case IOPS for a read ? No key cache, No row cache, sampling rate say 512. 1) Bloom filter will be checked to see existence of key (In RAM) 2) Index file sample (In RAM) will be checked to find approx. location in index file on disk 3) 1 IOPS

RE: Mutation dropped

2013-02-18 Thread Kanwar Sangha
in prod) RF3 and CL QUORUM is a more real world scenario. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 15/02/2013, at 9:42 AM, Kanwar Sangha kan...@mavenir.com wrote: Hi

Cassandra backup

2013-02-18 Thread Kanwar Sangha
Hi - We have a req to store around 90 days of data per user. Last 7 days of data is going to be accessed frequently. Is there a way we can have the recent data (7 days) in SSD and the rest of the data in the HDD ? Do we take a snapshot every 7 days and use a separate 'archive' cluster to serve

RE: Cassandra backup

2013-02-18 Thread Kanwar Sangha
@cassandra.apache.org Subject: Re: Cassandra backup There is this: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement But you'll need to design your data model around the fact that this is only as granular as 1 column family Best, michael From: Kanwar Sangha

Mutation dropped

2013-02-14 Thread Kanwar Sangha
Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a lot of mutation dropped messages. I understand that this is due to the replica not being written to the other node ? RF = 2, CL =1. From the wiki - For MUTATION messages this means that the mutation was not applied

RE: Mutation dropped

2013-02-14 Thread Kanwar Sangha
dropped messages. But there are no failures on the client. Does that mean other node is not able to persist the replicated data ? Is there some timeout associated with replicated data persistence ? Thanks, Kanwar From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 14 February 2013 09

Cassandra benchmark

2013-02-11 Thread Kanwar Sangha
Hi - I am trying to do benchmark using the Cassandra-stress tool. They have given an example to insert data across 2 nodes - /tools/stress/bin/stress -d 192.168.1.101,192.168.1.102 -n 1000 But when I run this across my 2 node cluster, I see the same keys in both nodes. Replication is not

RE: DataModel Question

2013-02-06 Thread Kanwar Sangha
, Kanwar Sangha kan...@mavenir.com wrote: Hi - We are designing a Cassandra based storage for the following use cases- *Store SMS messages *Store MMS messages *Store Chat history What would be the ideal way to design the data model for this kind

RE: DataModel Question

2013-02-06 Thread Kanwar Sangha
Developer New Zealand @aaronmorton http://www.thelastpickle.com On 7/02/2013, at 1:47 AM, Kanwar Sangha kan...@mavenir.com wrote: 1) Version is 1.2 2) DynamicComposites : I read somewhere that they are not recommended ? 3) Good point. I need to think about

DataModel Question

2013-02-05 Thread Kanwar Sangha
Hi - We are designing a Cassandra based storage for the following use cases- *Store SMS messages *Store MMS messages *Store Chat history What would be the ideal way to design the data model for this kind of application ? I am thinking on these lines .. Row-Key :

BloomFilter

2013-02-02 Thread Kanwar Sangha
Hi - Couple of questions - 1) What is the ratio of the sstable file size to bloom filter size ? If I have a sstable of 1 GB, what is the approximate bloom filter size ? Assuming 0.000744 default val configured. 2) The bloom filters are stored in RAM but not in heap from 1.2 onwards ? 3)

Index file

2013-02-02 Thread Kanwar Sangha
Hi - The index files created for the SSTables. Do they contain a sampling or the complete index ? Cassandra on startup loads these files based on the sampling rate in Cassandra.yaml ..right ?