RE: dblink between oracle and cassandra
What are you trying to achieve?

From: Rahul Bhardwaj <rahul.bhard...@indiamart.com>
Sent: 1/31/2015 2:05 AM
To: user@cassandra.apache.org
Subject: dblink between oracle and cassandra

Hi All,

I want to make a dblink between Oracle 11g and a Cassandra cluster. Is there any way, or any alternative, to do the same? Please help.

Regards,
Rahul Bhardwaj
RE: Cassandra on Ceph
What is the reason for running Cassandra on Ceph? I have both running in my environment but doing different things: Cassandra as the transactional store, and Ceph as block storage for storing files.

From: Jan <cne...@yahoo.com>
Sent: 2/1/2015 2:53 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra on Ceph

Colin;

Ceph is a block-based storage architecture built on RADOS. It comes with its own replication and rebalancing, along with a map of the storage layer. Some similarities:
a) Ceph stores a client's data as objects within storage pools (think of C* partitions).
b) Using the CRUSH algorithm, Ceph calculates which placement group should contain the object (C* primary keys / vnode data distribution).
c) It further calculates which Ceph OSD daemon should store the placement group (C* node locality).
d) The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically (C* big-table storage architecture).

Summary: C* comes with everything that Ceph provides (with the exception of block storage). There is no value-add that Ceph brings to the table that C* does not already provide. I seriously doubt C* could even work out of the box with yet another level of replication and rebalancing.

Hope this helps,
Jan / C* Architect

On Saturday, January 31, 2015 7:28 PM, Colin Taylor <colin.tay...@gmail.com> wrote:
I may be forced to run Cassandra on top of Ceph. Does anyone have experience / tips with this? Or, alternatively, strong reasons why this won't work?

cheers,
Colin
RE: Cassandra 2.0.11 with stargate-core read writes are slow
Is there any specific reason for using Cassandra for search instead of something like Elasticsearch + Kibana?

From: Jan <cne...@yahoo.com>
Sent: 1/31/2015 1:08 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra 2.0.11 with stargate-core read writes are slow

Hi Asit;

Question 1) Am I using the right hardware? As of now I am testing, say, 10 record reads.
Answer: Recommend looking at the 'sar' output logs, watching nodetool cfstats, and watching your system.log files to track hardware usage and JVM pressure. As a rule of thumb, it is recommended to have 8 GB for the C* JVM itself on production systems.
Question 3) is unclear; please rephrase the question.

hope this helps,
Jan
C* Architect

On Saturday, January 31, 2015 5:33 AM, Carlos Rolo <r...@pythian.com> wrote:
Hi Asit,

The only help I'm going to give is on point 3), as I have little experience with 2), and 1) depends on a lot of factors. For testing the workload use this: http://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStress_t.html. It probably covers all your testing needs.

Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Sat, Jan 31, 2015 at 2:49 AM, Asit KAUSHIK <asitkaushikno...@gmail.com> wrote:
Hi all,

We are testing our logging application on a 3-node cluster; each system is a virtual machine with 4 cores and 8 GB RAM, running Red Hat Enterprise. Now my question is in 3 parts:
1) Am I using the right hardware? As of now I am testing, say, 10 record reads.
2) I am using Stargate-core for full-text search; is there any slowness observed because of that?
3) How can I simulate the write load? I created an application which creates, say, 20 threads, and in each thread I insert 1000 records; in each thread I open a cluster connection and session, execute 1000 inserts, and close the connection. This takes a lot of time; please suggest if I am missing something.
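The load generator described above opens and closes a cluster connection in every thread, and that connection churn, rather than Cassandra, can easily dominate the measured time. A minimal sketch of the usual pattern, one session shared by a pool of writer threads, with a stub standing in for a real driver session (the table and statement are hypothetical):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class StubSession:
    """Stand-in for a driver session; a real one is created once per process
    and is safe to share across threads."""
    def __init__(self):
        self._lock = threading.Lock()
        self.executed = 0

    def execute(self, statement, params):
        with self._lock:
            self.executed += 1  # a real session would send the write here

def run_load(session, threads=20, records_per_thread=1000):
    # All threads share ONE session instead of opening a connection each.
    def writer(thread_id):
        for i in range(records_per_thread):
            session.execute("INSERT INTO logs (id, body) VALUES (%s, %s)",
                            (thread_id * records_per_thread + i, "payload"))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for t in range(threads):
            pool.submit(writer, t)
    return session.executed

session = StubSession()
total = run_load(session)
print(total)  # 20000 writes issued through a single shared session
```

With a real driver, only the session construction changes; the thread-pool structure stays the same.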
Re: Read repair
Yes, it helps. Thanks

--- Original Message ---
From: Aaron Morton <aa...@thelastpickle.com>
Sent: October 31, 2013 3:51 AM
To: Cassandra User <user@cassandra.apache.org>
Subject: Re: Read repair

(assuming RF 3 and NTS is putting a replica in each rack)

> Rack1 goes down and some writes happen in quorum against rack 2 and 3.
During this period (1), writes will be committed onto a node in both rack 2 and rack 3. Hints will be stored on a node in either rack 2 or 3.

> After couple of hours rack1 comes back and rack2 goes down.
During this period, writes from period (1) will be guaranteed to be on rack 3. Reads at QUORUM must use a node from rack 1 and rack 3. As such the read will include the node in rack 3 that stored the write during period (1).

> Now for rows inserted for about 1 hour and 30 mins, there is no quorum until failed rack comes back up.
In your example there is always a QUORUM, as we always have 2 of the 3 racks and so 2 of the 3 replicas for each row. For the CL guarantee to work we just have to have one of the nodes that completed the write be involved in the read.

Hope that helps.

- Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 30/10/2013, at 12:32 am, Baskar Duraikannu <baskar.duraika...@outlook.com> wrote:
Aaron,
Rack1 goes down and some writes happen in quorum against racks 2 and 3. Hinted handoff is set to 30 mins. After a couple of hours rack1 comes back and rack2 goes down. Hinted handoff will play, but will not cover all of the writes because of the 30 min setting. Now for rows inserted for about 1 hour and 30 mins, there is no quorum until the failed rack comes back up. Hope this explains the scenario.

From: Aaron Morton
Sent: 10/28/2013 2:42 AM
To: Cassandra User
Subject: Re: Read repair

> As soon as it came back up, due to some human error, rack1 goes down. Now for some rows it is possible that Quorum cannot be established.
Not sure I follow here. If the first rack has come up I assume all nodes are available; if you then lose a different rack I assume you have 2/3 of the nodes available and would be able to achieve a QUORUM.

> Just to minimize the issues, we are thinking of running read repair manually every night.
If you are reading and writing at QUORUM and the cluster does not have a QUORUM of nodes available, writes will not be processed. During reads any mismatch between the data returned from the nodes will be detected and resolved before returning to the client. Read Repair is an automatic process that reads from more nodes than necessary and resolves the differences in the background.

I would run nodetool repair / Anti-Entropy as normal, once on every machine every gc_grace_seconds. If you have a whole rack fail, run repair on the nodes in the rack if you want to get it back to consistency quickly. The need to do that depends on the config for Hinted Handoff, read_repair_chance, consistency level, the write load, and (to some degree) the number of nodes. If you want to be extra safe just run it.

Cheers
- Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 26/10/2013, at 2:54 pm, Baskar Duraikannu <baskar.duraika...@outlook.com> wrote:
We are thinking through the deployment architecture for our Cassandra cluster. Let us say that we choose to deploy data across three racks, and suppose one rack's power goes down for 10 mins and then comes back. As soon as it comes back up, due to some human error, rack1 goes down. Now for some rows it is possible that quorum cannot be established. Just to minimize the issues, we are thinking of running read repair manually every night. Is this a good idea? How often do you perform read repair on your cluster?
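The guarantee relied on in this thread, that a quorum read must involve at least one node that completed the quorum write, is the quorum-overlap property: with RF=3, any 2 replicas chosen for a write share at least one node with any 2 chosen for a read. A brute-force check of that property:

```python
from itertools import combinations

def quorums(replicas, rf):
    q = rf // 2 + 1               # QUORUM = floor(RF/2) + 1
    return list(combinations(replicas, q))

def always_overlap(replicas, rf):
    # Every possible write-quorum must intersect every possible read-quorum.
    return all(set(w) & set(r)
               for w in quorums(replicas, rf)
               for r in quorums(replicas, rf))

racks = ["rack1", "rack2", "rack3"]     # one replica per rack, RF=3
print(always_overlap(racks, rf=3))      # True: reads always see the write
```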
RE: Read repair
Aaron,

Rack1 goes down and some writes happen in quorum against racks 2 and 3. Hinted handoff is set to 30 mins. After a couple of hours rack1 comes back and rack2 goes down. Hinted handoff will play, but will not cover all of the writes because of the 30 min setting. Now for rows inserted for about 1 hour and 30 mins, there is no quorum until the failed rack comes back up. Hope this explains the scenario.

From: Aaron Morton <aa...@thelastpickle.com>
Sent: 10/28/2013 2:42 AM
To: Cassandra User <user@cassandra.apache.org>
Subject: Re: Read repair

> As soon as it came back up, due to some human error, rack1 goes down. Now for some rows it is possible that Quorum cannot be established.
Not sure I follow here. If the first rack has come up I assume all nodes are available; if you then lose a different rack I assume you have 2/3 of the nodes available and would be able to achieve a QUORUM.

> Just to minimize the issues, we are thinking of running read repair manually every night.
If you are reading and writing at QUORUM and the cluster does not have a QUORUM of nodes available, writes will not be processed. During reads any mismatch between the data returned from the nodes will be detected and resolved before returning to the client. Read Repair is an automatic process that reads from more nodes than necessary and resolves the differences in the background.

I would run nodetool repair / Anti-Entropy as normal, once on every machine every gc_grace_seconds. If you have a whole rack fail, run repair on the nodes in the rack if you want to get it back to consistency quickly. The need to do that depends on the config for Hinted Handoff, read_repair_chance, consistency level, the write load, and (to some degree) the number of nodes. If you want to be extra safe just run it.

Cheers
- Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 26/10/2013, at 2:54 pm, Baskar Duraikannu <baskar.duraika...@outlook.com> wrote:
We are thinking through the deployment architecture for our Cassandra cluster. Let us say that we choose to deploy data across three racks, and suppose one rack's power goes down for 10 mins and then comes back. As soon as it comes back up, due to some human error, rack1 goes down. Now for some rows it is possible that quorum cannot be established. Just to minimize the issues, we are thinking of running read repair manually every night. Is this a good idea? How often do you perform read repair on your cluster?
Read repair
We are thinking through the deployment architecture for our Cassandra cluster. Let us say that we choose to deploy data across three racks, and suppose one rack's power goes down for 10 mins and then comes back. As soon as it comes back up, due to some human error, rack1 goes down. Now for some rows it is possible that quorum cannot be established. Just to minimize the issues, we are thinking of running read repair manually every night. Is this a good idea? How often do you perform read repair on your cluster?
manual read repair
We have seen read repair take a very long time, even for a few GB of data, even though we don't see disk or network bottlenecks. Do you use any specific configuration to speed up read repairs?
RE: Facebook Cassandra
Thanks.

Date: Sun, 6 Oct 2013 11:48:42 -0400
Subject: Re: Facebook Cassandra
From: edlinuxg...@gmail.com
To: user@cassandra.apache.org

As it relates to C*: http://www.cs.cornell.edu/Projects/ladis2009/papers/Lakshman-ladis2009.PDF

"We have built, implemented, and operated a storage system providing scalability, high performance, and wide applicability. We have empirically demonstrated that Cassandra can support a very high update throughput while delivering low latency. Future work involves adding compression, the ability to support atomicity across keys, and secondary index support."

Why doesn't Facebook use the cloud? Does that mean you should not use the cloud? Why does Facebook use Scribe and not Flume? Why does Facebook use Thrift and not Avro or protobuf? What I am trying to get at is that technical decisions are very environment-specific. From other talks I have seen, Facebook breaks their HBase into cells of 100 nodes: http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase (see slide 23). No one else I know claims to do this. Again, should you do this?

On Fri, Oct 4, 2013 at 3:27 PM, Blair Zajac <bl...@orcaware.com> wrote:
On 10/04/2013 11:46 AM, Baskar Duraikannu wrote:
> Good evening. We have been using Cassandra for a while. I keep being asked why Facebook dropped Cassandra, and I could not find a good answer to this question on the internet. Could you please help me with the question?

Quora seems to be a good place to ask these questions:
http://www.quora.com/Why-did-Facebook-pick-HBase-instead-of-Cassandra-for-the-new-messaging-platform
http://www.quora.com/search?q=cassandra+facebook

Blair
Facebook Cassandra
Good evening. We have been using Cassandra for a while. I keep being asked why Facebook dropped Cassandra, and I could not find a good answer to this question on the internet. Could you please help me with the question?

--
Regards,
Baskar Duraikannu
RE: Update-Replace
I have a similar use case, but I only need to update a portion of the row. We basically perform a single write (with both old and new columns), with a very low TTL on the old columns.

From: jan.algermis...@nordsc.com
Subject: Update-Replace
Date: Fri, 30 Aug 2013 17:35:48 +0200
To: user@cassandra.apache.org

Hi,

I have a use case where I periodically need to apply updates to a wide row that should replace the whole row. A straight-forward insert/update only replaces values that are present in the executed statement, keeping the remaining data around. Is there a smooth way to do a replace with C*, or do I have to handle this in the application (e.g. doing a delete and then a write, or coming up with a cleverer data model)?

Jan
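The delete-then-write approach mentioned in the question can be done as a single batch: a row-level DELETE at one timestamp and a re-insert at a later timestamp, so the tombstone does not shadow the new values, and any columns absent from the new statement disappear with the old row. A sketch that only assembles the CQL text (table and column names are hypothetical; whether this beats the low-TTL trick above depends on tombstone volume):

```python
def replace_row(table, key_col, new_cols, ts):
    """Build a CQL batch that replaces a whole row: row-level DELETE at
    timestamp ts, then a re-insert at ts + 1 so the tombstone does not
    shadow the freshly written values."""
    cols = ", ".join([key_col] + list(new_cols))
    binds = ", ".join(["?"] * (len(new_cols) + 1))
    return (
        "BEGIN BATCH\n"
        f"  DELETE FROM {table} USING TIMESTAMP {ts} WHERE {key_col} = ?;\n"
        f"  INSERT INTO {table} ({cols}) VALUES ({binds}) USING TIMESTAMP {ts + 1};\n"
        "APPLY BATCH;"
    )

print(replace_row("wide_rows", "row_key", ["col_a", "col_b"], ts=1000))
```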
RE: List&lt;blob&gt; retrieve performance
I don't know of any. I would check the size of the LIST. If it is taking long, it could be just that the disk read is taking long.

Date: Sat, 31 Aug 2013 16:35:22 -0300
Subject: List&lt;blob&gt; retrieve performance
From: savio.te...@lupa.inf.ufg.br
To: user@cassandra.apache.org

I have a column family with this conf:

CREATE TABLE geoms (
  geom_key text PRIMARY KEY,
  part_geom list&lt;blob&gt;,
  the_geom text
) WITH bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

I run the query "select geom_key, the_geom, part_geom from geoms limit 1;" in 700 ms. When I run the same query without the part_geom attribute (select geom_key, the_geom from geoms limit 1;), the query runs in 5 ms. Is there a performance problem with a list&lt;blob&gt; attribute?

Thanks in advance
--
Best regards,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
MSc student in Computer Science - UFG
Software Architect
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
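As the reply suggests, the first thing to check is simply how many bytes the list holds: CQL collections are read in their entirety, never sliced, so a large list turns a 5 ms point read into a bulk read. A back-of-envelope sketch (the 100 MB/s disk throughput and the blob sizes are assumptions for illustration):

```python
def expected_read_ms(list_len, avg_blob_bytes, disk_mb_per_s=100, base_ms=5):
    """Rough lower bound on read latency: the baseline point-read cost plus
    the time to pull the whole collection off disk at the assumed
    sequential throughput."""
    payload = list_len * avg_blob_bytes
    return base_ms + payload / (disk_mb_per_s * 1024 * 1024) * 1000

# e.g. 1000 blobs of 64 KB each is ~64 MB to read for a single row
print(round(expected_read_ms(1000, 64 * 1024)))  # 630
```

If the numbers come out in the hundreds of milliseconds, the collection size alone explains the 700 ms observation without any defect in list&lt;blob&gt; itself.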
Quorum reads and response time
I have a 3-node cluster with RF=3. All nodes are running. I have a table with ~44,000 columns spread evenly across 39 rows. When I do a range slice query on this table at a consistency level of ONE, it returns the data in about ~600 ms. I tried the same from all 3 nodes; no matter which node I ran it from, queries were answered in 600 ms at consistency level ONE. But when I run the same query at consistency level QUORUM, it takes ~2.3 seconds. It feels as if the querying of the nodes is done in sequence. Is this normal?

--
Regards,
Baskar Duraikannu
Re: Quorum reads and response time
Just adding a few other details to my question:
- We are using RandomPartitioner
- 256 virtual nodes configured

On Wed, Jul 10, 2013 at 12:54 PM, Baskar Duraikannu <baskar.duraikannu...@gmail.com> wrote:
I have a 3-node cluster with RF=3. All nodes are running. I have a table with ~44,000 columns spread evenly across 39 rows. When I do a range slice query on this table at a consistency level of ONE, it returns the data in about ~600 ms. I tried the same from all 3 nodes; no matter which node I ran it from, queries were answered in 600 ms at consistency level ONE. But when I run the same query at consistency level QUORUM, it takes ~2.3 seconds. It feels as if the querying of the nodes is done in sequence. Is this normal?

--
Regards,
Baskar Duraikannu
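For reference, the coordinator queries replicas in parallel at every consistency level and waits for the slowest of the required acknowledgements, so a large QUORUM slowdown usually points at digest mismatches or one slow replica rather than sequential querying. A simplified, single-DC sketch of the number of replica responses the coordinator blocks for:

```python
def block_for(consistency, rf):
    """Replica acks the coordinator waits for (simplified, single-DC view)."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    return levels[consistency]

print(block_for("ONE", 3))     # 1: fastest replica wins
print(block_for("QUORUM", 3))  # 2: wait for the slower of two replicas
```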
Re: Node tokens / data move
I copied the sstables and then ran a repair. It worked. It looks like export and import might have been much faster, given that we had very little data. Thanks everyone.

On Tue, Jul 9, 2013 at 1:34 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
Hi Aaron,
Can he not specify all 256 tokens in the YAML of the new cluster and then copy sstables? I know it is a bit ugly but it should work.
Sankalp

On Tue, Jul 9, 2013 at 3:19 AM, Baskar Duraikannu <baskar.duraikannu...@gmail.com> wrote:
Thanks Aaron

On 7/9/13, aaron morton <aa...@thelastpickle.com> wrote:
> Can I just copy data files for the required keyspaces, create schema manually and run repair?
If you have something like RF 3 and 3 nodes then yes: you can copy the data from one node in the source cluster to all nodes in the dest cluster and use cleanup to remove the unneeded data, because each node in the source cluster has a full copy of the data. If that's not the case you cannot copy the data files, even if the clusters have the same number of nodes, because the nodes in the dest cluster will have different tokens. AFAIK you need to export the full data set from the source DC and then import it into the dest system. The Bulk Load utility may be of help: http://www.datastax.com/docs/1.2/references/bulkloader. You could copy the SSTables from every node in the source system and bulk load them into the dest system. That process will ensure rows are sent to nodes that are replicas.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 9/07/2013, at 12:45 PM, Baskar Duraikannu <baskar.duraikannu...@gmail.com> wrote:
We have two clusters, used by two different groups, with vnodes enabled. Now there is a need to move some of the keyspaces from cluster 1 to cluster 2. Can I just copy the data files for the required keyspaces, create the schema manually and run repair? Is anything else required? Please help.
--
Thanks,
Baskar Duraikannu
Re: Working with libcql
You can replace the USE statement with a CREATE statement, and then change use_callback to whatever you want to do next.
--
Thanks,
Baskar Duraikannu

Shubham Mittal <smsmitta...@gmail.com> wrote:
So, if I want to create a keyspace, what do I need to change in that file?

On Thu, Jul 11, 2013 at 5:04 AM, aaron morton <aa...@thelastpickle.com> wrote:
The highlighted line will read all the rows from the system table that lists the keyspaces in the cluster.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 9/07/2013, at 9:46 PM, Shubham Mittal <smsmitta...@gmail.com> wrote:
yeah I tried that and below is the output I get:

LOG: resolving remote host localhost:9160
LOG: resolved remote host, attempting to connect
LOG: connection successful to remote host
LOG: sending message: 0x0105 {version: 0x01, flags: 0x00, stream: 0x00, opcode: 0x05, length: 0} OPTIONS
LOG: wrote to socket 8 bytes
LOG: error reading header End of file

and I checked all the keyspaces in my cluster; it changes nothing in the cluster. I couldn't understand the code much. What is this code supposed to do anyway?

On Tue, Jul 9, 2013 at 4:20 AM, aaron morton <aa...@thelastpickle.com> wrote:
Did you see the demo app? It seems to have a few examples of reading data: https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 9/07/2013, at 1:14 AM, Shubham Mittal <smsmitta...@gmail.com> wrote:
Hi, I found out that there exists a C++ client, libcql, for Cassandra, but its GitHub repository just provides an example of how to connect to Cassandra. Has anyone written code using libcql to read and write data to a Cassandra DB? If so, kindly share it. Thanks
Re: Node tokens / data move
Thanks Aaron

On 7/9/13, aaron morton <aa...@thelastpickle.com> wrote:
> Can I just copy data files for the required keyspaces, create schema manually and run repair?
If you have something like RF 3 and 3 nodes then yes: you can copy the data from one node in the source cluster to all nodes in the dest cluster and use cleanup to remove the unneeded data, because each node in the source cluster has a full copy of the data. If that's not the case you cannot copy the data files, even if the clusters have the same number of nodes, because the nodes in the dest cluster will have different tokens. AFAIK you need to export the full data set from the source DC and then import it into the dest system. The Bulk Load utility may be of help: http://www.datastax.com/docs/1.2/references/bulkloader. You could copy the SSTables from every node in the source system and bulk load them into the dest system. That process will ensure rows are sent to nodes that are replicas.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 9/07/2013, at 12:45 PM, Baskar Duraikannu <baskar.duraikannu...@gmail.com> wrote:
We have two clusters, used by two different groups, with vnodes enabled. Now there is a need to move some of the keyspaces from cluster 1 to cluster 2. Can I just copy the data files for the required keyspaces, create the schema manually and run repair? Is anything else required? Please help.
--
Thanks,
Baskar Duraikannu
Node tokens / data move
We have two clusters used by two different groups with vnodes enabled. Now there is a need to move some of the keyspaces from cluster 1 to cluster 2. Can I just copy data files for the required keyspaces, create schema manually and run repair? Anything else required? Please help.
--
Thanks,
Baskar Duraikannu
3 data centers
Good morning. We are thinking of setting up 3 data centers with NetworkTopologyStrategy, with each DC holding 1 copy. All three data centers will be connected by dark fiber with 10 ms latency. Due to the network latency, all QUORUM reads and writes will be slower. Has anyone used this kind of setup before? Have you seen any issues other than network latency?

Thanks,
Baskar
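With one replica per DC, a QUORUM over RF=3 needs 2 acks, so every QUORUM operation has to wait on at least one remote replica; the inter-DC round trip therefore puts a hard floor under latency. A back-of-envelope sketch (assumes the coordinator sits in the client's local DC and local work costs ~1 ms, both illustrative numbers):

```python
def quorum_latency_floor_ms(rf_per_dc, dcs, inter_dc_rtt_ms, local_ms=1):
    """Minimum achievable QUORUM latency: if the local DC alone cannot
    supply a quorum, at least one inter-DC round trip is unavoidable."""
    total_rf = rf_per_dc * dcs
    quorum = total_rf // 2 + 1
    if rf_per_dc >= quorum:          # quorum satisfiable entirely locally
        return local_ms
    return local_ms + inter_dc_rtt_ms

# 3 DCs, RF=1 each: quorum of 3 is 2, so one remote ack is always needed
print(quorum_latency_floor_ms(1, 3, 10))  # 11
```

This is one reason deployments in this shape often use LOCAL_QUORUM per DC when the consistency requirements allow it.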
Thrift CPU Usage
Hello - I have been running read tests on Cassandra using the stress tool. I have noticed that Thrift seems to be taking a lot of CPU (over 70%) when I look at the CPU samples report. Is this normal? CPU usage seems to go down by 5 to 10% when I change the RPC mode from sync to async. Is this normal? I am running Cassandra 0.8.4 on CentOS 5.6 (kernel 2.6.18-238) with the Oracle JVM.

- Thanks
Baskar Duraikannu
Re: Thrift CPU Usage
Aaron,

From the CPU samples report. Here are the relevant parts of the CPU samples report (-Xrunhprof:cpu=samples, depth=4):

TRACE 300668:
    java.net.SocketInputStream.socketRead0(SocketInputStream.java:Unknown line)
    java.net.SocketInputStream.read(SocketInputStream.java:129)
    org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
TRACE 300310:
    java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
    java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
    java.net.ServerSocket.implAccept(ServerSocket.java:462)
    java.net.ServerSocket.accept(ServerSocket.java:430)
TRACE 300639:
    sun.nio.ch.ServerSocketChannelImpl.accept0(ServerSocketChannelImpl.java:Unknown line)
    sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
    sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
    org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:617)
TRACE 300670:
    java.net.SocketOutputStream.socketWrite0(SocketOutputStream.java:Unknown line)
    java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
TRACE 301122:
    java.lang.Object.notify(Object.java:Unknown line)
    org.apache.cassandra.utils.SimpleCondition.signal(SimpleCondition.java:62)
    org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:169)
    org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java)

rank   self  accum   count  trace method
   1 74.00% 74.00% 160934 300668 java.net.SocketInputStream.socketRead0
   2 14.85% 88.85%  32302 300310 java.net.PlainSocketImpl.socketAccept
   3  3.67% 92.52%   7990 300639 sun.nio.ch.ServerSocketChannelImpl.accept0
   4  1.90% 94.43%   4142 300670 java.net.SocketOutputStream.socketWrite0
   5  0.79% 95.22%   1716 301122 java.lang.Object.notify

--
Thanks
Baskar Duraikannu

On Sep 26, 2011, at 4:55 PM, aaron morton wrote:
How are you deciding what is Thrift? Thrift is used to handle connections and serialize / de-serialize off the wire.

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 27/09/2011, at 2:32 AM, Baskar Duraikannu wrote:
Hello - I have been running read tests on Cassandra using the stress tool. I have noticed that Thrift seems to be taking a lot of CPU (over 70%) when I look at the CPU samples report. Is this normal? CPU usage seems to go down by 5 to 10% when I change the RPC mode from sync to async. Is this normal? I am running Cassandra 0.8.4 on CentOS 5.6 (kernel 2.6.18-238) with the Oracle JVM.

- Thanks
Baskar Duraikannu
Choice of Index
Hello - I am using 0.8 beta 2 and have a CF containing COMPANY, ACCOUNTNUMBER and some account-related data. I have an index on both COMPANY and ACCOUNTNUMBER. If I run the query:

SELECT DATA FROM COMPANYCF WHERE COMPANY='XXX' AND ACCOUNTNUMBER='YYY'

even though the ACCOUNTNUMBER index is a better index for this query than the COMPANY index, Cassandra seems to pick the COMPANY index. Does Cassandra always use the index on the first WHERE clause? I can always change the above query to put the account number in the first WHERE clause; I just wanted to understand whether any kind of index optimization is built into Cassandra 0.8.

Thanks,
Baskar Duraikannu
Re: Performance tests using stress testing tool
Thanks Peter. I believe I found the root cause: the switch that we used was bad. Now, on a 4-node cluster (each node has 1 quad-core CPU and 16 GB of RAM), I was able to get around 11,000 writes and 10,050 reads per second simultaneously (CPU usage is around 45% on all nodes; disk queue size is in the neighbourhood of 10). Is this in line with what you usually see with Cassandra?

----- Original Message -----
From: Peter Schuller
To: user@cassandra.apache.org
Sent: Friday, April 29, 2011 12:21 PM
Subject: Re: Performance tests using stress testing tool

> Thanks Peter. I am using the java version of the stress testing tool from the contrib folder. Is there any issue that I should be aware of? Do you recommend using pystress?
I just saw Brandon file this: https://issues.apache.org/jira/browse/CASSANDRA-2578. Maybe that's it.
--
/ Peter Schuller
Re: Performance tests using stress testing tool
Thanks Peter. I am using the java version of the stress testing tool from the contrib folder. Is there any issue that I should be aware of? Do you recommend using pystress? I will rerun the tests, monitor the ethernet stats closely, and update.
--
Thanks,
Baskar Duraikannu

----- Original Message -----
From: Peter Schuller
To: user@cassandra.apache.org
Sent: Thursday, April 28, 2011 1:21 PM
Subject: Re: Performance tests using stress testing tool

> When I looked at the benchmark client machine, it was not under any stress in terms of disk or CPU.
Are you running with the python multiprocessing module available? stress should print a warning if it's not. If it's not, you'd end up in a threaded mode, and due to Python's GIL you'd be bottlenecking on CPU without actually using more than ~1 core on the machine.

> But test machines are connected through a 10/100 mbps switch port (not gigabit). Can this be a bottleneck?
Maybe, but it seems not so likely unless you've turned up the column size. Check with 'ifstat 1' whether you're pushing lots of data. Another possibility is that the writes are periodically blocking due to compaction (and compaction is not yet parallel unless you're running 0.8 betas). However, this should show up in the stress client as a periodic lack of progress; it shouldn't give you a smooth amount of CPU usage over time.
--
/ Peter Schuller
Re: Performance tests using stress testing tool
Thanks Peter. When I looked at the benchmark client machine, it was not under any stress in terms of disk or CPU. But the test machines are connected through a 10/100 mbps switch port (not gigabit). Can this be a bottleneck?

Thanks,
Baskar

----- Original Message -----
From: Peter Schuller
To: user@cassandra.apache.org
Sent: Thursday, April 28, 2011 2:34 AM
Subject: Re: Performance tests using stress testing tool

> a) I am not seeing cpu usage more than 10pct.
Sounds like the benchmarking client is bottlenecking. 

> In some of the forums, I see that 8 CPUs / 32 GB is considered a good sweet spot for Cassandra. Is this true?
Seems reasonable in a very general sense, but of course it varies with the use case.

> Also, when would I see real cpu spikes? At this moment it looks like 4 cores is more than sufficient.
In general, the more requests and columns you read and write, the more you'll be bottlenecking on CPU. The larger the individual columns (and thus the fewer columns), the more you'll be bound on disk instead. In your case the bottleneck seems to be the benchmark, I think.

> b) iostat -x is reporting an avg queue size of around 0.25 and an await time of around 30 ms. What would be an acceptable queue size and await time?
Any avg queue size significantly below 1 is generally good. For values close to 1 or higher, it will depend on your access pattern, latency demands, and the nature of your storage device (e.g. SSDs and RAIDs can sustain concurrent I/O). To simplify, there is some maximum number of I/O requests that your storage device will service concurrently. For a normal disk, this is 1 request (I'm ignoring optimizations due to TCQ/NCQ, which can be significant sometimes). As long as you're below the saturation point it's mostly about statistics and varying I/O patterns causing latency. The less saturated you are, the better your average latency will be. Once you're *above* saturation, latency goes haywire as you don't service as many I/O requests as are coming in. There is a grey area in between where latency will be very sensitive to smallish changes in I/O load while aggregate throughput remains below what can be sustained.
--
/ Peter Schuller
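The rule of thumb in that reply can be written down directly as a toy classifier for the `avgqu-sz` column of `iostat -x` on a single spinning disk. The exact cut-offs (0.8 and 1.2) are illustrative choices for the "grey area", not measured thresholds:

```python
def classify_queue(avgqu_sz):
    """Heuristic from the thread, for a single-disk device: well below 1 is
    healthy, around 1 is the grey area, clearly above 1 means requests are
    queueing and latency will grow without bound."""
    if avgqu_sz < 0.8:
        return "healthy"
    if avgqu_sz <= 1.2:
        return "grey area: latency sensitive to small load changes"
    return "saturated: servicing fewer requests than are arriving"

print(classify_queue(0.25))  # healthy, matching the 0.25 reported above
```

SSDs and RAID arrays sustain more concurrent I/O, so for them the threshold sits well above 1.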
Multi-DC Deployment
We are planning to deploy Cassandra across two data centers. Let us say that we go with three replicas, with 2 in one data center and the last replica in the 2nd data center. What will happen to QUORUM reads and writes when DC1 goes down (2 of 3 replicas unreachable)? Will they time out?

Regards,
Baskar
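The 2+1 placement in the question can be checked mechanically: QUORUM over RF=3 needs 2 acks, so losing the DC that holds 2 replicas leaves only 1 reachable and quorum operations time out, while losing the 1-replica DC is survivable. A sketch:

```python
def quorum_available(replicas_per_dc, down_dc):
    """True if QUORUM reads/writes can still succeed after down_dc fails."""
    total_rf = sum(replicas_per_dc.values())
    quorum = total_rf // 2 + 1
    reachable = sum(n for dc, n in replicas_per_dc.items() if dc != down_dc)
    return reachable >= quorum

placement = {"DC1": 2, "DC2": 1}           # the layout from the question
print(quorum_available(placement, "DC1"))  # False: only 1 of 3 reachable
print(quorum_available(placement, "DC2"))  # True: 2 of 3 still up
```

The asymmetry is the point: with 2+1 placement, quorum availability depends entirely on which data center fails.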
Re: Help on decommission
No. I stopped the stress test before issuing the decommission command, so it was not under ANY load. I waited for over an hour and nothing changed. Then I turned on DEBUG in log4j-server.properties and restarted the Cassandra process. As soon as I restarted, the decommissioned node left the cluster and everything was back to normal. Have you seen this behaviour before?

From: Jonathan Colby
Sent: Tuesday, April 12, 2011 3:15 PM
To: user@cassandra.apache.org
Subject: Re: Help on decommission

How long has it been in Leaving status? Is the cluster under stress-test load while you are doing the decommission?

On Apr 12, 2011, at 6:53 PM, Baskar Duraikannu wrote:
I have set up a 4-node cluster for testing. When I set up the cluster, I assigned initial tokens in such a way that each node gets 25% of the load, and then started the nodes with autobootstrap=false. After all nodes were up, I loaded data using the stress test tool with a replication factor of 3. As part of my testing, I am trying to remove one of the nodes using nodetool decommission, but the node seems to be stuck in Leaving status. How do I check whether it is doing any work at all? Please help.

[root@localhost bin]# ./nodetool -h 10.140.22.25 ring
Address        Status State   Load       Owns    Token
                                                 127605887595351923798765477786913079296
10.140.22.66   Up     Leaving 119.41 MB  25.00%  0
10.140.22.42   Up     Normal  116.23 MB  25.00%  42535295865117307932921825928971026432
10.140.22.28   Up     Normal  119.93 MB  25.00%  85070591730234615865843651857942052864
10.140.22.25   Up     Normal  116.21 MB  25.00%  127605887595351923798765477786913079296

[root@localhost bin]# ./nodetool -h 10.140.22.66 netstats
Mode: Leaving: streaming data to other nodes
Streaming to: /10.140.22.42
   /var/lib/cassandra/data/Keyspace1/Standard1-f-1-Data.db/(0,120929157) progress=120929157/120929157 - 100%
   /var/lib/cassandra/data/Keyspace1/Standard1-f-2-Data.db/(0,3361291) progress=0/3361291 - 0%
Not receiving any streams.
Pool Name    Active Pending Completed
Commands        n/a       0        17
Responses       n/a       0    108109

[root@usnynyc1cass02 bin]# ./nodetool -h 10.140.22.42 netstats
Mode: Normal
Not sending any streams.
Streaming from: /10.140.22.66
   Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-f-2-Data.db/(0,3361291) progress=0/3361291 - 0%
Pool Name    Active Pending Completed
Commands        n/a       0        11
Responses       n/a       0    107879

Regards,
Baskar