Cassandra counter column family performance
Hello all, I have a counter CF defined as: pk text PRIMARY KEY, a counter, b counter, c counter, d counter. After inserting a few million keys (around 55 million), performance goes down the drain: 2-3 nodes in the cluster sit at medium load, and when inserting batches of the same length, writes take longer and longer until the whole cluster becomes loaded, I get a lot of TExceptions, and the cluster becomes unresponsive. Did anyone have the same problem? Feel free to comment and share your experiences with counter CF performance.
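For context, a sketch of the table and write pattern described above (the column names are from the post; the table name is my own). Note that counter columns can only be changed with UPDATE, never INSERT:

```cql
CREATE TABLE counters (
    pk text PRIMARY KEY,
    a counter,
    b counter,
    c counter,
    d counter
);

-- Counters are incremented/decremented, not inserted:
UPDATE counters SET a = a + 1, c = c + 5 WHERE pk = 'some-key';
```

One thing worth knowing: pre-2.1 counter writes involve a read on the replica side (the ReplicateOnWrite path), which is one reason counter-heavy workloads tend to degrade sooner than plain writes.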
Re: Schema disagreement errors
Hi Gaurav, a schema versioning bug was fixed in 2.0.7. Best wishes, Duncan. On 12/05/14 21:31, Gaurav Sehgal wrote: We have recently started seeing a lot of Schema Disagreement errors. We are using Cassandra 2.0.6 with Oracle Java 1.7. I went through the Cassandra FAQ and followed the steps below: * nodetool disablethrift * nodetool disablegossip * nodetool drain * 'kill pid'. As per the documentation, the commit logs should have been flushed, but that did not happen in our case. The commit logs were still there. So I removed them manually to make sure there were no commit logs when Cassandra started up (which was fine in our case, as this data can always be replayed). I also deleted the schema* directories from the /data/system folder. But when we started Cassandra back up, the issue started happening again. Any help would be appreciated. Cheers! Gaurav
Re: Cassandra MapReduce/Storm/ etc
Hi, check out the following links: 1) http://frommyworkshop.blogspot.ru/search/label/Cassandra 2) http://frommyworkshop.blogspot.ru/2012/07/single-node-hadoop-cassandra-pig-setup.html -- Best regards Shamim A. 11.05.2014, 22:17, Manoj Khangaonkar khangaon...@gmail.com: Hi, Searching for Cassandra with MapReduce, I am finding that the search results are really dated -- from version 0.7, 2010/2011. Is there a good blog/article that describes how to use MapReduce on Cassandra tables? From my naive understanding, Cassandra is all about partitioning. Querying is based on partition key + clustered column(s). Input to MapReduce is a sequence of key/value pairs; for Storm it is a stream of tuples. If a database table is the input source for MapReduce or Storm, then in the simple case this translates to a full table scan of the input table, which can time out and is generally not a recommended access pattern in Cassandra. My initial reaction is that if I need to process data with MapReduce or Storm, reading it from Cassandra might not be the optimal way. Storing the output to Cassandra, however, does make sense. If anyone has links to blogs or personal experience in this area, I would appreciate it if you could share them. regards
Re: Cassandra 2.0.7 keeps reporting errors due to no space left on device
Well, I finally resolved this issue by modifying Cassandra to ignore sstables bigger than a threshold. The leveled compaction strategy falls back to size-tiered compaction in some situations, and that's why I always got some old huge sstables compacted. More details can be found in 'LeveledManifest.java', in the 'getCompactionCandidates' function. I modified the 'mostInterestingBucket' method of 'SizeTieredCompactionStrategy.java' and added a filter before the function returns:

Iterator<SSTableReader> iter = hottest.left.iterator();
while (iter.hasNext()) {
    SSTableReader mysstable = iter.next();
    // drop any candidate sstable larger than 1 TiB
    if (mysstable.onDiskLength() > 1099511627776L) {
        logger.info("Removed candidate {}", mysstable.toString());
        iter.remove();
    }
}

I don't have much time to do more research to figure out whether this has side effects, but it is a solution for me. I hope this will be useful to those who have had similar issues. On Sun, May 4, 2014 at 5:10 PM, Yatong Zhang bluefl...@gmail.com wrote: I am using the latest 2.0.7.
The 'nodetool tpstats' shows as:

[root@storage5 bin]# ./nodetool tpstats
Pool Name                  Active   Pending   Completed   Blocked   All time blocked
ReadStage                       0         0      628220         0                  0
RequestResponseStage            0         0     3342234         0                  0
MutationStage                   0         0     3172116         0                  0
ReadRepairStage                 0         0       47666         0                  0
ReplicateOnWriteStage           0         0           0         0                  0
GossipStage                     0         0      756024         0                  0
AntiEntropyStage                0         0           0         0                  0
MigrationStage                  0         0           0         0                  0
MemoryMeter                     0         0        6652         0                  0
MemtablePostFlusher             0         0        7042         0                  0
FlushWriter                     0         0        4023         0                  0
MiscStage                       0         0           0         0                  0
PendingRangeCalculator          0         0          27         0                  0
commitlog_archiver              0         0           0         0                  0
InternalResponseStage           0         0           0         0                  0
HintedHandoff                   0         0          28         0                  0

Message type        Dropped
RANGE_SLICE               0
READ_REPAIR               0
PAGED_RANGE               0
BINARY                    0
READ                      0
MUTATION                  0
_TRACE                    0
REQUEST_RESPONSE          0
COUNTER_MUTATION          0

And here is another type of error; these errors seem to occur after 'disk is full':

ERROR [SSTableBatchOpen:2] 2014-04-30 13:47:48,348 CassandraDaemon.java (line 198) Exception in thread Thread[SSTableBatchOpen:2,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:110)
    at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
    at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:458)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:422)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:184)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:264)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
    at java.io.DataInputStream.readUTF(DataInputStream.java:589)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:85)
    ... 12 more

On Sun, May 4, 2014 at 4:59 PM, DuyHai Doan doanduy...@gmail.com wrote: The symptoms look like there are pending compactions stacking up or failed compactions, so temporary files (-tmp-Data.db) are not
Schema errors when bootstrapping / restarting node
Hi All, I'm having some major issues bootstrapping a new node to my cluster. We are running 1.2.16, with vnodes enabled. When a new node starts up (with auto_bootstrap), it selects a host ID and finds the ring successfully:

INFO 18:42:29,559 JOINING: waiting for ring information

It successfully selects a set of tokens. Then the weird stuff begins. I get this error once, while the node is reading the system keyspace:

ERROR 18:42:32,921 Exception in thread Thread[InternalResponseStage:1,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.utils.ByteBufferUtil.toLong(ByteBufferUtil.java:421)
    at org.apache.cassandra.cql.jdbc.JdbcLong.compose(JdbcLong.java:94)
    at org.apache.cassandra.db.marshal.LongType.compose(LongType.java:34)
    at org.apache.cassandra.cql3.UntypedResultSet$Row.getLong(UntypedResultSet.java:138)
    at org.apache.cassandra.db.SystemTable.migrateKeyAlias(SystemTable.java:199)
    at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:346)
    at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:66)
    at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:47)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

But it doesn't stop the bootstrap process.
The node successfully handshakes versions, and pauses before bootstrapping:

INFO 18:42:59,564 JOINING: schema complete, ready to bootstrap
INFO 18:42:59,565 JOINING: waiting for pending range calculation
INFO 18:42:59,565 JOINING: calculation complete, ready to bootstrap
INFO 18:42:59,565 JOINING: getting bootstrap token
INFO 18:42:59,705 JOINING: sleeping 30000 ms for pending range setup

After 30 seconds, I get a flood of endless org.apache.cassandra.db.UnknownColumnFamilyException errors, and all other nodes in the cluster log the following endlessly:

INFO [HANDSHAKE-/x.x.x.x] 2014-05-09 18:44:36,289 OutboundTcpConnection.java (line 418) Handshaking version with /x.x.x.x

I suspect there may be something wrong with my schemas. Sometimes while restarting an existing node, the node will fail to restart with the following error, again while reading the system keyspace:

ERROR [InternalResponseStage:5] 2014-05-05 23:56:03,786 CassandraDaemon.java (line 191) Exception in thread Thread[InternalResponseStage:5,5,main]
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'column1' as hex bytes
    at org.apache.cassandra.db.marshal.BytesType.fromString(BytesType.java:69)
    at org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:231)
    at org.apache.cassandra.config.CFMetaData.addColumnDefinitionSchema(CFMetaData.java:1524)
    at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1456)
    at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:306)
    at org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:444)
    at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:356)
    at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:66)
    at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:47)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NumberFormatException: An hex string representing bytes must have an even length
    at org.apache.cassandra.utils.Hex.hexToBytes(Hex.java:52)
    at org.apache.cassandra.db.marshal.BytesType.fromString(BytesType.java:65)
    ... 12 more

I am able to fix this error by clearing out the schema_columns system table on disk. After that, a node can boot successfully. Does anyone have a clue what's going on here? Thanks!
Re: Can Cassandra client programs use hostnames instead of IPs?
You can set listen_address in cassandra.yaml to a hostname (http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html). Cassandra will use the IP address returned by a DNS query for that hostname. On AWS you don't have to assign an Elastic IP; all instances come with a public IP that lasts the instance's lifetime (if you use ec2-classic or your VPC is set up to assign them). Note that whatever hostname you set in a node's listen_address, it will need to resolve to the private IP, as AWS instances only have network access via their private address. Traffic to an instance's public IP is NATed and forwarded to the private address. So you may as well just use the node's IP address. If you run Hadoop on instances in the same AWS region, it will be able to access your Cassandra cluster via the private IPs. If you run Hadoop externally, just use the public IPs. If you run in a VPC without public addressing and want to connect from external hosts, you will want to look at a VPN (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html). Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 13/05/2014, at 4:31 AM, Huiliang Zhang zhl...@gmail.com wrote: Hi, Cassandra returns IPs of the nodes in the Cassandra cluster for further communication between the Hadoop program and the Cassandra cluster. Is there a way to configure the Cassandra cluster to return hostnames instead of IPs? My Cassandra cluster is on AWS and has no Elastic IPs which can be accessed outside AWS. Thanks, Huiliang
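For reference, the relevant cassandra.yaml settings look roughly like this (the hostname and addresses are placeholders, not values from the thread):

```yaml
# cassandra.yaml (illustrative values)
listen_address: node1.internal.example.com  # a hostname is fine; must resolve to the private IP
broadcast_address: 10.0.1.5                 # address gossiped to the rest of the cluster
rpc_address: 0.0.0.0                        # client-facing listen address
```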
Re: Disable reads during node rebuild
I'm not able to replace a dead node using the ordinary procedure (bootstrap+join), and would like to rebuild the replacement node from another DC. Normally, when you want to add a new DC to the cluster, the command to use is nodetool rebuild $DC_NAME (with auto_bootstrap: false). That will get the node to stream data from $DC_NAME. The problem is that if I start a node with auto_bootstrap=false to perform the rebuild, it automatically starts serving empty reads (CL=LOCAL_ONE). When adding a new DC the nodes won't be processing reads, but that is not the case for you. You should disable the client APIs to prevent the clients from calling the new nodes: use -Dcassandra.start_rpc=false and -Dcassandra.start_native_transport=false in cassandra-env.sh, or the appropriate settings in cassandra.yaml. Disabling reads from other nodes will be harder. IIRC, during bootstrap a different timeout (based on ring_delay) is used to detect if the bootstrapping node is down. However, if the node is running and you use nodetool rebuild, I'm pretty sure the normal gossip failure detectors will kick in, which means you cannot disable gossip to prevent reads. Also, we would want the node to be up for writes. But what you can do is artificially set the severity of the node high so the dynamic snitch will route around it. See https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/locator/DynamicEndpointSnitchMBean.java#L37

* Set the value to something high on the node you will be rebuilding; the number of cores on the system should do. (jmxterm is handy for this: http://wiki.cyclopsgroup.org/jmxterm)
* Check nodetool gossipinfo on the other nodes to see that the SEVERITY app state has propagated.
* Watch completed ReadStage tasks on the node you want to rebuild. If you have read repair enabled it will still get some traffic.
* Do the rebuild.
* Reset severity to 0.

Hope that helps.
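The severity step above can be sketched as a jmxterm session (the bean name follows DynamicEndpointSnitchMBean in the 2.0 source; exact bean and attribute names may differ by version, so treat this as illustrative, not exact):

```
$ java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199
$> bean org.apache.cassandra.db:type=DynamicEndpointSnitch
$> run setSeverity 8.0
... perform the rebuild ...
$> run setSeverity 0.0
```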
Aaron - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 13/05/2014, at 5:18 am, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote: Hello, I'm not able to replace a dead node using the ordinary procedure (bootstrap+join), and would like to rebuild the replacement node from another DC. The problem is that if I start a node with auto_bootstrap=false to perform the rebuild, it automatically starts serving empty reads (CL=LOCAL_ONE). Is there a way to disable reads from a node while performing a rebuild from another datacenter? I tried starting the node in write survey mode, but the nodetool rebuild command does not work in this mode. Thanks, -- Paulo Motta Chaordic | Platform www.chaordic.com.br +55 48 3232.3200
Re: Storing log structured data in Cassandra without compactions for performance boost.
What's your data model look like? "I think it would be best to just disable compactions." Why? Are you never doing reads? There is also a cost to repairs/bootstrapping when you have a ton of sstables. This might be a premature optimization. If the data is read from a slice of a partition that has been added to over time, there will be a part of that row in almost every sstable. That would mean all of them (multiple disk seeks per sstable, depending on clustering order) would have to be read in order to service the query. The data model can help or hurt a lot, though. If you set a TTL on the columns you add, then C* will clean up sstables (if size-tiered and post-1.2) once the data has expired. Since you never delete, set gc_grace_seconds to 0 so the TTL expiration doesn't result in tombstones. --- Chris Lohfink On May 6, 2014, at 7:55 PM, Kevin Burton bur...@spinn3r.com wrote: I'm looking at storing log data in Cassandra… Every record is a unique timestamp for the key, and then the log line for the value. I think it would be best to just disable compactions. - there will never be any deletes. - all the data will be accessed in time range (probably partitioned randomly) and sequentially. So every time a memtable flushes, we will just keep that SSTable forever. Compacting the data is kind of redundant in this situation. I was thinking the best strategy is to use setcompactionthreshold and set the value VERY high so compactions are never triggered. Also, it would be IDEAL to be able to tell Cassandra to just drop a full SSTable so that I can truncate older data without having to do a major compaction and without having to mark everything with a tombstone. Is this possible? -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
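A sketch of the TTL-based approach Chris describes (the table and column names are mine, not from the thread):

```cql
CREATE TABLE logs (
    day text,
    ts timeuuid,
    line text,
    PRIMARY KEY (day, ts)
) WITH gc_grace_seconds = 0
  AND compaction = {'class': 'SizeTieredCompactionStrategy'};

-- Expire each log line after 30 days instead of ever deleting:
INSERT INTO logs (day, ts, line)
VALUES ('2014-05-06', now(), 'log line here') USING TTL 2592000;
```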
Re: Schema disagreement errors
Hey Gaurav, you should consider moving to 2.0.7, which fixes a bunch of these schema disagreement problems. You could also play around with nodetool resetlocalschema on the nodes that are behind, but be careful with that one. I'd go with 2.0.7 first for sure. Thanks, Vince. On Mon, May 12, 2014 at 12:31 PM, Gaurav Sehgal gsehg...@gmail.com wrote: We have recently started seeing a lot of Schema Disagreement errors. We are using Cassandra 2.0.6 with Oracle Java 1.7. I went through the Cassandra FAQ and followed the steps below: - nodetool disablethrift - nodetool disablegossip - nodetool drain - 'kill pid'. As per the documentation, the commit logs should have been flushed, but that did not happen in our case. The commit logs were still there. So I removed them manually to make sure there were no commit logs when Cassandra started up (which was fine in our case, as this data can always be replayed). I also deleted the schema* directories from the /data/system folder. But when we started Cassandra back up, the issue started happening again. Any help would be appreciated. Cheers! Gaurav
How to balance this cluster out ?
I have a cluster that looks like this:

Datacenter: us-east
==========
Replicas: 2
Address   Rack   Status   State    Load       Owns     Token
                                                       113427455640312821154458202477256070484
*.*.*.1   1b     Up       Normal   141.88 GB  66.67%   56713727820156410577229101238628035242
*.*.*.2   1a     Up       Normal   113.2 GB   66.67%   210
*.*.*.3   1d     Up       Normal   102.37 GB  66.67%   113427455640312821154458202477256070484

Obviously, the first node in 1b has 40% more data than the others. If I wanted to rebalance this cluster, how would I go about that? Would shifting the tokens accomplish what I need, and which tokens? Regards, Oleg
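For what it's worth, the three tokens in the listing are already almost exactly evenly spaced around the RandomPartitioner ring (0, 2^127/3, 2*2^127/3), so the load skew is more likely in the data distribution than in token placement. A quick check:

```python
# Evenly spaced RandomPartitioner tokens for an n-node ring:
# token(i) = i * 2**127 // n
def balanced_tokens(n):
    return [i * 2**127 // n for i in range(n)]

print(balanced_tokens(3))
# The middle value matches node *.*.*.1's token above exactly,
# and node *.*.*.2's token (210) is effectively 0.
```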
Re: Schema disagreement errors
On Tue, May 13, 2014 at 5:11 PM, Donald Smith donald.sm...@audiencescience.com wrote: I too have noticed that after doing “nodetool flush” (or “nodetool drain”), the commit logs are still there. I think they’re NEW (empty) commit logs, but I may be wrong. Anyone know? Assuming they are being correctly marked clean after drain (which historically has been a nontrivial assumption), they are new, empty commit log segments which have been recycled. =Rob
Datacenter understanding question
If I have a configuration of two datacenters with one node each, and the replication factor is also 1, will these 2 nodes be mirrored/replicated?
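For data to be mirrored across both datacenters, the keyspace needs a replica in each DC; a sketch (keyspace and datacenter names are illustrative, and must match the names your snitch reports):

```cql
-- One replica in each datacenter: with a single node per DC,
-- each node then holds a full copy of the keyspace.
CREATE KEYSPACE myks WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 1,
    'dc2': 1
};
```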
Re: Avoiding email duplicates when registering users
The real question is: if you want the email to be unique, why use a surrogate UUID primary key at all? I wonder what the UUID gives you. If you want a non-email primary key, why not use md5(email)? On Wed, May 7, 2014 at 2:19 AM, Tyler Hobbs ty...@datastax.com wrote: On Mon, May 5, 2014 at 10:27 AM, Ignacio Martin natx...@gmail.com wrote: When a user registers, the server generates a UUID and performs an INSERT ... IF NOT EXISTS into the email_to_UUID table. Immediately after, perform a SELECT from the same table and see if the read UUID is the same as the one we just generated. If it is, we are allowed to INSERT the data in the user table, knowing that no one else will be doing it. INSERT ... IF NOT EXISTS is the correct thing to do here, but you don't need to SELECT afterwards. If the row does exist, the query results will show that the insert was not applied, and the existing row will be returned. -- Tyler Hobbs DataStax http://datastax.com/
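Tyler's point as a sketch (table and column names are illustrative): the conditional insert itself reports whether it was applied, along with the existing row when it was not, so no follow-up SELECT is needed:

```cql
CREATE TABLE email_to_uuid (
    email text PRIMARY KEY,
    user_id uuid
);

INSERT INTO email_to_uuid (email, user_id)
VALUES ('a@example.com', 123e4567-e89b-12d3-a456-426614174000)
IF NOT EXISTS;
-- The result contains an [applied] column; when it is false, the
-- existing (email, user_id) row is returned alongside it.
```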
Really need some advices on large data considerations
Hi, we're going to deploy a large Cassandra cluster, at the PB level. Our scenario would be:

1. Lots of writes: about 150 writes/second on average, and about 300K per write.
2. Relatively very few reads.
3. Our data will never be updated.
4. But we will delete old data periodically to free space for new data.

We've learned that the compaction strategy is an important point, because we've run into 'no space' trouble with the size-tiered compaction strategy. We've read http://wiki.apache.org/cassandra/LargeDataSetConsiderations -- is this enough, and is it up to date? From our experience, changing any settings/schema while a large cluster is online and has been running for some time is really, really a pain. So we're gathering more info and expecting some more practical suggestions before we set up the Cassandra cluster. Thanks, and any help is greatly appreciated.
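Since the data is never updated and only aged out (points 3 and 4), one option worth evaluating is letting TTLs do the deleting instead of issuing explicit deletes; a sketch (table name and numbers are illustrative, and the table-level default_time_to_live option requires Cassandra 2.0+):

```cql
CREATE TABLE blobs (
    id timeuuid PRIMARY KEY,
    payload blob
) WITH default_time_to_live = 7776000  -- 90 days, illustrative
  AND gc_grace_seconds = 86400;
```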
Re: How long are expired values actually returned?
Ah, thank you! On 12.05.2014 16:31, Peter Reilly wrote: You need to set the grace period as well. Peter. On Thu, May 8, 2014 at 8:44 AM, Sebastian Schmidt isib...@gmail.com wrote: Hi, I'm using the TTL feature for my application. In my tests, when using a TTL of 5, the inserted rows are still returned after 7 seconds, and after 70 seconds. Is this normal, or am I doing something wrong? Kind Regards, Sebastian
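For reference, TTL behaviour can be checked directly (a sketch; table and column names are illustrative). The liveness of a TTL'd cell is decided at read time from the cell's expiry timestamp, so clock skew between writer and reader can make values appear to outlive their TTL:

```cql
CREATE TABLE ttl_test (k text PRIMARY KEY, v text);

INSERT INTO ttl_test (k, v) VALUES ('x', 'hello') USING TTL 5;

-- Remaining lifetime in seconds (the row disappears once expired):
SELECT k, v, TTL(v) FROM ttl_test WHERE k = 'x';
```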