Re: Questions about cleaning up/purging Hinted Handoffs
Yep, the hinted handoff in 1.0.8 is abysmal at best. What is your replication factor? I have had huge hints pile up, where I had to drop the entire column family and then run a repair. Alternatively, you can use the JMX HintedHandoffManager and delete hints per endpoint. It may also be worthwhile to investigate why the hints pile up in the first place. Rahul

On Mon, Sep 8, 2014 at 11:09 PM, Robert Coli rc...@eventbrite.com wrote:
On Fri, Sep 5, 2014 at 3:20 PM, Rahul Neelakantan ra...@rahul.be wrote: The reason I asked about the hints is because I see hints being replayed, but the large compacted hints sstable still sticks around; perhaps it is a bug with that version.
I've seen this behavior with HH in older versions, so probably. =Rob
Re: Questions about cleaning up/purging Hinted Handoffs
I use jmxterm: http://wiki.cyclopsgroup.org/jmxterm/ Attach it to your C* process, then use the org.apache.cassandra.db:type=HintedHandoffManager bean and run deleteHintsForEndpoint <ip> to drop the hints held for each IP.

On Wed, Sep 10, 2014 at 3:37 AM, Rahul Neelakantan ra...@rahul.be wrote: RF=3, two DCs. (Going to 1.2.x in a few weeks) What's the procedure to drop via JMX? - Rahul 1-678-451-4545 (US) +91 99018-06625 (India)

On Sep 9, 2014, at 9:23 AM, Rahul Menon ra...@apigee.com wrote: Yep, the hinted handoff in 1.0.8 is abysmal at best. What is your replication factor? I have had huge hints pile up, where I had to drop the entire column family and then run a repair. Alternatively, you can use the JMX HintedHandoffManager and delete hints per endpoint. It may also be worthwhile to investigate why the hints pile up. Rahul

On Mon, Sep 8, 2014 at 11:09 PM, Robert Coli rc...@eventbrite.com wrote:
On Fri, Sep 5, 2014 at 3:20 PM, Rahul Neelakantan ra...@rahul.be wrote: The reason I asked about the hints is because I see hints being replayed, but the large compacted hints sstable still sticks around; perhaps it is a bug with that version.
I've seen this behavior with HH in older versions, so probably. =Rob
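For what it's worth, a non-interactive jmxterm session for this might look like the sketch below. The jar filename, JMX host/port, and endpoint IP are placeholders; the MBean and operation names are the ones HintedHandoffManager exposes in this era:

```shell
# Drop the hints held for one dead/lagging endpoint via JMX.
# Jar path, JMX address, and IP below are placeholders.
java -jar jmxterm-1.0-alpha-4-uber.jar -l 127.0.0.1:7199 -n <<'EOF'
bean org.apache.cassandra.db:type=HintedHandoffManager
run deleteHintsForEndpoint 10.1.2.99
EOF
```

Repeat the `run` line once per endpoint you want to clear.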
Re: A question about using 'update keyspace with strategyoptions' command
Try the show keyspaces command in cassandra-cli and look at the Options listed under each keyspace. Note that schema changes propagate to the other nodes (and DCs) via gossip, so the command returning only means the local node applied it; you can confirm the change has reached every node by checking that describe cluster reports a single schema version. Thanks Rahul

On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote: Hi, All, I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully, or do I need to wait some time for the change to be propagated to all DCs? Thanks Boying
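The whole check can be done in one cassandra-cli session; a sketch (the keyspace name is made up, dc1/dc2 are from the question):

```shell
# Update replication, then verify schema agreement from any one node.
cassandra-cli -h 127.0.0.1 <<'EOF'
update keyspace MyKeyspace with strategy_options = {dc1:3, dc2:3};
describe cluster;
show keyspaces;
EOF
```

If describe cluster lists more than one schema version, some nodes have not yet applied the change.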
Re: Authentication exception
Could you perhaps check your NTP?

On Tue, Jul 22, 2014 at 3:35 AM, Jeremy Jongsma jer...@barchart.com wrote: I routinely get this exception from cqlsh on one of my clusters: cql.cassandra.ttypes.AuthenticationException: AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 2 responses.') The system_auth keyspace is set to replicate X times given X nodes in each datacenter, and at the time of the exception all nodes are reporting as online and healthy. After a short period (i.e. 30 minutes), it will let me in again. What could be the cause of this?
Re: Dead node seen as UP by replacement node
Since the old node is not available, I would suggest you assassinate it and then get the new node to bootstrap.

On Thu, Mar 13, 2014 at 10:56 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote: Yes, exactly.
On Thu, Mar 13, 2014 at 1:27 PM, Rahul Menon ra...@apigee.com wrote: And the token value, as suggested, is the dead node's token - 1?
On Thu, Mar 13, 2014 at 9:29 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote: Nope, they have different IPs. I'm using the procedure described here to replace a dead node: http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node Dead node token: X (IP: Y) Replacement node token: X-1 (IP: Z) So, as soon as the replacement node (Z) is started, it sees the dead node (Y) as UP, and tries to stream data from it during the join process. About 10 minutes later, the failure detector of Z detects Y as down, but since it was trying to fetch data from it, it fails the join/bootstrap process altogether. -- *Paulo Motta* Chaordic | *Platform* http://www.chaordic.com.br/ +55 48 3232.3200 +55 83 9690-1314
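Before `nodetool assassinate` existed (it arrived in 2.2), assassination went through the Gossiper MBean over JMX; a jmxterm sketch, with the jar name, JMX address, and dead node IP all as placeholders:

```shell
# Force-remove the dead node's state from gossip, then bootstrap the
# replacement node normally. Jar path and IP are placeholders.
java -jar jmxterm-1.0-alpha-4-uber.jar -l 127.0.0.1:7199 -n <<'EOF'
bean org.apache.cassandra.net:type=Gossiper
run unsafeAssassinateEndpoint 10.1.2.99
EOF
```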
Re: Opscenter help?
I have seen the conflicts-with-sudo error, but that was with the 3.x rpm on the Amazon AMI; I was, however, able to install it from the tarball. As Nick has pointed out, the versions of the OS and OpsCenter will help in looking at this. Thanks Rahul

On Thu, Mar 13, 2014 at 7:56 PM, Nick Bailey n...@datastax.com wrote: I'm happy to help here as well :) Can you give some more information? Specifically: What exact versions of EL5 and EL6 have you tried? What version of OpsCenter are you using? What file/dependency is rpm/yum saying conflicts with sudo? Also, you can find the OpsCenter documentation here http://www.datastax.com/documentation/opscenter/4.1/index.html, although this isn't an issue I've seen before. -Nick

On Wed, Mar 12, 2014 at 1:51 PM, Drew from Zhrodague drewzhroda...@zhrodague.net wrote: I am having a hard time installing the DataStax OpsCenter agents on EL6 and EL5 hosts. Where is an appropriate place to ask for help? DataStax has moved their forums to Stack Exchange, which seems to be a waste of time, as I don't have enough reputation points to properly tag my questions. The agent installation seems to be broken: [] agent rpm conflicts with sudo [] install from opscenter does not work, even if manually installing the rpm (requires --force, conflicts with sudo) [] error message re: log4j #noconf [] Could not find the main class: opsagent.opsagent. Program will exit. [] No other (helpful/more in-depth) documentation exists -- Drew from Zhrodague post-apocalyptic ad-hoc industrialist d...@zhrodague.net
Re: Dead node seen as UP by replacement node
And the token value, as suggested, is the dead node's token - 1?

On Thu, Mar 13, 2014 at 9:29 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote: Nope, they have different IPs. I'm using the procedure described here to replace a dead node: http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node Dead node token: X (IP: Y) Replacement node token: X-1 (IP: Z) So, as soon as the replacement node (Z) is started, it sees the dead node (Y) as UP, and tries to stream data from it during the join process. About 10 minutes later, the failure detector of Z detects Y as down, but since it was trying to fetch data from it, it fails the join/bootstrap process altogether.
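The token arithmetic itself is trivial; a sketch for RandomPartitioner, whose ring spans 0..2^127-1 (the token below is a made-up example):

```shell
# The replacement node takes the dead node's token minus one, wrapping
# around the ring if the dead token is 0. Example token is hypothetical.
DEAD_TOKEN=56713727820156410577229101238628035242
NEW_TOKEN=$(python3 -c "print(($DEAD_TOKEN - 1) % 2**127)")
echo "$NEW_TOKEN"   # -> 56713727820156410577229101238628035241
```

Set the result as initial_token in the replacement node's cassandra.yaml before starting it.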
Re: Possibly losing data with corrupted SSTables
Looks like the sstables are corrupt. I don't believe there is a method to recover those sstables; I would delete them and run a repair to ensure data consistency. Rahul

On Wed, Jan 29, 2014 at 11:29 PM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote: Hi, Rahul. I've run nodetool upgradesstables only on the problematic CF. It threw the following exception: Error occurred while upgrading the sstables for keyspace Sessions java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:271) at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:287) at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:977) at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2191) ... ... 
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416 at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:167) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:83) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:69) at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180) at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155) at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142) at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:134) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:301) at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:262) ... 
3 more Caused by: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416 at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:123) ... 20 more Regards, Francisco

On Jan 29, 2014, at 3:38 PM, Rahul Menon ra...@apigee.com wrote: Francisco, the sstables with *-ib-* are from a previous version of C*. The *-ib-* naming convention started at C* 1.2.1, but from 1.2.10 onwards I'm sure it uses the *-ic-* convention. You could try running nodetool upgradesstables, which should upgrade the *-ib-* sstables to *-ic-*. Rahul

On Wed, Jan 29, 2014 at 12:55 AM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote: Dear experts, We are facing an annoying problem in our cluster. We have 9 Amazon extra large Linux nodes, running Cassandra 1.2.11. The short story is that after moving the data from one cluster to another, we've been unable to run 'nodetool repair'. It gets stuck due to a CorruptSSTableException in some nodes and CFs. After looking at some problematic CFs, we observed that some of them have root permissions, instead of cassandra permissions. Also, their names are different from the 'good' ones, as we can see below:
BAD --
-rw-r--r-- 8 cassandra cassandra 991M Nov 8 15:11 Sessions-Users-ib-2516-Data.db
-rw-r--r-- 8 cassandra cassandra 703M Nov 8 15:11 Sessions-Users-ib-2516-Index.db
-rw-r--r-- 8 cassandra cassandra 5.3M Nov 13
Re: Possibly losing data with corrupted SSTables
Yes, you should delete all the files belonging to that sstable generation (cfname-ib-num-*.db), and run a repair after the deletion.

On Thu, Jan 30, 2014 at 10:17 PM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote: Ok. I'll try this idea with one sstable. But should I delete all the files associated with it? I mean, there is a difference in the number of files between the BAD sstable and a GOOD one, as I've already shown:
BAD --
-rw-r--r-- 8 cassandra cassandra 991M Nov 8 15:11 Sessions-Users-ib-2516-Data.db
-rw-r--r-- 8 cassandra cassandra 703M Nov 8 15:11 Sessions-Users-ib-2516-Index.db
-rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db
GOOD -
-rw-r--r-- 1 cassandra cassandra 22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
-rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
-rw-r--r-- 1 cassandra cassandra 76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
-rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
-rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
-rw-r--r-- 1 cassandra cassandra 79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt
Should I delete those 3 files? Should I run nodetool refresh after the operation? Best regards, Francisco.

On Jan 30, 2014, at 2:02 PM, Rahul Menon ra...@apigee.com wrote: Looks like the sstables are corrupt. I don't believe there is a method to recover those sstables; I would delete them and run a repair to ensure data consistency. Rahul On Wed, Jan 29, 2014 at 11:29 PM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote: Hi, Rahul. I've run nodetool upgradesstables only on the problematic CF. 
It threw the following exception: Error occurred while upgrading the sstables for keyspace Sessions java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:271) at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:287) at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:977) at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2191) ... ... Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416 at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:167) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:83) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:69) at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180) at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155) at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142) at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:134) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:301) at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:262) ... 3 more Caused by: java.io.IOException: dataSize of 3622081913630118729 starting at 32906
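Concretely, for the corrupt generation named in the traces above, the cleanup might look like this (the path and generation number are taken from the thread; stop the node, or at least confirm the sstable is not being compacted, before deleting):

```shell
# Remove every component file of the corrupt sstable generation,
# then repair the CF so replicas re-supply the missing data.
DATA_DIR=/mnt/cassandra/data/Sessions/Users
ls -lh "$DATA_DIR"/Sessions-Users-ib-2516-*   # inspect before deleting
rm -f  "$DATA_DIR"/Sessions-Users-ib-2516-*   # Data, Index, Summary, ...
nodetool -h localhost repair Sessions Users
```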
Re: Possibly losing data with corrupted SSTables
Francisco, the sstables with *-ib-* are from a previous version of C*. The *-ib-* naming convention started at C* 1.2.1, but from 1.2.10 onwards I'm sure it uses the *-ic-* convention. You could try running nodetool upgradesstables, which should upgrade the *-ib-* sstables to *-ic-*. Rahul

On Wed, Jan 29, 2014 at 12:55 AM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote: Dear experts, We are facing an annoying problem in our cluster. We have 9 Amazon extra large Linux nodes, running Cassandra 1.2.11. The short story is that after moving the data from one cluster to another, we've been unable to run 'nodetool repair'. It gets stuck due to a CorruptSSTableException in some nodes and CFs. After looking at some problematic CFs, we observed that some of them have root permissions, instead of cassandra permissions. Also, their names are different from the 'good' ones, as we can see below:
BAD --
-rw-r--r-- 8 cassandra cassandra 991M Nov 8 15:11 Sessions-Users-ib-2516-Data.db
-rw-r--r-- 8 cassandra cassandra 703M Nov 8 15:11 Sessions-Users-ib-2516-Index.db
-rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db
GOOD -
-rw-r--r-- 1 cassandra cassandra 22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
-rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
-rw-r--r-- 1 cassandra cassandra 76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
-rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
-rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
-rw-r--r-- 1 cassandra cassandra 79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt
We changed the permissions back to 'cassandra' and ran 'nodetool scrub' on this problematic CF, but it has been running for at least two weeks (it is not frozen) and keeps logging many WARNs while working with 
the above mentioned SSTable: WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java (line 57) Non-fatal error reading row (stacktrace follows) java.io.IOError: java.io.IOException: Impossible row size 3618452438597849419 at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171) at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526) at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515) at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70) at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280) at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Impossible row size 3618452438597849419 ... 10 more 1) I do not think that deleting all data of one node and running 'nodetool rebuild' will work, since we observed that this problem occurs in all nodes. So we may not be able to restore all the data. What can be done in this case? 2) Why the permissions of some sstables are 'root'? Is this problem caused by our manual migration of data? (see long story below) How we ran into this? The long story is that we've tried to move our cluster with sstableloader, but it was unable to load all the data correctly. Our solution was to put ALL cluster data into EACH new node and run 'nodetool refresh'. I performed this task for each node and each column family sequentially. Sometimes I had to rename some sstables, because they came from different nodes with the same name. I don't remember if I ran 'nodetool repair' or even 'nodetool cleanup' in each node. 
Apparently, the process was successful, and (almost) all the data was moved. Unfortunately, after 3 months since we moved, I am unable to perform read operations in some keys of some CFs. I think that some of these keys belong to the above mentioned sstables. Any insights are welcome. Best regards, Francisco Sobral
Re: Latest Stable version of cassandra in production
You should refer to this https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ Thanks Rahul On Thu, Jan 9, 2014 at 8:06 AM, Sanjeeth Kumar sanje...@exotel.in wrote: Hi all, What is the latest stable version of cassandra you have in production ? We are migrating a large chunk of our mysql database to cassandra. I see a lot of discussions regarding 1.* versions, but I have not seen / could not find discussions regarding using 2.* versions in production. Any suggestions for the version based on your experience? - Sanjeeth
Re: Crash with TombstoneOverwhelmingException
Sanjeeth, looks like the error originates from the hinted handoff; what is the size of your hints CF? Thanks Rahul

On Wed, Dec 25, 2013 at 8:54 PM, Sanjeeth Kumar sanje...@exotel.in wrote: Hi all, One of my cassandra nodes crashes with the following exception periodically - ERROR [HintedHandoff:33] 2013-12-25 20:29:22,276 SliceQueryFilter.java (line 200) Scanned over 10 tombstones; query aborted (see tombstone_fail_threshold) ERROR [HintedHandoff:33] 2013-12-25 20:29:22,278 CassandraDaemon.java (line 187) Exception in thread Thread[HintedHandoff:33,1,main] org.apache.cassandra.db.filter.TombstoneOverwhelmingException at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1487) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1306) at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351) at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309) at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92) at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Why does this happen? 
Does this relate to any incorrect config value? The Cassandra Version I'm running is ReleaseVersion: 2.0.3 - Sanjeeth
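For context, the relevant knobs live in cassandra.yaml; the crash above means a hint row crossed the failure threshold during replay. Raising the threshold is a band-aid; clearing the hints (2.0-era releases have nodetool truncatehints) and running repair is the usual way out. The values below are the 2.0 defaults:

```yaml
# cassandra.yaml -- 2.0 defaults for tombstone scanning limits:
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
```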
Re: Bulkoutputformat
Here you go: http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html

On Fri, Dec 13, 2013 at 7:19 AM, varun allampalli vshoori.off...@gmail.com wrote: Hi Aaron, It seems like you answered the question here: https://groups.google.com/forum/#!topic/nosql-databases/vjZA5vdycWA Can you give me the link to the blog which you mentioned http://thelastpickle.com/2013/01/11/primary-keys-in-cql/ Thanks in advance Varun

On Thu, Dec 12, 2013 at 3:36 PM, varun allampalli vshoori.off...@gmail.com wrote: Thanks Aaron, I was able to generate sstables and load using sstableloader. But after loading the tables, when I do a select query I get this; the table has only one record. Is there anything I am missing, or any logs I can look at? Request did not complete within rpc_timeout.

On Wed, Dec 11, 2013 at 7:58 PM, Aaron Morton aa...@thelastpickle.com wrote: If you don’t need to use Hadoop then try the SSTableSimpleWriter and sstableloader; this post is a little old but still relevant: http://www.datastax.com/dev/blog/bulk-loading Otherwise AFAIK BulkOutputFormat is what you want from hadoop: http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com

On 12/12/2013, at 11:27 am, varun allampalli vshoori.off...@gmail.com wrote: Hi All, I want to bulk insert data into cassandra. I was wondering about using BulkOutputFormat in hadoop. Is it the best way, or is using the driver and doing batch inserts the better way? Are there any disadvantages to using BulkOutputFormat? Thanks for helping Varun
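Since the thread settled on SSTableSimpleWriter plus sstableloader, a minimal invocation sketch (the contact-point IPs and staging directory are placeholders; the layout under the staging directory must be <dir>/<keyspace>/<table>/):

```shell
# Stream pre-built sstables into the cluster; -d takes initial contact
# points. Paths and keyspace/table names here are hypothetical.
sstableloader -d 10.1.2.10,10.1.2.11 /tmp/bulk/MyKeyspace/MyTable
```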
Re: Write performance with 1.2.12
Quote from http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 *Murmur3Partitioner is NOT compatible with RandomPartitioner, so if you’re upgrading and using the new cassandra.yaml file, be sure to change the partitioner back to RandomPartitioner* On Thu, Dec 12, 2013 at 10:57 PM, srmore comom...@gmail.com wrote: On Thu, Dec 12, 2013 at 11:15 AM, J. Ryan Earl o...@jryanearl.us wrote: Why did you switch to RandomPartitioner away from Murmur3Partitioner? Have you tried with Murmur3? 1. # partitioner: org.apache.cassandra.dht.Murmur3Partitioner 2. partitioner: org.apache.cassandra.dht.RandomPartitioner Since I am comparing between the two versions I am keeping all the settings same. I see Murmur3Partitioner has some performance improvement but then switching back to RandomPartitioner should not cause performance to tank, right ? or am I missing something ? Also, is there an easier way to update the data from RandomPartitioner to Murmur3 ? (upgradesstable ?) On Fri, Dec 6, 2013 at 10:36 AM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:59 AM, Vicky Kak vicky@gmail.com wrote: You have passed the JVM configurations and not the cassandra configurations which is in cassandra.yaml. Apologies, was tuning JVM and that's what was in my mind. Here are the cassandra settings http://pastebin.com/uN42GgYT The spikes are not that significant in our case and we are running the cluster with 1.7 gb heap. Are these spikes causing any issue at your end? There are no big spikes, the overall performance seems to be about 40% low. On Fri, Dec 6, 2013 at 9:10 PM, srmore comom...@gmail.com wrote: On Fri, Dec 6, 2013 at 9:32 AM, Vicky Kak vicky@gmail.com wrote: Hard to say much without knowing about the cassandra configurations. 
The cassandra configuration is -Xms8G -Xmx8G -Xmn800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
Yes, compactions/GCs could spike the CPU; I had similar behavior with my setup. Were you able to get around it? -VK

On Fri, Dec 6, 2013 at 7:40 PM, srmore comom...@gmail.com wrote: We have a 3 node cluster running cassandra 1.2.12; they are pretty big machines, 64G RAM with 16 cores, and the cassandra heap is 8G. The interesting observation is that when I send traffic to one node, its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip, but not half as seen with 1.2.12. In both cases we were writing with LOCAL_QUORUM. Changing CL to ONE makes a slight improvement, but not much. The read_repair_chance is 0.1. We see some compactions running. Following is my iostat -x output; sda is the ssd (for commit log) and sdb is the spinner. 
avg-cpu: %user %nice %system %iowait %steal %idle
         66.46  0.00    8.95    0.01   0.00 24.58
Device: rrqm/s wrqm/s  r/s   w/s  rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda       0.00  27.60 0.00  4.40    0.00 256.00    58.18     0.01  2.55  1.32  0.58
sda1      0.00   0.00 0.00  0.00    0.00   0.00     0.00     0.00  0.00  0.00  0.00
sda2      0.00  27.60 0.00  4.40    0.00 256.00    58.18     0.01  2.55  1.32  0.58
sdb       0.00   0.00 0.00  0.00    0.00   0.00     0.00     0.00  0.00  0.00  0.00
sdb1      0.00   0.00 0.00  0.00    0.00   0.00     0.00     0.00  0.00  0.00  0.00
dm-0      0.00   0.00 0.00  0.00    0.00   0.00     0.00     0.00  0.00  0.00  0.00
dm-1      0.00   0.00 0.00  0.60    0.00   4.80     8.00     0.00  5.33  2.67  0.16
dm-2      0.00   0.00 0.00  0.00    0.00   0.00     0.00     0.00  0.00  0.00  0.00
dm-3      0.00   0.00 0.00 24.80    0.00 198.40     8.00     0.24  9.80  0.13  0.32
dm-4      0.00   0.00 0.00  6.60    0.00  52.80     8.00     0.01  1.36  0.55  0.36
dm-5      0.00   0.00 0.00  0.00    0.00   0.00     0.00     0.00  0.00  0.00  0.00
dm-6      0.00   0.00 0.00 24.80    0.00 198.40     8.00     0.29 11.60  0.13  0.32
I can see I am CPU bound here but couldn't figure out exactly what is causing it. Is this caused by GC or compaction? I am thinking it is compaction; I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen this, or know what can be done to free up the CPU? Thanks, Sandeep
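On the last question in this thread: upgradesstables does not re-partition data. The partitioner is fixed for the life of the cluster's data, so moving from RandomPartitioner to Murmur3Partitioner means reloading the data (e.g. via sstableloader) into a cluster created with the new setting. The cassandra.yaml line in question:

```yaml
# cassandra.yaml -- must match the partitioner the existing data was
# written with; it cannot be changed in place:
partitioner: org.apache.cassandra.dht.RandomPartitioner
# For brand-new clusters only (the 1.2+ default):
# partitioner: org.apache.cassandra.dht.Murmur3Partitioner
```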
Re: nodetool repair keeping an empty cluster busy
Sven, so basically when you run a repair you are telling your cluster to run a validation compaction, which generates a Merkle tree on each of the nodes involved. These trees are exchanged and compared to identify inconsistencies, and differing ranges are then streamed between replicas, which is the traffic you see on the network. With vnodes (256 tokens per node here), repair runs one session per token range, so even a nearly empty keyspace produces hundreds of validation tasks, which is why the hosts spin so hard. Rahul

On Wed, Dec 11, 2013 at 11:02 AM, Sven Stark sven.st...@m-square.com.au wrote: Corollary: what is getting shipped over the wire? The ganglia screenshot shows the network traffic on all the three hosts on which I ran the nodetool repair. remember
UN 10.1.2.11 107.47 KB 256 32.9% 1f800723-10e4-4dcd-841f-73709a81d432 rack1
UN 10.1.2.10 127.67 KB 256 32.4% bd6b2059-e9dc-4b01-95ab-d7c4fc0ec639 rack1
UN 10.1.2.12 107.62 KB 256 34.7% 5258f178-b20e-408f-a7bf-b6da2903e026 rack1
Much appreciated. Sven

On Wed, Dec 11, 2013 at 3:56 PM, Sven Stark sven.st...@m-square.com.au wrote: Howdy! Not a matter of life or death, just curious. I've just stood up a three node cluster (v1.2.8) on three c3.2xlarge boxes in AWS. Silly me forgot the correct replication factor for one of the needed keyspaces, so I changed it via cli and ran a nodetool repair. Well... there is no data at all in the keyspace yet, only the definition, and nodetool repair ran about 20 minutes using 2 of the 8 CPUs fully. Any hints as to what nodetool repair is doing on an empty cluster that makes the host spin so hard? 
Cheers, Sven
==
Tasks: 125 total, 1 running, 124 sleeping, 0 stopped, 0 zombie
Cpu(s): 22.7%us, 1.0%sy, 2.9%ni, 73.0%id, 0.0%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 15339196k total, 7474360k used, 7864836k free, 251904k buffers
Swap: 0k total, 0k used, 0k free, 798324k cached
  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
10840 cassandr 20  0 8354m 4.1g  19m S  218 28.0 35:25.73 jsvc
16675 kafka    20  0 3987m 192m  12m S    2  1.3  0:47.89 java
20328 root     20  0 5613m 569m  16m S    2  3.8  1:35.13 jsvc
 5969 exhibito 20  0 6423m 116m  12m S    1  0.8  0:25.87 java
14436 tomcat7  20  0 3701m 167m  11m S    1  1.1  0:25.80 java
 6278 exhibito 20  0 6487m 119m 9984 S    0  0.8  0:22.63 java
17713 storm    20  0 6033m 159m  11m S    0  1.1  0:10.99 java
18769 storm    20  0 5773m 156m  11m S    0  1.0  0:10.71 java
root@xxx-01:~# nodetool -h `hostname` status
Datacenter: datacenter1
===
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.1.2.11 107.47 KB 256 32.9% 1f800723-10e4-4dcd-841f-73709a81d432 rack1
UN 10.1.2.10 127.67 KB 256 32.4% bd6b2059-e9dc-4b01-95ab-d7c4fc0ec639 rack1
UN 10.1.2.12 107.62 KB 256 34.7% 5258f178-b20e-408f-a7bf-b6da2903e026 rack1
root@xxx-01:~# nodetool -h `hostname` compactionstats
pending tasks: 1
compaction type keyspace column family completed total unit progress
Active compaction remaining time : n/a
root@xxx-01:~# nodetool -h `hostname` netstats
Mode: NORMAL
Not sending any streams. Not receiving any streams.
Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0
Pool Name Active Pending Completed
Commands  n/a    0       57155
Responses n/a    0       14573
Re: cassandra backup
You should look at this - https://github.com/amorton/cassback I don't believe it's set up to use 1.2.10 and above, but I believe it's just small tweaks to get it running. Thanks Rahul On Fri, Dec 6, 2013 at 7:09 PM, Michael Theroux mthero...@yahoo.com wrote: Hi Marcelo, Cassandra provides an eventually consistent model for backups. You can do staggered backups of data, with the idea that if you restore a node, and then do a repair, your data will be once again consistent. Cassandra will not automatically copy the data to other nodes (other than via hinted handoff). You should manually run repair after restoring a node. You should take snapshots when doing a backup, as it keeps the data you are backing up relevant to a single point in time; otherwise compaction could add/delete files once you are mid-backup, or worse, I imagine, attempt to access an SSTable mid-write. Snapshots work by using links, and don't take additional storage to perform. In our process we create the snapshot, perform the backup, and then clear the snapshot. One thing to keep in mind in your S3 cost analysis is that, even though storage is cheap, reads/writes to S3 are not (especially writes). If you are using LeveledCompaction, or otherwise have a ton of SSTables, some people have encountered increased costs moving the data to S3. Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync data to. Thus far this has worked very well for us. -Mike On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hello everyone, I am trying to create backups of my data on AWS. My goal is to store the backups on S3 or Glacier, as it's cheap to store this kind of data. So, if I have a cluster with N nodes, I would like to copy data from all N nodes to S3 and be able to restore later. I know Priam does that (we were using it), but I am using the latest Cassandra version and we plan to use DSE some time, so I am not sure Priam fits this case.
I took a look at the docs: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html And I am trying to understand if it's really needed to take a snapshot to create my backup. Suppose I do a flush and copy the sstables from each node, one by one, to S3. Not all at the same time, but one by one. When I try to restore my backup, data from node 1 will be older than data from node 2. Will this cause problems? AFAIK, if I am using a replication factor of 2, for instance, and Cassandra sees data from node X only, it will automatically copy it to the other nodes, right? Is there any chance of Cassandra nodes becoming corrupt somehow if I do my backups this way? Best regards, Marcelo Valle.
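The flush/snapshot/upload/clear-snapshot sequence Michael describes could be scripted per node roughly as follows. This is a sketch only: the keyspace name, bucket name and data directory are assumptions, and the `run` wrapper prints each command instead of executing it, so nothing destructive happens when you paste it.

```shell
# Per-node backup sketch: the snapshot gives a point-in-time, hard-linked
# view, so compaction cannot add/remove sstables mid-upload.
KEYSPACE="my_keyspace"              # hypothetical keyspace name
BUCKET="my-backup-bucket"           # hypothetical S3 bucket
DATA_DIR="/var/lib/cassandra/data"  # default data path; adjust as needed
TAG="backup-$(date +%Y%m%d)"        # one snapshot tag per run

run() { echo "+ $*"; }              # dry run: print, don't execute

run nodetool flush "$KEYSPACE"                    # push memtables to sstables first
run nodetool snapshot -t "$TAG" "$KEYSPACE"       # hard links, ~zero extra disk
run aws s3 sync "$DATA_DIR/$KEYSPACE" "s3://$BUCKET/$(hostname)/$TAG/"
run nodetool clearsnapshot -t "$TAG" "$KEYSPACE"  # drop the links when done
```

Remember that after restoring from such a backup, a repair is still needed to bring the restored node back in sync with its replicas.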
Re: How to monitor the progress of a HintedHandoff task?
Tom, you should look at phi_convict_threshold and try increasing the value if you have too much chatter on your network. Also, rebuilding the entire node because of an OOM does not make sense; could you please post the C* version that you are using and the heap size you have configured? Thanks Rahul On Tue, Dec 3, 2013 at 7:41 PM, Tom van den Berge t...@drillster.com wrote: Rahul, This problem occurs every now and then, and currently everything is OK, so there are no hints. But whenever it happens, the hints are quickly piling up. This results in heap problems on the node ("Heap is 0.813462 full..." appears many times). This in turn results in the flushing of the 'hints' column family, to relieve memory pressure. According to the log message, the size varies between 50 and 60MB. But since the HintedHandoffManager is reading from the hints CF, it will probably pull it back into a memtable again -- that's at least my understanding of how it works. So I guess that flushing the hints CF while the HintedHandoffManager is working on it only makes things worse, and it could be the reason that the process never ends. What I typically see when this happens is that the hints keep piling up, and eventually the node comes to a grinding halt (OOM). Then I have to rebuild the node entirely (only removing the hints doesn't work). The reason for hints to start accumulating in the first place might be a spike in CF writes that must be replicated to a node in another data center. The available bandwidth to that data center might not be able to handle the data quickly enough, resulting in stored hints. The HintedHandoff task that is started is targeting that remote node. Thanks, Tom On Tue, Dec 3, 2013 at 2:22 PM, Rahul Menon ra...@apigee.com wrote: Tom, Do you know why these hints are piling up? What is the size of the hints CF? Thanks Rahul On Tue, Dec 3, 2013 at 6:41 PM, Tom van den Berge t...@drillster.com wrote: Hi Rahul, Thanks for your reply.
I have never seen a message like "Timed out replaying hints to...", which is a good thing then, I suppose ;) Normally, I do see the "Finished hinted handoff..." log message. However, every now and then this message is not logged, not even after several hours. This is the problem I'm trying to solve. The log messages you describe are quite coarse-grained; they only tell you that a task has started or finished, but not how this task is progressing. And that's exactly what I would like to know if I see that a task has started, but has not finished after a reasonable amount of time. So I guess the only way to learn the progress is to look inside the 'hints' column family then. I'll give that a try. Thanks, Tom On Tue, Dec 3, 2013 at 1:43 PM, Rahul Menon ra...@apigee.com wrote: Tom, You should check the size of the hints column family to determine how many are present. The hints are a super column family and its keys are destination tokens. You could look at it if you would like. Hint sends and timeouts are logged; you should be seeing something like "Timed out replaying hints to {}; aborting ({} delivered)" OR "Finished hinted handoff of {} rows to endpoint {}" Thanks Rahul On Tue, Dec 3, 2013 at 2:36 PM, Tom van den Berge t...@drillster.com wrote: Hi, Is there a way to monitor the progress of a hinted handoff task? I found the following two mbeans providing some info: org.apache.cassandra.internal:type=HintedHandoff, which tells me that there is 1 active task, and org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(), which quite often gives a timeout when executed. Ideally, I would like to see how many hints have been sent (e.g. over the last minute or so), and how many hints are still to be sent (although I assume that's what countPendingHints normally does?) I'm experiencing hinted handoff tasks that are started, but never finish, so I would like to know what the task is doing.
My log shows this: INFO [HintedHandoff:1] 2013-12-02 13:49:05,325 HintedHandOffManager.java (line 297) Started hinted handoff for host: 6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66 (nothing more for [HintedHandoff:1]) The node is up and running, the network connection is OK, no gossip messages appear in the logs. Any idea is welcome. (Cassandra 1.2.3) -- Drillster BV Middenburcht 136 3452MT Vleuten Netherlands +31 30 755 5330 Open your free account at www.drillster.com
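Beyond the start/finish log lines, the HintedHandoffManager bean can be polled from a script to watch a stuck handoff. A sketch using jmxterm, as suggested elsewhere in these threads (the jar filename and JMX port 7199 are assumptions; adjust for your setup):

```shell
# One-shot jmxterm script that lists endpoints with pending hints.
cat > /tmp/pending_hints.jmx <<'EOF'
open localhost:7199
bean org.apache.cassandra.db:type=HintedHandoffManager
run listEndpointsPendingHints
close
EOF
# Dry run: echo the invocation (drop the echo to actually attach).
echo java -jar jmxterm-1.0-alpha-4-uber.jar -n -i /tmp/pending_hints.jmx
```

Running this periodically and comparing the counts over time would show whether the handoff is making progress or has stalled.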
Re: How to monitor the progress of a HintedHandoff task?
Tom, You should check the size of the hints column family to determine how many are present. The hints are a super column family and its keys are destination tokens. You could look at it if you would like. Hint sends and timeouts are logged; you should be seeing something like "Timed out replaying hints to {}; aborting ({} delivered)" OR "Finished hinted handoff of {} rows to endpoint {}" Thanks Rahul On Tue, Dec 3, 2013 at 2:36 PM, Tom van den Berge t...@drillster.com wrote: Hi, Is there a way to monitor the progress of a hinted handoff task? I found the following two mbeans providing some info: org.apache.cassandra.internal:type=HintedHandoff, which tells me that there is 1 active task, and org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(), which quite often gives a timeout when executed. Ideally, I would like to see how many hints have been sent (e.g. over the last minute or so), and how many hints are still to be sent (although I assume that's what countPendingHints normally does?) I'm experiencing hinted handoff tasks that are started, but never finish, so I would like to know what the task is doing. My log shows this: INFO [HintedHandoff:1] 2013-12-02 13:49:05,325 HintedHandOffManager.java (line 297) Started hinted handoff for host: 6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66 (nothing more for [HintedHandoff:1]) The node is up and running, the network connection is OK, no gossip messages appear in the logs. Any idea is welcome. (Cassandra 1.2.3) -- Drillster BV Middenburcht 136 3452MT Vleuten Netherlands +31 30 755 5330 Open your free account at www.drillster.com
Re: How to monitor the progress of a HintedHandoff task?
Tom, Do you know why these hints are piling up? What is the size of the hints CF? Thanks Rahul On Tue, Dec 3, 2013 at 6:41 PM, Tom van den Berge t...@drillster.com wrote: Hi Rahul, Thanks for your reply. I have never seen a message like "Timed out replaying hints to...", which is a good thing then, I suppose ;) Normally, I do see the "Finished hinted handoff..." log message. However, every now and then this message is not logged, not even after several hours. This is the problem I'm trying to solve. The log messages you describe are quite coarse-grained; they only tell you that a task has started or finished, but not how this task is progressing. And that's exactly what I would like to know if I see that a task has started, but has not finished after a reasonable amount of time. So I guess the only way to learn the progress is to look inside the 'hints' column family then. I'll give that a try. Thanks, Tom On Tue, Dec 3, 2013 at 1:43 PM, Rahul Menon ra...@apigee.com wrote: Tom, You should check the size of the hints column family to determine how many are present. The hints are a super column family and its keys are destination tokens. You could look at it if you would like. Hint sends and timeouts are logged; you should be seeing something like "Timed out replaying hints to {}; aborting ({} delivered)" OR "Finished hinted handoff of {} rows to endpoint {}" Thanks Rahul On Tue, Dec 3, 2013 at 2:36 PM, Tom van den Berge t...@drillster.com wrote: Hi, Is there a way to monitor the progress of a hinted handoff task? I found the following two mbeans providing some info: org.apache.cassandra.internal:type=HintedHandoff, which tells me that there is 1 active task, and org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(), which quite often gives a timeout when executed. Ideally, I would like to see how many hints have been sent (e.g.
over the last minute or so), and how many hints are still to be sent (although I assume that's what countPendingHints normally does?) I'm experiencing hinted handoff tasks that are started, but never finish, so I would like to know what the task is doing. My log shows this: INFO [HintedHandoff:1] 2013-12-02 13:49:05,325 HintedHandOffManager.java (line 297) Started hinted handoff for host: 6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66 (nothing more for [HintedHandoff:1]) The node is up and running, the network connection is OK, no gossip messages appear in the logs. Any idea is welcome. (Cassandra 1.2.3) -- Drillster BV Middenburcht 136 3452MT Vleuten Netherlands +31 30 755 5330 Open your free account at www.drillster.com
Re: What is listEndpointsPendingHints?
Tom, Here is the definition: "List all the endpoints that this node has hints for, and count the number of hints for each such endpoint. Returns: map of endpoint -> hint count." I would suggest looking at the gossipinfo to validate whether there are any nodes which have that token value. If there is one (there should be, since it's storing hints), you should assassinate the node and you should be on your way. Thanks Rahul On Tue, Nov 26, 2013 at 6:09 PM, Tom van den Berge t...@drillster.com wrote: When I run the operation listEndpointsPendingHints on the mbean org.apache.cassandra.db:type=HintedHandoffManager, I'm getting ( 126879603237190600081737151857243914981 ) It suggests that there are pending hints, but the org.apache.cassandra.internal:type=HintedHandoff mbean provides these figures: TotalBlockedTasks = 0; CurrentlyBlockedTasks = 0; CoreThreads = 2; MaximumThreads = 2; ActiveCount = 0; PendingTasks = 0; CompletedTasks = 0; I'm wondering what it means that it returns a value, and what this value is? It looks like a token, but it's not one of the tokens of my nodes. The reason I'm looking into this is that my cluster is suffering every now and then from never-ending (dead) hinted handoff tasks, resulting in a flooding of hints on the node. Thanks, Tom
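If gossipinfo shows the token belongs to a dead or removed node, the assassination can be driven over JMX as well. A hedged sketch with jmxterm, assuming the Gossiper MBean's unsafeAssassinateEndpoint operation (available in 1.x releases); the IP, port and jar filename are placeholders:

```shell
# Sketch: remove a ghost endpoint via the Gossiper MBean.
# 10.0.0.99 is a placeholder; use the address resolved from gossipinfo.
cat > /tmp/assassinate.jmx <<'EOF'
open localhost:7199
bean org.apache.cassandra.net:type=Gossiper
run unsafeAssassinateEndpoint 10.0.0.99
close
EOF
echo java -jar jmxterm-1.0-alpha-4-uber.jar -n -i /tmp/assassinate.jmx  # dry run
```

As the "unsafe" prefix suggests, only do this for an endpoint that is genuinely gone from the cluster.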
Re: 1.1.11: system keyspace is filling up
Oleg, The system keyspace is not replicated; it is local to the node. You should check your logs to see if there are timeouts from streaming hints; I believe the default timeout to stream hints is 10 seconds. When I ran into this problem I truncated hints to clear out the space and then ran a repair to ensure that all the data was consistent across all nodes, even if there was a failure. -rm On Tue, Nov 5, 2013 at 6:29 PM, Oleg Dulin oleg.du...@gmail.com wrote: What happens if they are not being successfully delivered? Will they eventually TTL out? Also, do I need to truncate hints on every node, or is it replicated? Oleg On 2013-11-04 21:34:55 +0000, Robert Coli said: On Mon, Nov 4, 2013 at 11:34 AM, Oleg Dulin oleg.du...@gmail.com wrote: I have a dual DC setup, 4 nodes, RF=4 in each. The one that is used as primary has its system keyspace fill up with 200 gigs of data, the majority of which is hints. Why does this happen? How can I clean it up? If you have this many hints, you probably have flapping / frequent network partition, or very overloaded nodes. If you compare the number of hints to the number of dropped messages, that would be informative. If you're hinting because you're dropping, increase capacity. If you're hinting because of partition, figure out why there's so much partition. WRT cleaning up hints, they will automatically be cleaned up eventually, as long as they are successfully being delivered. If you need to manually clean them up you can truncate system.hints. =Rob -- Regards, Oleg Dulin http://www.olegdulin.com
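Since hints are local to each node, the truncate-then-repair cleanup has to be repeated on every node. A dry-run sketch (host names and keyspace are hypothetical, and the `run` wrapper prints rather than executes; on 1.1 you may need cassandra-cli instead of cqlsh -e to truncate):

```shell
# Per-node cleanup sketch: truncate local hints everywhere, then repair
# so any data the hints would have delivered is made consistent anyway.
run() { echo "+ $*"; }   # dry run: print, don't execute

for host in node1 node2 node3 node4; do   # hypothetical host names
  run ssh "$host" "cqlsh -e 'TRUNCATE system.hints;'"
done
run nodetool repair my_keyspace           # hypothetical keyspace name
```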