Re: Getting error Too many in flight hints
Thanks a lot for the explanation. If I understand it correctly, it's basically back pressure from C*: it's telling me that it's overloaded and that I need to back off. I'd better start a few more nodes, I guess.

T#

On Thu, May 30, 2013 at 10:47 PM, Robert Coli rc...@eventbrite.com wrote:

On Thu, May 30, 2013 at 8:24 AM, Theo Hultberg t...@iconara.net wrote:

I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster), and my application is talking to it over the binary protocol (I'm using JRuby and the cql-rb driver). I get this error quite frequently: "Too many in flight hints: 2411" (the exact number varies). Has anyone any idea of what's causing it? I'm pushing the cluster quite hard with writes (but no reads at all).

The code that produces this message (below) sets the bound based on the number of available processors. It is a bound on the number of in-progress hints. An in-progress hint (for some reason redundantly referred to as "in flight") is a hint which has been submitted to the executor which will ultimately write it to local disk. If you get OverloadedException, this means that you were trying to write hints to this executor so fast that you risked OOM, so Cassandra refused to submit your hint to the hint executor and therefore (partially) failed your write.

    private static volatile int maxHintsInProgress = 1024 * FBUtilities.getAvailableProcessors();
    [... snip ...]
    for (InetAddress destination : targets)
    {
        // avoid OOMing due to excess hints. we need to do this check even for live nodes, since we can
        // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
        // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
        // a small number of nodes causing problems, so we should avoid shutting down writes completely to
        // healthy nodes. Any node with no hintsInProgress is considered healthy.
        if (totalHintsInProgress.get() > maxHintsInProgress
            && (hintsInProgress.get(destination).get() > 0 && shouldHint(destination)))
        {
            throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get());
        }

If Cassandra didn't return this exception, it might OOM while enqueueing your hints to be stored. Giving up on trying to enqueue a hint for the failed write is chosen instead. The solution is to reduce your write rate, ideally by enough that you don't even queue hints in the first place.

=Rob
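For what it's worth, the back-off Theo mentions can also be done client-side. A minimal sketch, assuming your driver surfaces the server's overload error as an unchecked exception; the names below are stand-ins for illustration, not any particular driver's API:

    // Stand-in names (not a real driver API): retry a write with exponential
    // backoff when the coordinator signals overload, giving the hint queues
    // time to drain before the next attempt.
    class OverloadedException extends RuntimeException {
    }

    class BackoffWriter {
        static void writeWithBackoff(Runnable write) throws InterruptedException {
            long delayMs = 50;
            for (int attempt = 0; attempt < 5; attempt++) {
                try {
                    write.run();            // the actual insert/update
                    return;                 // success
                } catch (OverloadedException e) {
                    Thread.sleep(delayMs);  // back off before retrying
                    delayMs *= 2;           // exponential backoff
                }
            }
            throw new OverloadedException(); // still overloaded after 5 tries
        }
    }

Capping the retries and rethrowing keeps a persistently overloaded cluster from silently stalling the client forever.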
Re: Cassandra performance decreases drastically with increase in data size.
I believe you should roll out more nodes as a temporary fix to your problem. 400GB on all nodes means (as correctly mentioned in other mails of this thread) you are spending more time on GC. Check out the second comment in this link by Aaron Morton, where he says that more than 300GB can be problematic. The post is about an older version of Cassandra, but I believe the concept still holds true: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-safe-to-stop-a-read-repair-and-any-suggestion-on-speeding-up-repairs-td6607367.html

Thanks

On May 29, 2013, at 9:32 PM, srmore comom...@gmail.com wrote:

Hello, I am observing that my performance decreases drastically as my data size grows. I have a 3 node cluster with 64 GB of RAM, and my data size is around 400GB on all the nodes. I also see that when I restart Cassandra the performance goes back to normal and then starts decreasing again after some time. Some hunting landed me on this page, http://wiki.apache.org/cassandra/LargeDataSetConsiderations, which talks about large data sets and explains that it might be because I am going through multiple layers of OS cache, but it does not tell me how to tune it.

So, my questions are: are there any optimizations that I can do to handle these large datasets? And why does my performance go back to normal when I restart Cassandra?

Thanks!
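One way to verify the GC theory before adding nodes is to watch the JVM's own GC counters. A minimal sketch, shown against the local JVM for brevity (pointing it at the Cassandra process instead requires a remote JMX connection):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Polls cumulative GC counts and time. If total GC time keeps climbing
    // as the data set grows (and resets after a restart), that supports the
    // "more data means more GC" explanation above.
    public class GcCheck {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("%s: %d collections, %d ms total%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(10000); // sample every 10 seconds
            }
        }
    }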
Re: Does replicate_on_write=true imply that CL.QUORUM for reads is unnecessary?
This is incorrect. IMO that page is misleading.

replicate_on_write should normally always be turned on, or the change will only be recorded on one node. Replicate on write is asynchronous with respect to the request and doesn't affect consistency level at all.

On Wed, May 29, 2013 at 7:32 PM, Andrew Bialecki andrew.biale...@gmail.com wrote:

To answer my own question, directly from the docs: http://www.datastax.com/docs/1.0/configuration/storage_configuration#replicate-on-write. It appears the answer to this is: yes, CL.QUORUM isn't necessary for reads. Essentially, replicate_on_write sets the CL to ALL regardless of what you actually set it to (and for good reason).

On Wed, May 29, 2013 at 9:47 AM, Andrew Bialecki andrew.biale...@gmail.com wrote:

Quick question about counter columns. In looking at the replicate_on_write setting, assuming you go with the default of true, my understanding is that it writes the increment to all replicas on any increment. If that's the case, doesn't that mean there's no point in using CL.QUORUM for reads, because all replicas have the same values?

Similarly, what effect does read_repair_chance have on counter columns, since they should already read-repair on write?

In anticipation of a possible answer (that both CL.QUORUM for reads and read_repair_chance only end up mattering for counter deletions), it's safe to only use CL.ONE and disable the read repair if we're never deleting counters. (And, of course, if we did start deleting counters, we'd need to revert those client and column family changes.)

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re: Does replicate_on_write=true imply that CL.QUORUM for reads is unnecessary?
I agree, the page is clearly misleading in its formulation. However, for the sake of being precise, I'll note that it is not untrue, strictly speaking.

If replicate_on_write is true (the default, which you should probably not change unless you consider yourself an expert in the Cassandra counters implementation), then a write will be sent to all replicas, and that does not depend on the consistency level of the operation. *But*, please note that this is also true for *every* other write in Cassandra. I.e. for non-counter writes, we *always* replicate the write to every replica regardless of the consistency level. The only thing the CL changes is how many acks from said replicas we wait for before returning a success to the client. And it works the exact same way for counters with replicate_on_write. Or, put another way, by default counters work exactly like normal writes as far as CL is concerned.

So no, replicate_on_write does *not* set the CL to ALL regardless of what you set. However, if you set replicate_on_write to false, we will only write the counter to 1 replica. Which means that the only CL you will be able to use for writes is ONE (we don't allow ANY for counters).

--
Sylvain

On Fri, May 31, 2013 at 9:20 AM, Peter Schuller peter.schul...@infidyne.com wrote:

[... snip: full quote of Peter's reply and Andrew's messages, identical to the mail above ...]
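To make the ack-counting concrete, here is a tiny illustrative snippet (mine, not from the thread) for a keyspace with RF = 3:

    // Illustrative only: every write is sent to all RF replicas; the
    // consistency level just sets how many acks the coordinator waits
    // for before answering the client.
    public class ClAckMath {
        public static void main(String[] args) {
            int rf = 3;                // replication factor
            int one = 1;               // CL.ONE waits for 1 ack
            int quorum = rf / 2 + 1;   // CL.QUORUM waits for 2 of 3
            int all = rf;              // CL.ALL waits for all 3
            System.out.printf("RF=%d ONE=%d QUORUM=%d ALL=%d%n", rf, one, quorum, all);
            // With replicate_on_write=false, a counter write touches only one
            // replica, which is why ONE is the only usable write CL there.
        }
    }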
Re: Cassandra 1.1.11 does not always show filename of corrupted files
Ok, looking at the code I can see that 1.2 fixes the issue:

    try
    {
        validBufferBytes = metadata.compressor().uncompress(compressed.array(), 0, chunk.length, buffer, 0);
    }
    catch (IOException e)
    {
        throw new CorruptBlockException(getPath(), chunk);
    }

So that's nice :-) But does nobody else find the old behaviour annoying? Has nobody ever wanted to identify the broken files?

cheers,
Christian

On Thu, May 30, 2013 at 7:11 PM, horschi hors...@gmail.com wrote:

Hi,

we had some hard-disk issues this week, which caused some datafiles to get corrupted, which was reported by the compaction. My approach to fixing this was to delete the corrupted files and run repair. That sounded easy at first, but unfortunately C* 1.1.11 sometimes does not show which datafile is causing the exception.

How do you handle such cases? Do you delete the entire CF, or do you look up the compaction-started message and delete the files involved? In my opinion the stack trace should always show the filename of the file which could not be read.

Does anybody know if there were already changes to the logging since 1.1.11? CASSANDRA-2261 (https://issues.apache.org/jira/browse/CASSANDRA-2261) does not seem to have fixed the exception-handling part. Were there perhaps changes in 1.2 with the new disk-failure handling?

cheers,
Christian

PS: Here are some examples I found in my logs:

*Bad behaviour:*

    ERROR [ValidationExecutor:1] 2013-05-29 13:26:09,121 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[ValidationExecutor:1,1,main]
    java.io.IOError: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
        at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
        at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:726)
        at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:69)
        at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:457)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
        at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
        at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
        at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
        at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:94)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:90)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:71)
        at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397)
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
        at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:114)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
        at ...
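For anyone still on 1.1 and building tooling around this, the general shape of the 1.2 fix above is simply "catch the low-level IOException and rethrow with the file's path attached". A minimal, generic sketch of that pattern; these are my illustrative classes, not Cassandra's actual ones:

    import java.io.IOException;

    // Illustration of the exception-wrapping pattern used by the 1.2 fix:
    // the low-level failure is rethrown carrying the offending file's path,
    // so the stack trace identifies which datafile is corrupt.
    class CorruptFileException extends IOException {
        CorruptFileException(String path, IOException cause) {
            super("corrupt data in " + path, cause);
        }
    }

    class BlockReaderSketch {
        static byte[] readBlock(String path) throws CorruptFileException {
            try {
                return decompress(path);                 // the low-level read
            } catch (IOException e) {
                throw new CorruptFileException(path, e); // path survives into the trace
            }
        }

        private static byte[] decompress(String path) throws IOException {
            // stand-in for the real decompression; throws on corrupt input
            throw new IOException("FAILED_TO_UNCOMPRESS");
        }
    }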
RE: java.lang.AssertionError on starting the node
What was the node doing right before the ERROR? Can you post some more of the log?

Thanks,
SC

Date: Fri, 31 May 2013 10:57:38 +0530
From: himanshu.jo...@orkash.com
To: user@cassandra.apache.org
Subject: java.lang.AssertionError on starting the node

Hi,

I have created a 2 node test cluster in Cassandra version 1.2.3 with SimpleStrategy and replication factor 2. The Java version is 1.6.0_27. The seed node is working fine, but when I start the second node it shows the following error:

    ERROR 10:16:55,603 Exception in thread Thread[FlushWriter:2,5,main]
    java.lang.AssertionError: 105565
        at org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:342)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176)
        at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:481)
        at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:440)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)

This node was working fine earlier and already has data on it. Any help would be appreciated.

--
Thanks & Regards,
Himanshu Joshi
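Not an answer, but a pointer for whoever digs in: the assertion that fires here guards a 16-bit length prefix. A hedged reconstruction of the check, sketched from the stack trace rather than copied verbatim from the 1.2.3 source:

    import java.io.DataOutput;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    // Sketch of the failing check: buffers like row keys are serialized
    // with an unsigned 16-bit length prefix, so anything over 65535 bytes
    // (here: 105565) trips the assertion during the memtable flush.
    public final class ShortLengthWriterSketch {
        private static final int MAX_UNSIGNED_SHORT = 0xFFFF; // 65535

        public static void writeWithShortLength(ByteBuffer bytes, DataOutput out) throws IOException {
            int length = bytes.remaining();
            assert 0 <= length && length <= MAX_UNSIGNED_SHORT : length;
            out.writeShort(length);  // 2-byte length prefix
            // ... followed by the buffer contents themselves
        }
    }

Since 105565 is well above 65535, this suggests something (most likely a row key) longer than 64KB was about to be flushed.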
Re: Bulk loading into CQL3 Composite Columns
Hi Keith. Thanks for all your help so far.

I've done some additional testing and I can see no difference between having all the columns as part of the primary key and having only a subset. Granted, in my contrived example there is no benefit to having all the columns in the primary key, but I believe in my real use-case it makes sense. (If you imagine val1 being a category of data and val2 being an amount, then I can filter on a value for val1 and get sorted results for val2. I could accomplish the same thing by adding val1 to the row key, but I wanted to ensure my rows are of appropriate width.)

I also tried using the Astyanax library with the Composite handling you suggested, and I see exactly the same results as when I use the CompositeType Builder. If my composite type has two integers, representing my val1 and val2, and I add two values to my builder (or to the Astyanax Composite() class), the sstableloader imports the data, but I get an ArrayIndexOutOfBoundsException when selecting from the table, and cqlsh actually appears to lose the connection to the DB. I have to restart cqlsh before I can do anything further. The stack trace for the exception Cassandra throws is:

    ERROR 09:33:01,130 Error occurred during processing of message.
    java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.cassandra.cql3.statements.ColumnGroupMap.add(ColumnGroupMap.java:43)
        at org.apache.cassandra.cql3.statements.ColumnGroupMap.access$200(ColumnGroupMap.java:31)
        at org.apache.cassandra.cql3.statements.ColumnGroupMap$Builder.add(ColumnGroupMap.java:128)
        at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:730)
        at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:134)
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:128)
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:56)
        at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:132)
        at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:143)
        at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1707)
        at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4074)
        at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4062)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

However, I have found a way that I can trick it into working. Or so it seems, although it strikes me as hacky. If I define my column comparator for the SSTableSimpleUnsortedWriter as:

    final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
    compositeTypes.add(IntegerType.instance);
    compositeTypes.add(IntegerType.instance);
    compositeTypes.add(IntegerType.instance);

which adds an extra IntegerType, as I am actually only trying to insert 2 integer values, and I build my composite for the row as such:

    final Composite columnComposite = new Composite();
    columnComposite.setComponent(0, 5, IntegerSerializer.get());
    columnComposite.setComponent(1, 10, IntegerSerializer.get());
    columnComposite.setComponent(2, 20, IntegerSerializer.get()); // Dummy value, I actually don't want a value with index 2 inserted

then the data imports correctly: the value 5 gets stored as val1, 10 gets stored as val2, and 20 appears to be thrown away. Am I just doing something wonky here, or am I running up against a bug somewhere?

The full working source is:

    package com.exinda.bigdata.cassandra;

    import static org.apache.cassandra.utils.ByteBufferUtil.bytes;

    import java.io.File;
    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.cassandra.db.marshal.AbstractType;
    import org.apache.cassandra.db.marshal.CompositeType;
    import org.apache.cassandra.db.marshal.CompositeType.Builder;
    import org.apache.cassandra.db.marshal.IntegerType;
    import org.apache.cassandra.dht.Murmur3Partitioner;
    import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

    // Assumes a keyspace called 'bigdata' and a table called 'test' with the following definition:
    //   CREATE TABLE test (key TEXT, val1 INT, val2 INT, PRIMARY KEY (key, val1, val2));
    public class CassandraLoader {
        public static void main(String[] args) throws Exception {
            final List<AbstractType<?>> compositeTypes = new ArrayList<AbstractType<?>>();
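A plausible explanation for why that extra component is needed (an assumption, not confirmed in this thread): CQL3 appends the CQL column name as a trailing component of every composite cell name, and that component is empty for the row marker. Under that assumption, the comparator for this table would look something like the sketch below; note that UTF8Type for the third slot is my guess, whereas the workaround above used a third IntegerType with a dummy value:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.cassandra.db.marshal.AbstractType;
    import org.apache.cassandra.db.marshal.CompositeType;
    import org.apache.cassandra.db.marshal.IntegerType;
    import org.apache.cassandra.db.marshal.UTF8Type;

    // Sketch only: comparator for PRIMARY KEY (key, val1, val2) under the
    // assumption that CQL3 reserves a trailing slot for the CQL column name.
    public class ComparatorSketch {
        public static CompositeType cql3Comparator() {
            List<AbstractType<?>> types = new ArrayList<AbstractType<?>>();
            types.add(IntegerType.instance); // val1 (clustering component)
            types.add(IntegerType.instance); // val2 (clustering component)
            types.add(UTF8Type.instance);    // CQL3 column-name slot (empty for the row marker)
            return CompositeType.getInstance(types);
        }
    }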
Re: Bulk loading into CQL3 Composite Columns
Another option is not having it as part of the primary key and using PlayOrm to query, but to succeed and scale you would also need to use PlayOrm partitions; then you can query within a partition and sort the results.

Dean

From: Daniel Morton dan...@djmorton.com
Reply-To: user@cassandra.apache.org
Date: Friday, May 31, 2013 9:01 AM
To: user@cassandra.apache.org
Subject: Re: Bulk loading into CQL3 Composite Columns

[... snip: full quote of Daniel's message, identical to the mail above ...]
Re: Cassandra 1.1.11 does not always show filename of corrupted files
On Fri, May 31, 2013 at 7:44 AM, horschi hors...@gmail.com wrote:

But does nobody else find the old behaviour annoying? Has nobody ever wanted to identify the broken files?

I found it annoying, as did (presumably) whoever patched it for 1.2. :D

=Rob