CorruptSSTableException in system_auth keyspace
Hello,

we are trying to add authentication to our Cassandra cluster. We add our authenticated users during puppet deployment using the default user, which is then disabled. We have the following issues:

- we see CorruptSSTableException in the system_auth.users table
- we are not able to add users after a delete, which can be explained by the following statement found in the source code: INSERT INTO %s.%s (username, salted_hash) VALUES ('%s', '%s') USING TIMESTAMP 0 (see the 0 - is this really correct?)

nodetool scrub didn't help, compaction didn't help - tombstones were still there, as well as the exception.

Has anybody else seen this? It's cassandra 1.2.11 with vnodes on.

regards,
ondrej cernos
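A minimal sketch (plain Python, not Cassandra's actual code) of why that USING TIMESTAMP 0 interacts badly with deletes: cells are reconciled last-write-wins by timestamp, so a tombstone written at wall-clock time always out-timestamps a row re-inserted at timestamp 0, and the re-added user stays invisible until the tombstone is purged. The `Cell`/`reconcile` names are illustrative, not from the codebase.

```python
# Sketch of last-write-wins cell reconciliation (illustrative, not Cassandra's code).
from dataclasses import dataclass

@dataclass
class Cell:
    value: object      # None marks a tombstone (deletion marker)
    timestamp: int     # microseconds since epoch in real Cassandra

def reconcile(a: Cell, b: Cell) -> Cell:
    """Return the winning cell: higher timestamp wins; on a tie, the tombstone wins."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value is None else b

seed = Cell(value="salted_hash", timestamp=0)          # INSERT ... USING TIMESTAMP 0
tombstone = Cell(value=None, timestamp=1_393_000_000)  # DELETE at wall-clock time

# The tombstone's timestamp dwarfs 0, so the seeded row can never resurface.
print(reconcile(seed, tombstone).value)  # None
```

This also explains why scrub and compaction don't help: the tombstone is legitimately newer than the insert, so nothing is "corrupt" about the shadowing itself.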
Re: CorruptSSTableException in system_auth keyspace
Sorry, I sent the mail too early. This is the stack trace:

2014-02-28 10:56:03.205+0100 [SSTableBatchOpen:1] [ERROR] DebuggableThreadPoolExecutor.java(218) org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor: Error in ThreadPoolExecutor
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:108)
at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.EOFException
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
at java.io.DataInputStream.readUTF(DataInputStream.java:589)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:83)
... 11 more

Snappy is used for compression on this table.

ondrej c.

On Fri, Feb 28, 2014 at 11:09 AM, Ondřej Černoš cern...@gmail.com wrote:
Hello, we are trying to add authentication to our Cassandra cluster. We add our authenticated users during puppet deployment using the default user, which is then disabled.
We have the following issues:

- we see CorruptSSTableException in the system_auth.users table
- we are not able to add users after a delete, which can be explained by the following statement found in the source code: INSERT INTO %s.%s (username, salted_hash) VALUES ('%s', '%s') USING TIMESTAMP 0 (see the 0 - is this really correct?)

nodetool scrub didn't help, compaction didn't help - tombstones were still there, as well as the exception.

Has anybody else seen this? It's cassandra 1.2.11 with vnodes on.

regards,
ondrej cernos
Re: Intermittent long application pauses on nodes
Hi all,

we are seeing the same kind of long pauses in Cassandra. We tried to switch CMS to G1 without positive result. The stress test is read heavy, 2 datacenters, 6 nodes, 400 reqs/sec on one datacenter. We see spikes in latency at the 99.99 percentile and higher, caused by threads being stopped in the JVM. The GC in G1 looks like this:

{Heap before GC invocations=4073 (full 1):
 garbage-first heap total 8388608K, used 3602914K [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
 region size 4096K, 142 young (581632K), 11 survivors (45056K)
 compacting perm gen total 28672K, used 27428K [0x0007f5c0, 0x0007f780, 0x0008)
 the space 28672K, 95% used [0x0007f5c0, 0x0007f76c9108, 0x0007f76c9200, 0x0007f780)
No shared spaces configured.
2014-02-17T04:44:16.385+0100: 222346.218: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 37748736 bytes, new threshold 15 (max 15)
- age 1: 17213632 bytes, 17213632 total
- age 2: 19391208 bytes, 36604840 total
, 0.1664300 secs]
 [Parallel Time: 163.9 ms, GC Workers: 2]
  [GC Worker Start (ms): Min: 222346218.3, Avg: 222346218.3, Max: 222346218.3, Diff: 0.0]
  [Ext Root Scanning (ms): Min: 6.0, Avg: 6.9, Max: 7.7, Diff: 1.7, Sum: 13.7]
  [Update RS (ms): Min: 20.4, Avg: 21.3, Max: 22.1, Diff: 1.7, Sum: 42.6]
   [Processed Buffers: Min: 49, Avg: 60.0, Max: 71, Diff: 22, Sum: 120]
  [Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, Sum: 46.5]
  [Object Copy (ms): Min: 112.3, Avg: 112.3, Max: 112.4, Diff: 0.1, Sum: 224.6]
  [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.1]
  [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
  [GC Worker Total (ms): Min: 163.8, Avg: 163.8, Max: 163.8, Diff: 0.0, Sum: 327.6]
  [GC Worker End (ms): Min: 222346382.1, Avg: 222346382.1, Max: 222346382.1, Diff: 0.0]
 [Code Root Fixup: 0.0 ms]
 [Clear CT: 0.4 ms]
 [Other: 2.1 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 1.1 ms]
  [Ref Enq: 0.0 ms]
  [Free CSet: 0.4 ms]
 [Eden: 524.0M(524.0M)->0.0B(476.0M) Survivors: 44.0M->68.0M Heap: 3518.5M(8192.0M)->3018.5M(8192.0M)]
Heap after GC invocations=4074 (full 1):
 garbage-first heap total 8388608K, used 3090914K [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
 region size 4096K, 17 young (69632K), 17 survivors (69632K)
 compacting perm gen total 28672K, used 27428K [0x0007f5c0, 0x0007f780, 0x0008)
 the space 28672K, 95% used [0x0007f5c0, 0x0007f76c9108, 0x0007f76c9200, 0x0007f780)
No shared spaces configured.
}
 [Times: user=0.35 sys=0.00, real=27.58 secs]
222346.219: G1IncCollectionPause [ 111 0 0] [ 0 0 0 0 27586] 0

And the total time for which application threads were stopped is 27.58 seconds. CMS behaves in a similar manner. We thought it would be GC waiting for mmapped files being read from disk (the thread cannot reach a safepoint during this operation), but it doesn't explain the huge time. We'll try jHiccup to see if it provides any additional information.

The test was done on a mixed aws/openstack environment, openjdk 1.7.0_45, cassandra 1.2.11. Upgrading to 2.0.x is no option for us.

regards,
ondrej cernos

On Fri, Feb 14, 2014 at 8:53 PM, Frank Ng fnt...@gmail.com wrote:
Sorry, I have not had a chance to file a JIRA ticket. We have not been able to resolve the issue. But since Joel mentioned that upgrading to Cassandra 2.0.X solved it for them, we may need to upgrade. We are currently on Java 1.7 and Cassandra 1.2.8

On Thu, Feb 13, 2014 at 12:40 PM, Keith Wright kwri...@nanigans.com wrote:
You're running 2.0.* in production? May I ask what C* version and OS? Any hardware details would be appreciated as well. Thx!

From: Joel Samuelsson samuelsson.j...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Thursday, February 13, 2014 at 11:39 AM
To: user@cassandra.apache.org
Subject: Re: Intermittent long application pauses on nodes

We have had similar issues and upgrading C* to 2.0.x and Java to 1.7 seems to have helped our issues.
2014-02-13 Keith Wright kwri...@nanigans.com:
Frank did you ever file a ticket for this issue or find the root cause? I believe we are seeing the same issues when attempting to bootstrap. Thanks

From: Robert Coli rc...@eventbrite.com
Reply-To: user@cassandra.apache.org
Date: Monday, February 3, 2014 at 6:10 PM
To: user@cassandra.apache.org
Subject: Re: Intermittent long application pauses on nodes

On Mon, Feb 3, 2014 at 8:52 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote:
It's possible that this is a JVM issue, but if so there may be some remedial action we can take anyway. There are some more flags we should add, but we can discuss that once you open a
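For what it's worth, the G1 log quoted in this thread already localises the anomaly: the parallel GC workers ran for about 164 ms, yet the JVM reports application threads stopped for 27.58 s, so almost none of the stop is collection work. A throwaway sketch (not anything from the thread; the function name is made up) that pulls those two figures out of a log line and reports the unexplained gap:

```python
import re

# Compare a G1 pause's parallel GC work time against the JVM's reported
# real (wall-clock) stopped time. A large gap points away from GC work
# itself (e.g. toward time-to-safepoint stalls) rather than collection cost.
def pause_gap_seconds(log: str) -> float:
    parallel_ms = float(re.search(r"\[Parallel Time: ([\d.]+) ms", log).group(1))
    real_s = float(re.search(r"real=([\d.]+) secs", log).group(1))
    return real_s - parallel_ms / 1000.0

log_excerpt = ("[Parallel Time: 163.9 ms, GC Workers: 2] ... "
               "[Times: user=0.35 sys=0.00, real=27.58 secs]")
print(round(pause_gap_seconds(log_excerpt), 2))  # 27.42
```

Over 27 seconds of the stop is unaccounted for by GC work, which is consistent with the safepoint hypothesis raised above.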
Re: Intermittent long application pauses on nodes
Hi,

we tried to switch to G1 because we observed this behaviour on CMS too (a 27-second pause in G1 is quite an argument not to use it, though). Pauses with CMS were not easily traceable - the JVM stopped even without a stop-the-world pause scheduled (defragmentation, remarking). We thought the time-to-safepoint waiting might have been involved (we saw waiting for safepoint resolution) - especially because access to mmapped files is not preemptive, afaik - but it doesn't explain tens of seconds of waiting; even slow IO should read our sstables into memory in much less time. We switched to G1 out of desperation - and to try different code paths - not that we'd thought it was a great idea. So I think we were hit by the problem discussed in this thread, just the G1 report wasn't very clear, sorry.

regards,
ondrej

On Mon, Feb 17, 2014 at 11:45 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote:
Ondrej,

It seems like your issue is much less difficult to diagnose: your collection times are long. At least, the pause you printed the time for is all attributable to the G1 pause.

Note that G1 has not generally performed well with Cassandra in our testing. There are a number of changes going in soon that may change that, but for the time being it is advisable to stick with CMS. With tuning you can no doubt bring your pauses down considerably.

On 17 February 2014 10:17, Ondřej Černoš cern...@gmail.com wrote:
Hi all,

we are seeing the same kind of long pauses in Cassandra. We tried to switch CMS to G1 without positive result. The stress test is read heavy, 2 datacenters, 6 nodes, 400 reqs/sec on one datacenter. We see spikes in latency at the 99.99 percentile and higher, caused by threads being stopped in the JVM.
The GC in G1 looks like this:

{Heap before GC invocations=4073 (full 1):
 garbage-first heap total 8388608K, used 3602914K [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
 region size 4096K, 142 young (581632K), 11 survivors (45056K)
 compacting perm gen total 28672K, used 27428K [0x0007f5c0, 0x0007f780, 0x0008)
 the space 28672K, 95% used [0x0007f5c0, 0x0007f76c9108, 0x0007f76c9200, 0x0007f780)
No shared spaces configured.
2014-02-17T04:44:16.385+0100: 222346.218: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 37748736 bytes, new threshold 15 (max 15)
- age 1: 17213632 bytes, 17213632 total
- age 2: 19391208 bytes, 36604840 total
, 0.1664300 secs]
 [Parallel Time: 163.9 ms, GC Workers: 2]
  [GC Worker Start (ms): Min: 222346218.3, Avg: 222346218.3, Max: 222346218.3, Diff: 0.0]
  [Ext Root Scanning (ms): Min: 6.0, Avg: 6.9, Max: 7.7, Diff: 1.7, Sum: 13.7]
  [Update RS (ms): Min: 20.4, Avg: 21.3, Max: 22.1, Diff: 1.7, Sum: 42.6]
   [Processed Buffers: Min: 49, Avg: 60.0, Max: 71, Diff: 22, Sum: 120]
  [Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, Sum: 46.5]
  [Object Copy (ms): Min: 112.3, Avg: 112.3, Max: 112.4, Diff: 0.1, Sum: 224.6]
  [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.1]
  [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
  [GC Worker Total (ms): Min: 163.8, Avg: 163.8, Max: 163.8, Diff: 0.0, Sum: 327.6]
  [GC Worker End (ms): Min: 222346382.1, Avg: 222346382.1, Max: 222346382.1, Diff: 0.0]
 [Code Root Fixup: 0.0 ms]
 [Clear CT: 0.4 ms]
 [Other: 2.1 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 1.1 ms]
  [Ref Enq: 0.0 ms]
  [Free CSet: 0.4 ms]
 [Eden: 524.0M(524.0M)->0.0B(476.0M) Survivors: 44.0M->68.0M Heap: 3518.5M(8192.0M)->3018.5M(8192.0M)]
Heap after GC invocations=4074 (full 1):
 garbage-first heap total 8388608K, used 3090914K [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
 region size 4096K, 17 young (69632K), 17 survivors (69632K)
 compacting perm gen total 28672K, used 27428K [0x0007f5c0, 0x0007f780, 0x0008)
 the space 28672K, 95% used [0x0007f5c0, 0x0007f76c9108, 0x0007f76c9200, 0x0007f780)
No shared spaces configured.
}
 [Times: user=0.35 sys=0.00, real=27.58 secs]
222346.219: G1IncCollectionPause [ 111 0 0] [ 0 0 0 0 27586] 0

And the total time for which application threads were stopped is 27.58 seconds. CMS behaves in a similar manner. We thought it would be GC waiting for mmapped files being read from disk (the thread cannot reach a safepoint during this operation), but it doesn't explain the huge time. We'll try jHiccup to see if it provides any additional information.

The test was done on a mixed aws/openstack environment, openjdk 1.7.0_45, cassandra 1.2.11. Upgrading to 2.0.x is no option for us.

regards,
ondrej cernos

On Fri, Feb 14, 2014 at 8:53 PM, Frank Ng fnt...@gmail.com wrote:
Sorry, I have not had a chance to file a JIRA ticket. We have not been able to resolve
exceptions all around in clean cluster
Hi,

I am running a small 2 DC cluster of 3 nodes (each DC). I use 3 replicas in both DCs (all 6 nodes have everything) on Cassandra 1.2.11. I populated the cluster via cqlsh pipelined with a series of inserts. I use the cluster for tests, the dataset is pretty small (hundreds of thousands of records max). The cluster was completely up during inserts. Inserts were done serially on one of the nodes. The resulting load is uneven:

Datacenter: xxx
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load     Tokens  Owns (effective)  Host ID                               Rack
UN  ip       1.63 GB  256     100.0%            83ecd32a-3f2b-4cf6-b3c7-b316cb1986cc  default-rack
UN  ip       1.5 GB   256     100.0%            091ca530-2e95-4954-92c4-76f51fab0b66  default-rack
UN  ip       1.44 GB  256     100.0%            d94d335e-08bf-4a30-ad58-4c5acdc2ef45  default-rack

Datacenter: yyy
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load     Tokens  Owns (effective)  Host ID                               Rack
UN  ip       2.27 GB  256     100.0%            e2584981-71f7-45b0-82f4-e08942c47585  1c
UN  ip       2.27 GB  256     100.0%            e5c6de9a-819e-4757-a420-55ec3ffaf131  1c
UN  ip       2.27 GB  256     100.0%            fa53f391-2dd3-4ec8-885d-8db6d453a708  1c

And 4 out of 6 nodes report corrupted sstables:

java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: mmap segment underflow; remaining is 239882945 but 1349280116 requested
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1618)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: mmap segment underflow; remaining is 239882945 but 1349280116 requested
at org.apache.cassandra.db.columniterator.IndexedSliceReader.init(IndexedSliceReader.java:119)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:68)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:44)
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:104)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:272)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1123)
at org.apache.cassandra.db.Table.getRow(Table.java:347)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1062)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1614)
... 3 more
Caused by: java.io.IOException: mmap segment underflow; remaining is 239882945 but 1349280116 requested
at org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:135)
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:108)
at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92)
at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:73)
at org.apache.cassandra.db.columniterator.IndexedSliceReader$SimpleBlockFetcher.init(IndexedSliceReader.java:477)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.init(IndexedSliceReader.java:94)

repair -pr hangs, rebuild from the less corrupted dc hangs.
The only interesting exception (besides the java.io.EOFException during repair) is the following:

org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes 52f2665b
at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:54)
at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:102)
at org.apache.cassandra.db.index.SecondaryIndexManager.indexRow(SecondaryIndexManager.java:448)
at org.apache.cassandra.db.Table.indexRow(Table.java:431)
at
Re: exceptions all around in clean cluster
)
at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:61)
at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:224)
at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:182)
at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:143)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.init(MergeIterator.java:86)
at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:45)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:134)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:291)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1123)
at org.apache.cassandra.db.SliceQueryPager.next(SliceQueryPager.java:57)
at org.apache.cassandra.db.Table.indexRow(Table.java:424)
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:803)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Should I file a bug report with all this?
regards,
ondrej cernos

On Thu, Feb 6, 2014 at 2:38 PM, Ondřej Černoš cern...@gmail.com wrote:
Hi,

I am running a small 2 DC cluster of 3 nodes (each DC). I use 3 replicas in both DCs (all 6 nodes have everything) on Cassandra 1.2.11. I populated the cluster via cqlsh pipelined with a series of inserts. I use the cluster for tests, the dataset is pretty small (hundreds of thousands of records max). The cluster was completely up during inserts. Inserts were done serially on one of the nodes. The resulting load is uneven:

Datacenter: xxx
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load     Tokens  Owns (effective)  Host ID                               Rack
UN  ip       1.63 GB  256     100.0%            83ecd32a-3f2b-4cf6-b3c7-b316cb1986cc  default-rack
UN  ip       1.5 GB   256     100.0%            091ca530-2e95-4954-92c4-76f51fab0b66  default-rack
UN  ip       1.44 GB  256     100.0%            d94d335e-08bf-4a30-ad58-4c5acdc2ef45  default-rack

Datacenter: yyy
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load     Tokens  Owns (effective)  Host ID                               Rack
UN  ip       2.27 GB  256     100.0%            e2584981-71f7-45b0-82f4-e08942c47585  1c
UN  ip       2.27 GB  256     100.0%            e5c6de9a-819e-4757-a420-55ec3ffaf131  1c
UN  ip       2.27 GB  256     100.0%            fa53f391-2dd3-4ec8-885d-8db6d453a708  1c

And 4 out of 6 nodes report corrupted sstables:

java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: mmap segment underflow; remaining is 239882945 but 1349280116 requested
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1618)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: mmap segment underflow; remaining is 239882945 but 1349280116 requested
at org.apache.cassandra.db.columniterator.IndexedSliceReader.init(IndexedSliceReader.java:119)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:68)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:44)
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:104)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:272)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391
Re: exceptions all around in clean cluster
Update: I dropped the keyspace, the system keyspace, deleted all the data and started from a fresh state. Now it behaves correctly. The previously reported state is therefore the result of the keyspace being dropped beforehand and recreated with no compression on sstables - maybe some sstables were left live in the system keyspace even though the keyspace was completely dropped?

ondrej cernos

On Thu, Feb 6, 2014 at 3:11 PM, Ondřej Černoš cern...@gmail.com wrote:
I ran nodetool scrub on nodes in the less corrupted datacenter and tried nodetool rebuild from this datacenter. This is the result:

2014-02-06 15:04:24.645+0100 [Thread-83] [ERROR] CassandraDaemon.java(191) org.apache.cassandra.service.CassandraDaemon: Exception in thread Thread[Thread-83,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:152)
at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:187)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:138)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:243)
at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:183)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:79)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:144)
... 5 more
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78)
at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:132)
at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:115)
at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:165)
at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:45)
at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:61)
at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:224)
at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:182)
at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:143)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.init(MergeIterator.java:86)
at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:45)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:134)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:291)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1123)
at org.apache.cassandra.db.SliceQueryPager.next(SliceQueryPager.java:57)
at org.apache.cassandra.db.Table.indexRow(Table.java:424)
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:803)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

2014-02-06 15:04:24.646+0100 [CompactionExecutor:10] [ERROR] CassandraDaemon.java(191) org.apache.cassandra.service.CassandraDaemon: component=c4 Exception in thread Thread[CompactionExecutor:10,1,main]
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength
Re: question about secondary index or not
Hi,

we had a similar use case. Just do the filtering client-side; the #2 example performs horribly - secondary indexes on something that divides the set into two subsets of roughly the same size just don't work. Give it a try on localhost with just a couple of records (150.000), you will see.

regards,
ondrej

On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:
in my #2 example:
select * from people where company_id='xxx' and gender='male'
I already specify the first part of the primary key (row key) in my where clause, so how does the secondary indexed column gender='male' help determine which row to return? It is more like filtering a list of columns from a row (which is exactly what I can do in my #1 example). But then if I don't create the index first, the cql statement will run into a syntax error.

On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.com wrote:
I would do #2. Take a look at this blog which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply. http://www.wentnet.com/blog/?p=77

On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
Generally indexes on binary fields true/false male/female are not terribly effective.
On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:
I have a simple column family like the following:

create table people(
    company_id text,
    employee_id text,
    gender text,
    primary key(company_id, employee_id)
);

if I want to find out all the male employees given a company id, I can do

1/ select * from people where company_id='
and loop through the result efficiently to pick the employees whose gender column value equals 'male'

2/ add a secondary index
create index gender_index on people(gender)
select * from people where company_id='xxx' and gender='male'

I thought #2 seems more appropriate, but I also thought the secondary index only helps locate the primary row key; with the select clause in #2, is it more efficient than #1, where the application is responsible for looping through the result and filtering the right content?

(It would totally make sense if I only needed to find all the male employees (and not within a company) by using select * from people where gender='male )

thanks
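A toy model of the cardinality point made above: an index on a two-valued column like gender just maps each value to roughly half the rows, so the server still has to touch and filter a huge candidate set - about the same work the client would do filtering a single partition itself. The data and sizes below are made up for illustration.

```python
from collections import defaultdict

# Fake dataset: 10,000 employees across 100 companies, two genders.
rows = [{"company_id": f"c{i % 100}", "employee_id": f"e{i}",
         "gender": "male" if i % 2 else "female"}
        for i in range(10_000)]

# A value -> row-id mapping, standing in for a secondary index on `gender`.
index = defaultdict(list)
for rid, row in enumerate(rows):
    index[row["gender"]].append(rid)

# The low-cardinality index narrows nothing: ~half the dataset comes back,
# and the company_id predicate still has to be applied to every candidate.
candidates = index["male"]
hits = [rows[r] for r in candidates if rows[r]["company_id"] == "c1"]

print(len(candidates), len(hits))  # 5000 100
```

With `company_id` already pinning the partition, fetching that partition and filtering on `gender` client-side does strictly less work than going through the half-the-table index.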
Re: various Cassandra performance problems when CQL3 is really used
Hi,

by the way, some of the issues are summarised here: https://issues.apache.org/jira/browse/CASSANDRA-6586 and here: https://issues.apache.org/jira/browse/CASSANDRA-6587.

regards,
ondrej cernos

On Tue, Jan 14, 2014 at 9:48 PM, Ondřej Černoš cern...@gmail.com wrote:
Hi,

thanks for the answer and sorry for the delay. Let me answer inline.

On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton aa...@thelastpickle.com wrote:

* select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;

Filtering absolutely kills the performance. On a table populated with 130.000 records, single node Cassandra server (on my i7 notebook, 2GB of JVM heap) and a secondary index built on a column with low cardinality of its value set, this query takes 156 seconds to finish.

Yes, this is why you have to add allow_filtering. You are asking the nodes to read all the data that matches and filter in memory, that's a SQL type operation. Your example query is somewhat complex and I doubt it could get decent performance, what does the query plan look like?

I don't know. How do I find out? The only mention of a query plan in Cassandra I found is the article on your site, from 2011, considering version 0.8. The example query gets computed in a fraction of the time if I perform just the fetch of all rows matching the token function and perform the filtering client side.

IMHO you need to do further de-normalisation, you will get the best performance when you select rows by their full or part primary key.

I denormalise all the way I can. The problem is I need to support paging and filtering at the same time. The API I must support allows filtering by example and paging - so how should I denormalise? Should I somehow manage pages of primary row keys manually? Or should I have a manual secondary index and page somehow in the denormalised wide row?
The trouble goes even further; even this doesn't perform well:

select id from table where token(id) > token(some_value) and pk_cluster = 'val' limit N;

where id and pk_cluster are the primary key (CQL3 table). I guess this should be an ordered row query and an ordered column slice query, so where is the problem with performance? By the way, the performance is an order of magnitude better if this patch is applied:

That looks like it's tuned to your specific need, it would ignore the max results included in the query

It is tuned, it only demonstrates that the heuristic doesn't work well.

* select id from table;

As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTL with current time (?) (we saw hundreds of thousands of gettimeofday(2) calls). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.

Select all rows from a table requires a range scan, which reads all rows from all nodes. It should never be used in production.

The trouble is I just need to perform it, sometimes. I know what the problem with the query is, but I have just a couple of hundred thousand records - 150.000 - the datasets can all be stored in memory, SSTables can be fully mmapped. There is no reason for this query to be slow in this case.

Not sure what you mean by "scans all columns from all rows"; a select by column name will use a SliceByNamesReadCommand which will only read the required columns from each SSTable (it normally short circuits though and reads from less).

The query should fetch only IDs; it checks TTLs of columns though. That is the point. Why does it do it?

if there is a TTL the ExpiringColumn.localExpirationTime must be checked, if there is no TTL it will not be checked.

It is a standard CQL3 table with an ID, a couple of columns and a CQL3 collection. I didn't do anything with TTL on the table and its columns.
As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size.

This is not true, could you provide an example where you think this is happening?

We saw it in the trace log. It happened in the select id from table query. The table had a collection column.

Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with trace log level set no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.

What was the HW platform and what was the load?

My i7/8GB notebook, single-node cluster, and a virtualised AWS-like environment, on nodes of various sizes.

Typically freezes in the server correlate to JVM GC; the JVM GC can also be using the CPU. If you have wide rows or make large reads you may run into more JVM GC issues. nodetool flush will (as it says) flush all the tables to disk; if you have a lot of tables and/or a lot of secondary indexes this can cause the switch lock to be held
Re: various Cassandra performance problems when CQL3 is really used
If you are seeing performance problems I would guess it is related to JVM GC and/or the disk IO is not able to keep up. When used it creates a single SSTable for each table which will not be compacted again until (default) 3 other large SSTables are created or you run major compaction again. For this reason it is not recommended.

Conclusions:
- do not use collections
- do not use secondary indexes
- do not use filtering
- have your rows as narrow as possible if you need any kind of all-row-keys traversal

These features all have a use, but it looks like you leaned on them heavily while creating a relational model. Especially the filtering: you have to explicitly enable it to prevent the client sending queries that will take a long time. The only time row key traversal is used normally is reading data through Hadoop. You should always strive to read row(s) from a table by the full or partial primary key.

With these conclusions in mind, CQL seems redundant, plain old Thrift may be used, joins should be done client side and/or all indexes need to be handled manually. Correct?

No. CQL provides a set of functionality not present in the Thrift API. Joins and indexes should generally be handled by denormalising the data during writes. It sounds like your data model was too relational; you need to denormalise and read rows by primary key. Secondary indexes are useful when you have a query pattern that is used infrequently.

regards, ondrej cernos

Hope that helps.

- Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 18/12/2013, at 3:47 am, Ondřej Černoš cern...@gmail.com wrote: Hi all, we are reimplementing a legacy interface of an inventory-like service (currently built on top of mysql) on Cassandra and I thought I would share some findings with the list. The interface semantics is given and cannot be changed. We chose Cassandra due to its multiple datacenter capabilities and no-spof qualities.
The dataset is small (6 tables having 150,000 records, a bunch of tables with up to thousands of records), but the model is not trivial - the mysql model has some 20+ tables, joins are frequent, m:n relationships are frequent and the like. The interface is read heavy. We thought the size of the dataset should allow the whole dataset to fit into memory of each node (3 node cluster in each DC, 3 replicas, local quorum operations) and that even though some operations (like secondary index lookup) are not superfast, due to the size of the dataset it should perform ok. We were wrong.

We use CQL3 exclusively and we use all of its capabilities (collections, secondary indexes, filtering), because they make the data model maintainable. We denormalised what had to be denormalised in order to avoid client side joins. A usual query to the storage means one CQL query on a denormalised table. We need to support integer offset/limit paging, filter-by-example kind of queries, M:N relationship queries and all the usual suspects of an old SQL-backed interface.

This is the list of operations that perform really poorly we identified so far. The row id is called id in the following:

* select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;

Filtering absolutely kills the performance. On a table populated with 130,000 records, a single-node Cassandra server (on my i7 notebook, 2GB of JVM heap) and a secondary index built on a column with a low-cardinality value set, this query takes 156 seconds to finish.
By the way, the performance is an order of magnitude better if this patch is applied:

diff --git a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
index 5ab1df6..13af671 100644
--- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
+++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
@@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
     private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
     // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
-    private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
+    //private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
+    private final int rowsPerQuery = 10;

     public boolean needsFiltering()
     {

* select id from table;

As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTL with current time (?) (we saw hundreds of thousands of gettimeofday(2)).
various Cassandra performance problems when CQL3 is really used
Hi all, we are reimplementing a legacy interface of an inventory-like service (currently built on top of mysql) on Cassandra and I thought I would share some findings with the list. The interface semantics is given and cannot be changed. We chose Cassandra due to its multiple datacenter capabilities and no-spof qualities.

The dataset is small (6 tables having 150,000 records, a bunch of tables with up to thousands of records), but the model is not trivial - the mysql model has some 20+ tables, joins are frequent, m:n relationships are frequent and the like. The interface is read heavy. We thought the size of the dataset should allow the whole dataset to fit into memory of each node (3 node cluster in each DC, 3 replicas, local quorum operations) and that even though some operations (like secondary index lookup) are not superfast, due to the size of the dataset it should perform ok. We were wrong.

We use CQL3 exclusively and we use all of its capabilities (collections, secondary indexes, filtering), because they make the data model maintainable. We denormalised what had to be denormalised in order to avoid client side joins. A usual query to the storage means one CQL query on a denormalised table. We need to support integer offset/limit paging, filter-by-example kind of queries, M:N relationship queries and all the usual suspects of an old SQL-backed interface.

This is the list of operations that perform really poorly we identified so far. The row id is called id in the following:

* select id from table where token(id) > token(some_value) and secondary_index = other_val limit 2 allow filtering;

Filtering absolutely kills the performance. On a table populated with 130,000 records, a single-node Cassandra server (on my i7 notebook, 2GB of JVM heap) and a secondary index built on a column with a low-cardinality value set, this query takes 156 seconds to finish.
By the way, the performance is an order of magnitude better if this patch is applied:

diff --git a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
index 5ab1df6..13af671 100644
--- a/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
+++ b/src/java/org/apache/cassandra/db/index/composites/CompositesSearcher.java
@@ -190,7 +190,8 @@ public class CompositesSearcher extends SecondaryIndexSearcher
     private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
     // We shouldn't fetch only 1 row as this provides buggy paging in case the first row doesn't satisfy all clauses
-    private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
+    //private final int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
+    private final int rowsPerQuery = 10;

     public boolean needsFiltering()
     {

* select id from table;

As we saw in the trace log, the query - although it queries just row ids - scans all columns of all the rows and (probably) compares TTL with current time (?) (we saw hundreds of thousands of gettimeofday(2)). This means that if the table somehow mixes wide and narrow rows, the performance suffers horribly.

* CQL collections

See the point above about mixing wide rows and narrow rows. As Cassandra checks all the columns in selects, performance suffers badly if the collection is of any interesting size.

Additionally, we saw various random irreproducible freezes, high CPU consumption when nothing happens (even with trace log level set no activity was reported) and highly unpredictable performance characteristics after nodetool flush and/or major compaction.
Conclusions:
- do not use collections
- do not use secondary indexes
- do not use filtering
- have your rows as narrow as possible if you need any kind of all-row-keys traversal

With these conclusions in mind, CQL seems redundant, plain old Thrift may be used, joins should be done client side and/or all indexes need to be handled manually. Correct?

Thanks for reading, ondrej cernos
Cassandra and Pig - CQL maps denormalisation
Hi all, I am solving an issue with Pig integration with Cassandra using CqlLoader. I don't know exactly if the problem is in CqlLoader, my low understanding of Pig (I hope this is actually the case) or some bug in the combination of Pig and CqlLoader. Sorry if this turns out to be rather a Pig question and not a Cassandra one.

I have a table using CQL maps:

CREATE TABLE test (
    name text PRIMARY KEY,
    sources map<text, text>
)

I need to denormalise the map in order to perform some sanity checks on the rest of the DB (outer join using values from the map with other tables in the Cassandra keyspace). I want to create triples containing table key, map key and map value for further joining. The size of the map is anything between null and tens of records. The table test itself is pretty small.

This is what I do:

grunt> data = LOAD 'cql://keyspace/test' USING CqlStorage();
grunt> describe data;
data: {name: chararray,sources: ()}
grunt> data1 = filter data by sources is not null;
grunt> dump data1;
(name1,((k1,s1),(k2,s2)))
grunt> data2 = foreach data1 generate name, flatten(sources);
grunt> dump data2;
(name1,(k1,s1),(k2,s2))
grunt> describe data2;
Schema for data2 unknown.
grunt> data3 = FOREACH data2 generate $0 as name, FLATTEN(TOBAG($1..$100)); // I know there will be max tens of records in the map
grunt> dump data3;
(name1,k1,s1)
(name1,k2,s2)
(name1,)
(name1,)
... 95 more lines here ...
grunt> data4 = FILTER data3 BY $1 IS NOT null;
grunt> dump data4;
(name1,k1,s1)
(name1,k2,s2)
grunt> describe data4;
data4: {name: bytearray,bytearray}
grunt> data5 = foreach data4 generate $0, $1;
grunt> dump data5;
(name1,k1)
(name1,k2)
grunt> p = foreach data4 generate $0, $2;
Details at logfile: //pig_xxx.log

From the log file:

Pig Stack Trace
---
ERROR 1000: line 28, column 33 Out of bound access. Trying to access non-existent column: 2. Schema name:bytearray,:bytearray has 2 column(s).
org.apache.pig.impl.plan.PlanValidationException: ERROR 1000: line 28, column 33 Out of bound access.
Trying to access non-existent column: 2. Schema name:bytearray,:bytearray has 2 column(s).
at org.apache.pig.newplan.logical.expression.ProjectExpression.findColNum(ProjectExpression.java:197)
at org.apache.pig.newplan.logical.expression.ProjectExpression.setColumnNumberFromAlias(ProjectExpression.java:174)

Considering the schema - no surprise. What is strange is the fact that I see the map values in the dump (see dump data4), but I have no way to get them using Pig Latin. I tried to simulate the situation using the PigStorage loader. This is the best I got (not exactly the same, but roughly):

grunt> data = load 'test.csv' using PigStorage(',');
grunt> dump data;
(key1,mk1,mv1,mk2,mv2)
(key2)
(key3,mk1,mv3,mk2,mv4)
grunt> data1 = foreach data generate $0, TOTUPLE($1, $2), TOTUPLE($3, $4);
grunt> dump data1;
(key1,(mk1,mv1),(mk2,mv2))
(key2,(,),(,))
(key3,(mk1,mv3),(mk2,mv4))
grunt> data2 = FOREACH data1 generate $0 as name, FLATTEN(TOBAG($1..$2));
grunt> dump data2;
(key1,mk1,mv1)
(key1,mk2,mv2)
(key2,,)
(key2,,)
(key3,mk1,mv3)
(key3,mk2,mv4)
grunt> describe data2;
data2: {name: bytearray,bytearray,bytearray}

Which is exactly what I need. The only problem is that this simulation doesn't allow me to specify an arbitrarily high value in the FLATTEN(TOBAG()) call - I need to know the size of the row in advance.

Questions:
- is this the correct way to denormalise the data? This is a Pig question, but maybe someone will know (I am a Pig newbie).
- couldn't there be a problem with the internal data representation returned from CqlStorage? See the difference between the data loaded from file and those loaded from Cassandra.

Versions: Cassandra 1.2.11, Pig 0.12. Thanks in advance, Ondrej Cernos
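For reference, the transformation the Pig pipeline above is trying to express - each (row key, map) pair becoming (row key, map key, map value) triples, with null maps skipped and no need to know the map size up front - is trivial to state outside Pig. A minimal Python sketch, with made-up sample data:

```python
# The intended result of the LOAD/FILTER/FLATTEN pipeline, in plain Python:
# one output triple per map entry, rows with a null/empty map dropped.

rows = [
    ("name1", {"k1": "s1", "k2": "s2"}),
    ("name2", None),  # a row whose map column is null
]

def denormalise(rows):
    return [
        (name, mk, mv)
        for name, sources in rows
        if sources                      # drop null/empty maps
        for mk, mv in sorted(sources.items())
    ]

print(denormalise(rows))
# [('name1', 'k1', 's1'), ('name1', 'k2', 's2')]
```

This only restates the goal; it does not answer whether CqlStorage's tuple representation of maps can be unnested this way in Pig itself.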
Re: Can't perform repair on a 1.1.5 cassandra node -SSTable corrupted
Hello, please see these issues: https://issues.apache.org/jira/browse/CASSANDRA-5686 and https://issues.apache.org/jira/browse/CASSANDRA-5391 if you hit any of them. regards, ondrej cernos On Wed, Aug 7, 2013 at 5:00 PM, Madalina Matei madalinaima...@gmail.comwrote: Hi, I have a 5 nodes cassandra (version 1.1.5) ring, RF=2, CL- READ/Write =1. After a node went down without any error reported in OS syslog or Cassandra syslog i decided to perform a repair. Each time i run a nodetool repair I get this error: INFO [FlushWriter:5] 2013-08-07 11:09:26,770 Memtable.java (line 305) Completed flushing /data/data-298-Data.db (18694 bytes) for commitlog position ReplayPosition(segmentId=1375867548785, position=199) ERROR [Thrift:286] 2013-08-07 11:10:04,448 CustomTThreadPoolServer.java (line 204) Error occurred during processing of message. java.lang.RuntimeException: error reading 1 of 1 at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:39) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:116) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:203) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:117) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:140) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:107) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:80) at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1381) at org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1377) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1454) at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1433) at org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(RangeSliceVerbHandler.java:50) at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:870) at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:691) at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:3008) at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:2996) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5) at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) at org.xerial.snappy.SnappyNative.rawUncompress(Native Method) at 
org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391) at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:94) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:91) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:77) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361) at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:398)
pagination in cql3
Hi all, I need to support a legacy API where page offset and limit are on the input of the API call (it used to be mapped directly to the offset and limit MySQL select options). The data are pretty small (really small - some hundreds of thousands of narrow rows at maximum; I use Cassandra for its multiple-DC and HA capabilities, not for big data).

I know the token(key) function and its use for paging, but unfortunately I cannot change the API to a version where the last key on the previous page and a limit would be provided. What I thought I would do - though it violates good Cassandra practices like "don't fetch all keys" - is the following:

select _key_ from table limit _offset_value_;
select _columns_ from table where token(_key_) > token(_last_key_from_the_select_above_);

The first select tells me where the offset begins and the second one queries for the page. The paged queries will not be performed too often, so performance is not such a big deal here.

This construct however depends on repeatable ordering of the keys returned from the select key from table query. I don't care about the ordering, but I need to know it is actually ordered by key tokens. Afaik it should be so (SSTables are ordered this way, and the coordinator merges the data from the queried nodes, SSTables and memtables - I suppose it all preserves the order), but I don't know if it really works this way and whether it is documented so that I can rely on it. Or should it be done some other way?

Thanks, Ondrej Cernos
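The two-step scheme above can be sketched in a few lines of Python, with a hash standing in for the partitioner's token function and list slices standing in for the two CQL queries. This is only an illustration of the logic under the assumption the question asks about (keys come back in token order); it proves nothing about Cassandra's actual guarantees.

```python
# Sketch of offset/limit emulation over a token-ordered key space.
# fake_token simulates partitioner token order; the real ordering guarantee
# is exactly what the question above asks about.

def fake_token(key):
    return hash(key)  # stand-in for Murmur3/RandomPartitioner tokens

rows = {k: "data-%s" % k for k in ("a", "b", "c", "d", "e", "f")}
ordered_keys = sorted(rows, key=fake_token)  # cluster-wide key order

def page(offset, limit):
    # query 1: select key from table limit <offset>  -> find where the page starts
    head = ordered_keys[:offset]
    if not head:
        return ordered_keys[:limit]
    last = head[-1]
    # query 2: select ... where token(key) > token(<last>) limit <limit>
    return [k for k in ordered_keys if fake_token(k) > fake_token(last)][:limit]

# walking all pages visits every key exactly once
assert page(0, 2) + page(2, 2) + page(4, 2) == ordered_keys
```

Note the cost is unchanged: the first query still fetches offset-many keys, so deep offsets stay expensive regardless of how the second query is phrased.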
Re: Secondary Index on table with a lot of data crashes Cassandra
Hi, this is true for CQL2, it doesn't work for CQL3:

cqlsh:c4> SELECT id from some_table WHERE indexed_column='test';
...
cqlsh:c4> SELECT KEY from some_table WHERE indexed_column='test';
Bad Request: Undefined name key in selection clause
Perhaps you meant to use CQL 2? Try using the -2 option when starting cqlsh.

regards, Ondřej Černoš

On Thu, Apr 25, 2013 at 10:32 AM, moshe.kr...@barclays.com wrote: IMHO: user_name is not a column, it is the row key. Therefore, according to http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ , the row does not contain a relevant column index, which causes the iterator to read each column (including value) of each row. I believe that instead of referring to user_name as if it were a column, you need to refer to it via the reserved word "KEY", e.g.:

Select KEY from users where status = 2;

Always glad to share a theory with a friend...

*From:* Tamar Rosen [mailto:ta...@correlor.com]
*Sent:* Thursday, April 25, 2013 11:04 AM
*To:* user@cassandra.apache.org
*Subject:* Secondary Index on table with a lot of data crashes Cassandra

Hi, We have a case of a reproducible crash, probably due to out of memory, but I don't understand why. The installation is currently single node. We have a column family with approx 5 rows. In cql, the CF definition is:

CREATE TABLE users (
    user_name text PRIMARY KEY,
    big_json text,
    status int
);

Each big_json can have 500K or more of data. There is also a secondary index on the status column. Status can have various values; over 90% of all rows have status = 2.

Calling: Select user_name from users limit 8; is pretty fast.

Calling: Select user_name from users where status = 1; is slower, even though much less data is returned.

Calling: Select user_name from users where status = 2; always crashes.

What are we doing wrong?
Can it be that Cassandra is actually trying to read all the CF data rather than just the keys? (Actually, it doesn't need to go to the users CF at all - all the data it needs is in the index CF.)

Also, in the code I am doing the same using an Astyanax index query with pagination, and the behavior is the same.

Please help me:
1. solve the immediate issue
2. understand if there is something in this use case which indicates that we are not using Cassandra the way it is meant.

Thanks, Tamar Rosen Correlor.com
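A rough intuition for why the status=2 query is so much heavier than status=1: a built-in secondary index behaves roughly like one hidden wide row per indexed value, so a value covering 90% of the table makes that index row - and the follow-up row reads - close to a full scan. A Python sketch with illustrative numbers (not from the thread):

```python
# Simulate index partition sizes for a low-cardinality indexed column:
# ~90% of rows share one value, so one index "row" holds ~90% of all keys.

n_rows = 50_000
status_of = lambda i: 2 if i % 10 else 1  # 1 row in 10 gets status=1

index = {}  # indexed value -> list of row keys (the hidden index row)
for i in range(n_rows):
    index.setdefault(status_of(i), []).append(i)

print(len(index[1]), len(index[2]))
# 5000 45000  -> status=2 touches 9x more keys than status=1
```

With 500K+ per big_json, reading ~45,000 matching rows to serve the status=2 query would mean tens of gigabytes of row data, which is consistent with the out-of-memory crash described above.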
Re: Secondary Index on table with a lot of data crashes Cassandra
Hi, if you are able to reproduce the issue, file a ticket on https://issues.apache.org/jira/browse/CASSANDRA - my experience is developers respond quickly on issues that are clearly a bug. regards, ondrej cernos On Thu, Apr 25, 2013 at 10:03 AM, Tamar Rosen ta...@correlor.com wrote: Hi, We have a case of a reproducible crash, probably due to out of memory, but I don't understand why. The installation is currently single node. We have a column family with approx 5 rows. In cql, the CF definition is: CREATE TABLE users ( user_name text PRIMARY KEY, big_json text, status int); Each big_json can have 500K or more of data. There is also a secondary index on the status column. Status can have various values, over 90% of all rows have status = 2. Calling: Select user_name from users limit 8; Is pretty fast Calling: Select user_name from users where status = 1; is slower, even though much less data is returned. Calling: Select user_name from users where status = 2; Always crashes. What are we doing wrong? Can it be that Cassandra is actually trying to read all the CF data rather than just the keys! (actually, it doesn't need to go to the users CF at all - all the data it needs is in the index CF) Also, in the code I am doing the same using Astyanax index query with pagination, and the behavior is the same. Please help me: 1. solve the immediate issue 2. understand if there is something in this use case which indicates that we are not using Cassandra the way it is meant. Thanks, Tamar Rosen Correlor.com
Re: Repair Freeze / Gossip Invisibility / EC2 Public IP configuration
Hi, I have a similar issue with a stuck repair. Similar multi-region setup, only between us-east and a private cloud at Rackspace. The log mentions merkle tree exchanges and I see a lot of dropped communication. I will comment on your ticket in Jira.

regards, ondrej cernos

On Fri, Apr 19, 2013 at 4:50 AM, Arya Goudarzi gouda...@gmail.com wrote: We don't use default ports. Woops! Now I advertised mine. I did try disabling internode compression for all in cassandra.yaml but still it did not work. I have to open the insecure storage port to public IPs.

On Tue, Apr 16, 2013 at 4:59 PM, Edward Capriolo edlinuxg...@gmail.com wrote: So cassandra does inter-node compression. I have not checked but this might be accidentally getting turned on by default, because the storage port is typically 7000. Not sure why you are allowing 7100. In any case try allowing 7000 or with internode compression off.

On Tue, Apr 16, 2013 at 6:42 PM, Arya Goudarzi gouda...@gmail.com wrote: TL;DR: an EC2 multi-region setup's repair/gossip works with 1.1.10, but with 1.2.4 gossip does not see the nodes after restarting all nodes at once, and repair gets stuck.

This is a working configuration:
Cassandra 1.1.10 cluster with 12 nodes in us-east-1 and 12 nodes in us-west-2
Using Ec2MultiRegionSnitch and SSL enabled for DC_ONLY, and NetworkTopologyStrategy with strategy_options: us-east-1:3;us-west-2:3;
C* instances have a security group called 'cluster1'. Security group 'cluster1' in each region is configured as such:

Allow TCP:
7199 from cluster1 (JMX)
1024-65535 from cluster1 (JMX random ports - this supersedes all specific ports, but I have the specific ports just for clarity)
7100 from cluster1 (configured normal storage)
7103 from cluster1 (configured SSL storage)
9160 from cluster1 (configured Thrift RPC port)
9160 from client_group

For each node's public IP we also have this rule set to enable cross-region communication:
7103 from public_ip (open SSL storage)

The above is a functioning and happy setup.
You run repair, and it finishes successfully.

Broken setup: upgrade to 1.2.4 without changing any of the above security group settings and run repair. The repair will get stuck, thus hanging. Now for each public_ip add a security group rule as such to the cluster1 security group:

Allow TCP: 7100 from public_ip

Run repair. Things will work now. Also, after restarting all nodes at the same time, gossip will see everyone again.

I was told on https://issues.apache.org/jira/browse/CASSANDRA-5432 that nothing in terms of networking was changed. If nothing in terms of ports and networking was changed in 1.2, then why is the above happening? I can consistently reproduce it. Please advise.

-Arya
Plans for CQL3 (non-compact storage) table support in Cassandra's Pig support
Hi all, is there someone on this list knowledgeable enough about the plans for supporting non-compact storage tables (https://issues.apache.org/jira/browse/CASSANDRA-5234) in Cassandra's Pig support? Currently Pig cannot be used with Cassandra 1.2 and CQL3-only tables, and this hurts a lot (I found blog posts about this problem, a stackoverflow question, and the related https://issues.apache.org/jira/browse/CASSANDRA-4421 issue has quite a lot of watchers and voters). I need to make a decision about our future development efforts, and knowing whether this issue is on the road map or not would help.

regards, ondřej černoš
Compaction, truncate, cqlsh problems
Hi, I use C* 1.2.3 and CQL3. I integrated cassandra into our testing environment. In order to make the tests repeatable I truncate all the tables that need to be empty before the test run via ssh session to the host cassandra runs on and by running cqlsh where I issue the truncate. It works, only sometimes it silently fails (1 in 400 runs of the truncate, actually). At the same time the truncate fails I see system ks compaction. Additionally, it seems there is quite a lot of these system ks compactions (the numbers in the filenames go up pretty fast to thousands). I googled truncate and found out there were some issues with race conditions and with slowing down if truncate is used frequently (as is my case, where truncate is run before each test in quite a big test suite). Any hints? Regards, Ondřej Černoš
Re: Compaction, truncate, cqlsh problems
Hi, I have JNA (cassandra only complains about an obsolete version - "Obsolete version of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later" - I have the stock CentOS version 3.2.4). Usage of separate CFs for each test run is difficult to set up. Can you please elaborate on the specials of truncate?

regards, Ondřej Černoš

On Thu, Apr 11, 2013 at 5:04 PM, Edward Capriolo edlinuxg...@gmail.com wrote: If you do not have JNA, truncate has to fork an 'ln -s' command for the snapshots. I think that makes it unpredictable. Truncate has its own timeout value now (separate from the other timeouts). If possible I think it is better to make each test use its own CF and avoid truncate entirely.

On Thu, Apr 11, 2013 at 9:48 AM, Ondřej Černoš cern...@gmail.com wrote: Hi, I use C* 1.2.3 and CQL3. I integrated cassandra into our testing environment. In order to make the tests repeatable I truncate all the tables that need to be empty before the test run, via an ssh session to the host cassandra runs on and by running cqlsh where I issue the truncate. It works, only sometimes it silently fails (1 in 400 runs of the truncate, actually). At the same time the truncate fails I see a system keyspace compaction. Additionally, it seems there is quite a lot of these system keyspace compactions (the numbers in the filenames go up pretty fast, to thousands). I googled truncate and found out there were some issues with race conditions and with slowing down if truncate is used frequently (as is my case, where truncate is run before each test in quite a big test suite). Any hints?

Regards, Ondřej Černoš
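The per-test-CF alternative suggested above can be as small as generating a unique, legal table name per test run and creating/dropping it around the test. A hedged sketch; the `events` base name and the CREATE/DROP pattern in the comment are illustrative, not from the thread:

```python
# Give each test run its own table instead of truncating a shared one.
import uuid

def test_table_name(base="events"):
    # Cassandra identifiers must start with a letter, so prefix with the
    # base name and append a short random suffix.
    return "%s_%s" % (base, uuid.uuid4().hex[:8])

name = test_table_name()
# e.g. issue CREATE TABLE ks.<name> (...) before the test and
# DROP TABLE ks.<name> after it, instead of TRUNCATE.
```

This trades truncate's snapshot/fork behaviour for schema churn, so it suits suites with a handful of tables better than ones creating thousands of tables per run.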
Re: nodetool status inconsistencies, repair performance and system keyspace compactions
Hi, most has been resolved - the "failed to uncompress" error was really a bug in cassandra (see https://issues.apache.org/jira/browse/CASSANDRA-5391) and the problem with different load reporting is a change between 1.2.1 (reports 100% for the 3 replicas / 3 nodes / 2 DCs setup I have) and 1.2.3, which reports the fraction. Is this correct?

Anyway, nodetool repair still takes ages to finish, considering only megabytes of unchanging data are involved in my test:

[root@host:/etc/puppet] nodetool repair ks
[2013-04-04 13:26:46,618] Starting repair command #1, repairing 1536 ranges for keyspace ks
[2013-04-04 13:47:17,007] Repair session 88ebc700-9d1a-11e2-a0a1-05b94e1385c7 for range (-2270395505556181001,-2268004533044804266] finished
...
[2013-04-04 13:47:17,063] Repair session 65d31180-9d1d-11e2-a0a1-05b94e1385c7 for range (1069254279177813908,1070290707448386360] finished
[2013-04-04 13:47:17,063] Repair command #1 finished

This is the status before the repair (by the way, after the datacenter has been bootstrapped from the remote one):

[root@host:/etc/puppet] nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.1%  06ff8328-32a3-4196-a31f-1e0f608d0638  1d
UN  xxx.xxx.xxx.xxx  5.73 MB  256     15.3%  7a96bf16-e268-433a-9912-a0cf1668184e  1d
UN  xxx.xxx.xxx.xxx  5.72 MB  256     17.5%  67a68a2a-12a8-459d-9d18-221426646e84  1d
Datacenter: na-dev
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.4%  eb86aaae-ef0d-40aa-9b74-2b9704c77c0a  cmp02
UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.0%  cd24af74-7f6a-4eaa-814f-62474b4e4df1  cmp01
UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.7%  1a55cfd4-bb30-4250-b868-a9ae13d81ae1  cmp05

Why does it take 20 minutes to finish? Fortunately the big number of compactions I reported in the previous email was not triggered.
And is there documentation where I could find the exact semantics of repair when vnodes are used (and what -pr means in such a setup) and when run in a multiple datacenter setup? I still don't quite get it.

regards,
Ondřej Černoš

On Thu, Mar 28, 2013 at 3:30 AM, aaron morton aa...@thelastpickle.com wrote:

During one of my tests - see this thread in this mailing list: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html

That thread has been updated, check the bug ondrej created.

How will this perform in production with much bigger data if repair takes 25 minutes on 7MB and 11k compactions were triggered by the repair run?

Seems a little odd. See what happens the next time you run repair.

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 27/03/2013, at 2:36 AM, Ondřej Černoš cern...@gmail.com wrote:

Hi all,

I have 2 DCs, 3 nodes each, RF:3, I use local quorum for both reads and writes. Currently I test various operational qualities of the setup.
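On the -pr question, my working understanding (hedged; I have not verified this against the 1.2 source): a plain repair on one node repairs every range that node replicates, while -pr repairs only the ranges the node is primary for, so full coverage requires running it on every node in every datacenter. A sketch with placeholder host names, printing the commands rather than executing them (executing would assume SSH access and a live cluster):

```shell
# Placeholder hosts; adjust to the real topology before running for real.
for host in dc1-node1 dc1-node2 dc1-node3 dc2-node1 dc2-node2 dc2-node3; do
    echo "ssh $host nodetool repair -pr ks"
done
```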
During one of my tests - see this thread in this mailing list: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html - I ran into this situation:

- all nodes have all data and agree on it:

[user@host1-dc1:~] nodetool status
Datacenter: na-prod
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
UN  XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d

- only one node disagrees:

[user@host1-dc2:~] nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load  Tokens  Owns  Host ID  Rack
UN
Re: java.io.IOException: FAILED_TO_UNCOMPRESS(5) exception when running nodetool rebuild
Hi Aaron,

I switched to 1.2.3 with no luck. I created https://issues.apache.org/jira/browse/CASSANDRA-5391 describing the problem. Maybe it's related to the EOFException problem, but I am not sure - I don't know Cassandra internals well and I have never seen the EOFException.

regards,
ondrej

On Tue, Mar 26, 2013 at 9:26 PM, aaron morton aa...@thelastpickle.com wrote:

If you are still on 1.2.1 it may be this: https://issues.apache.org/jira/browse/CASSANDRA-5105 (fixed in 1.2.2). If you are on 1.2.3 there is also https://issues.apache.org/jira/browse/CASSANDRA-5381

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 5:10 AM, Ondřej Černoš cern...@gmail.com wrote:

Hi all,

I am still unable to move forward with this issue.

- when I switch SSL off in inter-DC communication, nodetool rebuild works well
- when I switch internode_compression off, I still get the java.io.IOException: FAILED_TO_UNCOMPRESS exception

Does internode_compression: none really switch off the snappy compression of the internode communication?
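For reference, the setting being toggled, with my (possibly wrong) understanding of its scope as comments - internode_compression appears to govern only the inter-node messaging protocol, not the sstable-level compression used when streaming a compressed table:

```yaml
# cassandra.yaml (1.2) - documented values: all | dc | none
internode_compression: none

# Assumption, to be confirmed: streaming a Snappy-compressed table
# transfers the sstable's own compressed chunks regardless of the setting
# above, so taking snappy fully out of the streaming path may require
# disabling compression on the table itself.
```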
The stacktrace - see the previous mail - clearly demonstrates some compression is involved.

- I managed to trigger another exception:

java.lang.RuntimeException: javax.net.ssl.SSLException: bad record MAC
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.lang.Thread.run(Thread.java:662)
Caused by: javax.net.ssl.SSLException: bad record MAC
    at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1607)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:859)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
    at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
    at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:151)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    ... 1 more

I managed to trigger this exception only once, however. The fact that the transfer works when SSL is off and fails with SSL on is another strange thing about this issue. Any ideas or hints?

regards,
Ondrej Cernos

On Tue, Mar 19, 2013 at 5:51 PM, Ondřej Černoš cern...@gmail.com wrote:

Hi all,

I am running into a strange error when bootstrapping a Cassandra cluster in a multiple datacenter setup. The setup is as follows: 3 nodes in AWS east, 3 nodes somewhere on Rackspace/Openstack. I use my own snitch based on EC2MultiRegionSnitch (it just adds some ec2 availability zone parsing capabilities). Nodes in the cluster connect to each other and all seems ok.

When I start the Rackspace cluster first, populate it with data and then let the AWS cluster bootstrap from it, it works great. However the other way round it just breaks.
The breakage demonstrates as follows:

- the nodetool rebuild us-east command hangs
- cassandra's log contains the following:

2013-03-19 12:42:15.796+0100 [Thread-14] [DEBUG] IncomingTcpConnection.java(63) org.apache.cassandra.net.IncomingTcpConnection: Connection version 6 from ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com/xxx.xxx.xxx.xxx
2013-03-19 12:42:15.803+0100 [Thread-14] [DEBUG] StreamInSession.java(104) org.apache.cassandra.streaming.StreamInSession: Adding file /path/to/cassandra/data/key_space/column_family/key_space-column_family-ib-2-Data.db to Stream Request queue
2013-03-19 12:42:15.803+0100 [Thread-14] [DEBUG] StreamInSession.java(104) org.apache.cassandra.streaming.StreamInSession: Adding file /path/to/cassandra/data/key_space/column_family/key_space-column_family-ib-1-Data.db to Stream Request queue
2013-03-19 12:42:15.806+0100 [Thread-14] [DEBUG] IncomingStreamReader.java(112) org.apache.cassandra.streaming.IncomingStreamReader: Receiving stream
2013-03-19 12:42:15.807+0100 [Thread-14] [DEBUG] IncomingStreamReader.java(113) org.apache.cassandra.streaming.IncomingStreamReader: Creating file for /path/to/cassandra/data/key_space/column_family/key_space-column_family-tmp-ib-2-Data.db with 7808 estimated keys
2013-03-19 12:42:15.808+0100 [Thread-14] [DEBUG] ColumnFamilyStore.java(863) org.apache.cassandra.db.ColumnFamilyStore: component=key_space Checking for sstables overlapping []
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-Data.db
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-CompressionInfo.db
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110
nodetool status inconsistencies, repair performance and system keyspace compactions
Hi all,

I have 2 DCs, 3 nodes each, RF:3, I use local quorum for both reads and writes. Currently I test various operational qualities of the setup.

During one of my tests - see this thread in this mailing list: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html - I ran into this situation:

- all nodes have all data and agree on it:

[user@host1-dc1:~] nodetool status
Datacenter: na-prod
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
UN  XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d

- only one node disagrees:

[user@host1-dc2:~] nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.73 MB  256     17.6%  a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
UN  XXX.XXX.XXX.XXX  7.75 MB  256     16.4%  ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     15.7%  f53fd294-16cc-497e-9613-347f07ac3850  1d
Datacenter: na-prod
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.74 MB  256     16.9%  0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
UN  XXX.XXX.XXX.XXX  7.72 MB  256     17.1%  007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
UN  XXX.XXX.XXX.XXX  7.73 MB  256     16.3%  039f206e-da22-44b5-83bd-2513f96ddeac  cmp10

I tried to rebuild the node from scratch, repair the node, no results. Still the same owns stats. The cluster is built from cassandra 1.2.3 and uses vnodes.

On a related note: the data size, as you can see, is really small. The cluster was created by setting up the us-east datacenter, populating it with the dataset, then building the na-prod datacenter and running nodetool rebuild us-east. When I tried to run nodetool repair it took 25 minutes to finish, on this small dataset. Is this ok?

One other thing I noticed is the amount of compactions on the system keyspace:

/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt
/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db

This is just after running the repair. Is this ok, considering the dataset is 7MB and during the repair no operations were running against the database, neither read, nor write, nothing? How will this perform in production with much bigger data if repair takes 25 minutes on 7MB and 11k compactions were triggered by the repair run?

regards,
Ondrej Cernos
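One hedged guess about the owns discrepancy: the two listings may be reporting different metrics rather than disagreeing about the data. Raw token ownership splits the ring across all six nodes, while effective ownership multiplies by replication, so with RF=3 in a 3-node DC every node effectively owns everything. The arithmetic at least matches both outputs above:

```shell
# Hedged arithmetic, not a fix: expected "Owns" under each interpretation.
# raw: ring split across 6 nodes; effective: RF=3 over 3 nodes per DC.
awk 'BEGIN { printf "raw: %.1f%%  effective: %.1f%%\n", 100/6, 100*3/3 }'
# prints: raw: 16.7%  effective: 100.0%
```

If that reading is right, the ~16-17% figures and the 100% figures describe the same healthy cluster.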
Re: java.io.IOException: FAILED_TO_UNCOMPRESS(5) exception when running nodetool rebuild
Hi all,

I am still unable to move forward with this issue.

- when I switch SSL off in inter-DC communication, nodetool rebuild works well
- when I switch internode_compression off, I still get the java.io.IOException: FAILED_TO_UNCOMPRESS exception

Does internode_compression: none really switch off the snappy compression of the internode communication? The stacktrace - see the previous mail - clearly demonstrates some compression is involved.

- I managed to trigger another exception:

java.lang.RuntimeException: javax.net.ssl.SSLException: bad record MAC
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.lang.Thread.run(Thread.java:662)
Caused by: javax.net.ssl.SSLException: bad record MAC
    at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1607)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:859)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
    at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
    at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:151)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    ... 1 more

I managed to trigger this exception only once, however. The fact that the transfer works when SSL is off and fails with SSL on is another strange thing about this issue. Any ideas or hints?

regards,
Ondrej Cernos

On Tue, Mar 19, 2013 at 5:51 PM, Ondřej Černoš cern...@gmail.com wrote:

Hi all,

I am running into a strange error when bootstrapping a Cassandra cluster in a multiple datacenter setup. The setup is as follows: 3 nodes in AWS east, 3 nodes somewhere on Rackspace/Openstack.
I use my own snitch based on EC2MultiRegionSnitch (it just adds some ec2 availability zone parsing capabilities). Nodes in the cluster connect to each other and all seems ok.

When I start the Rackspace cluster first, populate it with data and then let the AWS cluster bootstrap from it, it works great. However the other way round it just breaks.

The breakage demonstrates as follows:

- the nodetool rebuild us-east command hangs
- cassandra's log contains the following:

2013-03-19 12:42:15.796+0100 [Thread-14] [DEBUG] IncomingTcpConnection.java(63) org.apache.cassandra.net.IncomingTcpConnection: Connection version 6 from ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com/xxx.xxx.xxx.xxx
2013-03-19 12:42:15.803+0100 [Thread-14] [DEBUG] StreamInSession.java(104) org.apache.cassandra.streaming.StreamInSession: Adding file /path/to/cassandra/data/key_space/column_family/key_space-column_family-ib-2-Data.db to Stream Request queue
2013-03-19 12:42:15.803+0100 [Thread-14] [DEBUG] StreamInSession.java(104) org.apache.cassandra.streaming.StreamInSession: Adding file /path/to/cassandra/data/key_space/column_family/key_space-column_family-ib-1-Data.db to Stream Request queue
2013-03-19 12:42:15.806+0100 [Thread-14] [DEBUG] IncomingStreamReader.java(112) org.apache.cassandra.streaming.IncomingStreamReader: Receiving stream
2013-03-19 12:42:15.807+0100 [Thread-14] [DEBUG] IncomingStreamReader.java(113) org.apache.cassandra.streaming.IncomingStreamReader: Creating file for /path/to/cassandra/data/key_space/column_family/key_space-column_family-tmp-ib-2-Data.db with 7808 estimated keys
2013-03-19 12:42:15.808+0100 [Thread-14] [DEBUG] ColumnFamilyStore.java(863) org.apache.cassandra.db.ColumnFamilyStore: component=key_space Checking for sstables overlapping []
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-Data.db
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-CompressionInfo.db
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-TOC.txt
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-Filter.db
2013-03-19 12:42:15.963+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-Index.db
2013-03-19 12:42:15.963+0100 [Thread-14] [DEBUG] SSTable.java(154) org.apache.cassandra.io.sstable.SSTable: Deleted /path/to/cassandra/data/key_space/column_family/key_space-column_family-tmp-ib-2
2013-03-19 12:42:15.963+0100 [Thread-14] [INFO] StreamInSession.java(136
Re: Composite columns
Hey,

try this blog post by DataStax, it provides a good explanation of the CQL3 abstractions: http://www.datastax.com/dev/blog/cql3-for-cassandra-experts

regards,
ondrej cernos

On Wed, Mar 20, 2013 at 8:50 AM, Thierry Templier ttempl...@restlet.com wrote:

Hello,

I have a question regarding composite columns. What is the way to create and use them based on CQL3? Is there some documentation regarding this feature? Is it supported in both versions 1.1 and 1.2 of Cassandra?

Thanks very much for your help!
Thierry
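As a concrete starting point, a minimal sketch (table and column names are made up for illustration): in CQL3 you don't create composite columns directly - declaring more than one PRIMARY KEY component makes the trailing components clustering columns, which are stored under the hood as the leading parts of a composite column name. CQL3 ships as final in Cassandra 1.2; in 1.1 it is only available as a beta.

```sql
-- Hypothetical schema: one partition per article, one internal composite
-- column per (posted_at, field) pair.
CREATE TABLE comments (
    article_id uuid,
    posted_at  timestamp,
    author     text,
    body       text,
    PRIMARY KEY (article_id, posted_at)
);

-- Rows cluster by posted_at inside each article_id partition:
SELECT author, body FROM comments
WHERE article_id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY posted_at DESC;
```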
java.io.IOException: FAILED_TO_UNCOMPRESS(5) exception when running nodetool rebuild
Hi all,

I am running into a strange error when bootstrapping a Cassandra cluster in a multiple datacenter setup. The setup is as follows: 3 nodes in AWS east, 3 nodes somewhere on Rackspace/Openstack. I use my own snitch based on EC2MultiRegionSnitch (it just adds some ec2 availability zone parsing capabilities). Nodes in the cluster connect to each other and all seems ok.

When I start the Rackspace cluster first, populate it with data and then let the AWS cluster bootstrap from it, it works great. However the other way round it just breaks.

The breakage demonstrates as follows:

- the nodetool rebuild us-east command hangs
- cassandra's log contains the following:

2013-03-19 12:42:15.796+0100 [Thread-14] [DEBUG] IncomingTcpConnection.java(63) org.apache.cassandra.net.IncomingTcpConnection: Connection version 6 from ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com/xxx.xxx.xxx.xxx
2013-03-19 12:42:15.803+0100 [Thread-14] [DEBUG] StreamInSession.java(104) org.apache.cassandra.streaming.StreamInSession: Adding file /path/to/cassandra/data/key_space/column_family/key_space-column_family-ib-2-Data.db to Stream Request queue
2013-03-19 12:42:15.803+0100 [Thread-14] [DEBUG] StreamInSession.java(104) org.apache.cassandra.streaming.StreamInSession: Adding file /path/to/cassandra/data/key_space/column_family/key_space-column_family-ib-1-Data.db to Stream Request queue
2013-03-19 12:42:15.806+0100 [Thread-14] [DEBUG] IncomingStreamReader.java(112) org.apache.cassandra.streaming.IncomingStreamReader: Receiving stream
2013-03-19 12:42:15.807+0100 [Thread-14] [DEBUG] IncomingStreamReader.java(113) org.apache.cassandra.streaming.IncomingStreamReader: Creating file for /path/to/cassandra/data/key_space/column_family/key_space-column_family-tmp-ib-2-Data.db with 7808 estimated keys
2013-03-19 12:42:15.808+0100 [Thread-14] [DEBUG] ColumnFamilyStore.java(863) org.apache.cassandra.db.ColumnFamilyStore: component=key_space Checking for sstables overlapping []
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-Data.db
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-CompressionInfo.db
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-TOC.txt
2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-Filter.db
2013-03-19 12:42:15.963+0100 [Thread-14] [DEBUG] FileUtils.java(110) org.apache.cassandra.io.util.FileUtils: Deleting key_space-column_family-tmp-ib-2-Index.db
2013-03-19 12:42:15.963+0100 [Thread-14] [DEBUG] SSTable.java(154) org.apache.cassandra.io.sstable.SSTable: Deleted /path/to/cassandra/data/key_space/column_family/key_space-column_family-tmp-ib-2
2013-03-19 12:42:15.963+0100 [Thread-14] [INFO] StreamInSession.java(136) org.apache.cassandra.streaming.StreamInSession: Streaming of file /path/to/cassandra/data/key_space/column_family/key_space-column_family-ib-2-Data.db sections=127 progress=81048/2444
2013-03-19 12:42:16.059+0100 [Thread-13] [DEBUG] IncomingTcpConnection.java(79) org.apache.cassandra.net.IncomingTcpConnection: IOException reading from socket; closing
java.io.IOException: FAILED_TO_UNCOMPRESS(5)
    at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
    at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
    at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
    at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:93)
    at org.apache.cassandra.streaming.compress.CompressedInputStream.decompress(CompressedInputStream.java:101)
    at org.apache.cassandra.streaming.compress.CompressedInputStream.read(CompressedInputStream.java:79)
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:337)
    at org.apache.cassandra.utils.BytesReadTracker.readUnsignedShort(BytesReadTracker.java:140)
    at org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:361)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
    at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:160)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
    at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
2013-03-19 12:42:15.971+0100 [Thread-16] [ERROR] CassandraDaemon.java(133) org.apache.cassandra.service.CassandraDaemon: Exception in thread