Re: nodetool repair caused high disk space usage
On Sat, 20 Aug 2011 at 01:22 +0200, Peter Schuller wrote: Is there any chance that the entire file from the source node got streamed to the destination node even though only a small amount of the data in the file was supposed to be streamed? Yes, but the thing that's annoying me is that even so, you should not be seeing a 40 GB to hundreds-of-GB increase even if all neighbors sent all their data. I'm having the very same issue. Trying to repair a node with 90 GB of data fills up the 1.5 TB drive, and it's still trying to send more data. This is on 0.8.1.
nodetool repair mykeyspace mycolumnfamily repairs all the keyspace
Hi all, Maybe I'm doing something wrong, but calling ./nodetool -h host repair mykeyspace mycolumnfamily should only repair mycolumnfamily, right? Every time I try a repair, it repairs the whole keyspace instead of just one column family. I'm on Cassandra 0.8.1
Re: nodetool repair mykeyspace mycolumnfamily repairs all the keyspace
Are there any plans to backport this to 0.8? On Tue, 19 Jul 2011 at 11:43 -0500, Jonathan Ellis wrote: https://issues.apache.org/jira/browse/CASSANDRA-2280 2011/7/19 Héctor Izquierdo Seliva izquie...@strands.com: Hi all, Maybe I'm doing something wrong, but calling ./nodetool -h host repair mykeyspace mycolumnfamily should only repair mycolumnfamily, right? Every time I try a repair, it repairs the whole keyspace instead of just one column family. I'm on Cassandra 0.8.1
Re: Anyone using Facebook's flashcache?
Hector, some before/after numbers would be great if you can find them. Thanks! I'll try and get some for you :) What happens when your cache gets trashed? Do compactions and flushes go slower? If you use flashcache-wt, flushed and compacted sstables will go into the cache. All reads are cached, so if you compact three sstables into one, you are stuffing your cache with a lot of useless crap and evicting valid blocks (flashcache won't honor any of the hints set with fadvise, as it's a block-cache layer and doesn't know about them anyway). If your write rate is low it might work for you. aj
Re: Anyone using Facebook's flashcache?
Interesting. So, there is no segregation between read and write cache space? A compaction or flush can evict blocks in the read cache if it needs the space for write buffering? There are two versions: the -wt (write-through) version, which also caches what is written, and the normal version, which only caches reads. Either way you will pollute your cache with compactions.
Re: Anyone using Facebook's flashcache?
If using the version that has both rt and wt caches, is it just the wt cache that's polluted by compactions/flushes? If not, why does the rt cache also get polluted? As I said, all reads go through flashcache, so if you read three 10 GB sstables for a compaction you will pull those 30 GB into the cache.
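The cache-pollution mechanism described above is easy to see with a toy model. This is a generic LRU simulation with made-up sizes, not flashcache's actual replacement policy; it just shows how a large one-pass compaction scan sweeps a hot working set out of a smaller cache:

```python
# Toy LRU simulation: a large sequential compaction read evicts the hot
# working set from a block cache. All sizes are illustrative.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # refresh recency on a hit
        else:
            self.blocks[block_id] = True       # miss: insert the block
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict the coldest block

cache = LRUCache(capacity_blocks=1000)
hot = [("hot", i) for i in range(500)]
for b in hot:  # warm the cache with the hot working set
    cache.read(b)
assert all(b in cache.blocks for b in hot)

# A compaction reads three large sstables end to end: a one-pass scan of
# far more blocks than the cache can hold.
for i in range(30000):
    cache.read(("compaction", i))

survivors = sum(1 for b in hot if b in cache.blocks)
print(survivors)  # 0: the entire hot set was evicted
```

The same arithmetic explains why a low write (and compaction) rate makes the cache viable: the scan traffic has to be small relative to the cache for the hot set to survive.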
Re: Anyone using Facebook's flashcache?
Of course. I wasn't thinking clearly. So, back to a previous point you brought up, I will have heavy reads and even heavier writes. How would you rate the benefits of flashcache in such a scenario? Is it still an overall performance boost worth the expense? We also have heavy reads and even heavier writes. The max hit ratio I saw was ~60%, although it was enough to halve the latency, which is a fairly good result. I guess you will have to try it for yourself; I'm sorry I can't give you anything more concrete. We have since moved to hosting all the data on SSDs, as it's not that expensive anymore and the results are much, much better. You can get 320 GB drives from OCZ for ~$500, or do a stripe with three 120 GB ones for ~$700.
Re: Anyone using Facebook's flashcache?
I've been using flashcache for a while in production. It improves read performance; latency was cut by a good chunk, though I don't remember the exact numbers. Problems: compactions will trash your cache, and so will memtable flushes. Right now there's no way to avoid that. If you want, I can dig up the numbers for a before/after comparison.
Re: Corrupted data
All the important stuff is using QUORUM. Normal operation uses around 3-4 GB of heap out of 6. I've also tried running repair on a per-CF basis, and still no luck. I've found it's faster to bootstrap a node again than to repair it. Once I have the cluster in a sane state I'll try running repair as part of normal operation and see if it manages to finish. Btw, we are not using super columns. Thanks for the tips. On Sat, 9 Jul 2011 at 17:57 -0700, aaron morton wrote: Nope, only when something breaks Unless you've been working at QUORUM, life is about to get trickier. Repair is an essential part of running a Cassandra cluster; without it you risk data loss and dead data coming back to life. If you have been writing at QUORUM, and so have a reasonable expectation of data replication, the normal approach is to happily let scrub skip the rows; after scrub has completed, a repair will see the data repaired using one of the other replicas. That's probably already happened, as the scrub process skipped the rows when writing them out to the new files. Try to run repair. Try running it on a single CF to start with. Good luck - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 9 Jul 2011, at 16:45, Héctor Izquierdo Seliva wrote: Hi Peter. I have a problem with repair, and it's that it always brings the node doing the repairs down. I've tried setting index_interval to 5000, and it still dies with OutOfMemory errors, or even worse, it generates thousands of tiny sstables before dying. I've tried around 20 repairs this week. None of them finished. This is on a 16 GB machine using a 12 GB heap so it doesn't crash (too early). On Sat, 9 Jul 2011 at 16:16 +0200, Peter Schuller wrote: - Have you been running repair consistently ?
Nope, only when something breaks This is unrelated to the problem you were asking about, but if you ever run deletes, make sure you are aware of: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair http://wiki.apache.org/cassandra/DistributedDeletes
Re: node stuck leaving
I'm also having problems with removetoken. Maybe I'm doing it wrong, but I was under the impression that I only had to call removetoken once. When I look at the ring from the other nodes, the dead node keeps popping up. What's even more incredible is that on some of them it says UP.
Re: node stuck leaving
In the end I had to restart the whole cluster. This is the second time I've had to do this. Would it be possible to add a command that forces all nodes to drop all the ring data and start fresh? I'd rather have a few seconds of errors in the clients than the two to five minutes that a full cluster restart takes.
Re: Corrupted data
Hi Peter. I have a problem with repair, and it's that it always brings the node doing the repairs down. I've tried setting index_interval to 5000, and it still dies with OutOfMemory errors, or even worse, it generates thousands of tiny sstables before dying. I've tried around 20 repairs this week. None of them finished. This is on a 16 GB machine using a 12 GB heap so it doesn't crash (too early). On Sat, 9 Jul 2011 at 16:16 +0200, Peter Schuller wrote: - Have you been running repair consistently ? Nope, only when something breaks This is unrelated to the problem you were asking about, but if you ever run deletes, make sure you are aware of: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair http://wiki.apache.org/cassandra/DistributedDeletes
Corrupted data
Hi everyone, I'm having thousands of these errors:

WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 737) Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 6292724931198053
at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:65)
at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:250)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Impossible row size 6292724931198053
... 9 more
INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 743) Retrying from row index; data is -8 bytes starting at 4735525245
WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 767) Retry failed too. Skipping to next row (retry's stacktrace follows)
java.io.IOError: java.io.EOFException: bloom filter claims to be 863794556 bytes, longer than entire row size -8

This is during scrub; I saw similar errors in normal operation. Is there anything I can do? It looks like I'm going to lose a ton of data.
Re: Corrupted data
Hi Aaron, On Fri, 8 Jul 2011 at 14:47 -0700, aaron morton wrote: You may not lose data. - What version and what's the upgrade history? All versions from 0.7.1 to 0.8.1. All CFs were in 0.8.1 format though. - What RF / node count / CL ? RF=3, node count = 6 - Have you been running repair consistently ? Nope, only when something breaks - Is this on a single node or all nodes ? A couple of nodes. Scrub said there were a few thousand columns it could not restore. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 8 Jul 2011, at 09:38, Héctor Izquierdo Seliva wrote: Hi everyone, I'm having thousands of these errors:

WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 737) Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 6292724931198053
at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:65)
at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:250)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Impossible row size 6292724931198053
... 9 more
INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 743) Retrying from row index; data is -8 bytes starting at 4735525245
WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705 CompactionManager.java (line 767) Retry failed too. Skipping to next row (retry's stacktrace follows)
java.io.IOError: java.io.EOFException: bloom filter claims to be 863794556 bytes, longer than entire row size -8

This is during scrub; I saw similar errors in normal operation. Is there anything I can do? It looks like I'm going to lose a ton of data.
Re: Cannot recover SSTable with version f (current version g)
Thanks Sylvain. I'll try option 3 when the current repair ends so I can fix the remaining CFs. I'm also seeing a few of these while opening sstables that have been built by repair (SSTable build compactions):

ERROR [CompactionExecutor:2] 2011-07-06 10:09:16,054 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[CompactionExecutor:2,1,main]
java.lang.NullPointerException
at org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.close(SSTableWriter.java:382)
at org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:370)
at org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315)
at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:1103)
at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:1094)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

On Wed, 6 Jul 2011 at 11:22 +0200, Sylvain Lebresne wrote: 2011/7/6 Héctor Izquierdo Seliva izquie...@strands.com: Hi, I've been struggling to repair my failed node for the past few days, and I've seen this error a few times: java.lang.RuntimeException: Cannot recover SSTable with version f (current version g). If it can read the sstables, why can't they be used to repair? After an sstable is streamed (by repair in particular), we first have to recreate a number of data structures. To do that, a specific part of the code is used that does not know how to cope with older sstable versions (hence the message). This does not mean that this won't ever get fixed (it will), but it is a current limitation. Is there anything I can do besides running scrub or compact on the whole cluster? Not really, no. You can wait long enough that all old sstables have been compacted to the newer version before running a repair, but since that could take a long time if you have lots of data, it may not always be an option. Now, probably the most "efficient" way (the quotes are because it's not the most efficient in the time and effort it will require from you) is something like this. For each node of the cluster: 1) look at the sstables on the node (do an 'ls' on the data directory). The sstables will look something like Standard1-g-105-Data.db, where the 'g' may be an 'f' for some (all?) of the sstables. 'f' means an old sstable, 'g' a new one. 2) if all or almost all the big sstables are 'f', run scrub on that node; that'll be simpler. 3) if only a handful of sstables are on 'f' (typically, if the node has not just been upgraded), then what you can do is force a compaction of those only by using JMX -> CompactionManager -> forceUserDefinedCompaction, giving it the keyspace name as first argument and the full path to the sstable as second argument. Regards Hector Izquierdo
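Step 1 of the procedure above (classifying sstables by on-disk version to decide between a full scrub and a targeted forceUserDefinedCompaction) can be scripted. Here is a rough sketch; the filenames are made-up examples, and the parsing assumes the <CF>-<version>-<generation>-Data.db naming used in 0.7/0.8:

```python
# Sketch: count sstable data files per on-disk version from a directory
# listing. Many 'f' files -> run scrub; only a few -> targeted compaction.
import re
from collections import Counter

# Matches e.g. Standard1-g-105-Data.db -> cf=Standard1, version=g, gen=105
DATA_FILE = re.compile(r"^(?P<cf>.+)-(?P<version>[a-z])-(?P<gen>\d+)-Data\.db$")

def versions(filenames):
    counts = Counter()
    for name in filenames:
        m = DATA_FILE.match(name)
        if m:
            counts[m.group("version")] += 1
    return counts

listing = [
    "Standard1-f-12-Data.db",
    "Standard1-f-57-Data.db",
    "Standard1-g-105-Data.db",
    "Standard1-g-105-Index.db",  # ignored: not a -Data.db file
]
print(versions(listing))  # Counter({'f': 2, 'g': 1})
```

In practice you would feed it os.listdir() of the data directory for the keyspace and pick option 2 or 3 based on the counts.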
OutOfMemory during repair on 0.8.1
Hi all, I don't seem to be able to complete a full repair on one of the nodes. Memory consumption keeps growing until it starts complaining about not having enough heap. I had to disable the automatic memtable flush, as it was generating thousands of almost empty memtables. My guess is that the key indexes that are kept in memory grow with each new sstable that the repair generates. On my third attempt, I have 1102 pending tasks (which seem to be almost all SSTable build operations mixed with a few compactions), and the heap is around 80-85% full. Is there a setting or tweak I can make? I had to up the heap from 6 GB to 10 GB on a 12 GB machine. I can't give it any more heap.
Re: OutOfMemory during repair on 0.8.1
Forcing a full gc doesn't help either. Now the node is stuck in an endless loop of full gcs that don't free any memory.
Re: Repair doesn't work after upgrading to 0.8.1
Hi All, sorry for taking so long to answer. I was away from the internet. Héctor, when you say I have upgraded all my cluster to 0.8.1, from which version was that: 0.7.something or 0.8.0 ? 0.7.6-2 to 0.8.1 This is the same behavior I reported in 2768 as Aaron referenced ... What was suggested for us was to do the following: - Shut down the entire ring - When you bring up each node, do a nodetool repair That's exactly what I ended up doing. Repair now works. I tried to do a rolling restart with 2818 applied, but it did not work. However, in the issue reported, it was unable to be reproduced ... I'd be curious to know how Hector's keyspace is defined. Ours at the time was RF=3 and using Ec2 snitch... Nothing special: default snitch, RF=3. I think this should be prioritized, as having to restart the whole cluster is a bit extreme. We don't have separate DCs, so I had to incur downtime, which costs money, and a little bit of grief. On Fri, 1 Jul 2011 at 10:16 +0200, Sylvain Lebresne wrote: To make it clear what the problem is: this is not a repair problem. This is a gossip problem. Gossip is reporting that the remote node is a 0.7 node, and repair is just saying I cannot use that node because repair has changed and the 0.7 node will not know how to answer me correctly, which is the correct behavior if the node happens to be a 0.7 node. Hence, I'm kind of baffled that dropping a keyspace and recreating it fixed anything. Unless, as part of removing the keyspace, you deleted the system tables, in which case that could have triggered something. -- Sylvain On Fri, Jul 1, 2011 at 9:33 AM, Sasha Dolgy sdo...@gmail.com wrote: This is the same behavior I reported in 2768 as Aaron referenced ... What was suggested for us was to do the following: - Shut down the entire ring - When you bring up each node, do a nodetool repair That didn't immediately resolve the problems. In the end, I backed up all the data, removed the keyspace and created a new one.
That seemed to have solved our problems. That was from 0.7.6-2 to 0.8.0 However, in the issue reported, it was unable to be reproduced ... I'd be curious to know how Hector's keyspace is defined. Ours at the time was RF=3 and using Ec2 snitch... -sd On Fri, Jul 1, 2011 at 9:22 AM, Sylvain Lebresne sylv...@datastax.com wrote: Héctor, when you say I have upgraded all my cluster to 0.8.1, from which version was that: 0.7.something or 0.8.0 ? If this was 0.8.0, did you run successful repair on 0.8.0 previous to the upgrade ?
Repair doesn't work after upgrading to 0.8.1
Hi all, I have upgraded all my cluster to 0.8.1. Today one of the disks in one of the nodes died. After replacing the disk I tried running repair, but this message appears:

INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.77 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair with for sbs on (170141183460469231731687303715884105727,28356863910078205288614550619314017621]: manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098 completed.
INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.79 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair with for sbs on (141784319550391026443072753096570088105,170141183460469231731687303715884105727]: manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf completed.
INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 20:36:25,087 AntiEntropyService.java (line 782) No neighbors to repair with for sbs on (113427455640312821154458202477256070484,141784319550391026443072753096570088105]: manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a completed.

What can I do?
Re: insufficient space to compact even the two smallest files, aborting
Hi Aaron. I reverted back to 4-32. I did the flush but it did not trigger any minor compaction. I ran compact by hand, and it picked only two sstables. Here's the ls before: http://pastebin.com/xDtvVZvA And this is the ls after: http://pastebin.com/DcpbGvK6 Any suggestions? On Thu, 23 Jun 2011 at 10:55 +1200, aaron morton wrote: Setting them to 2 and 2 means compaction can only ever compact 2 files at a time, so it will be worse off. Let's try the following: - restore the compaction settings to the defaults, 4 and 32 - run `ls -lah` in the data dir and grab the output - run `nodetool flush`; this will trigger minor compaction once the memtables have been flushed - check the logs for messages from 'CompactionManager' - when done, grab the output from `ls -lah` again. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 23 Jun 2011, at 02:04, Héctor Izquierdo Seliva wrote: Hi All. I set the compaction threshold to minimum 2, maximum 2 and tried to run compact, but it's not doing anything. There are over 69 sstables now, read performance is horrible, and it's taking an insane amount of space. Maybe I don't quite get how the new per-bucket stuff works, but I think this is not normal behaviour. On Mon, 13 Jun 2011 at 10:32 -0500, Jonathan Ellis wrote: As Terje already said in this thread, the threshold is per bucket (group of similarly sized sstables), not per CF. 2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com: I was already way over the minimum. There were 12 sstables. Also, is there any reason why scrub got stuck? I did not see anything in the logs. Via JMX I saw that the scrubbed bytes were equal to one of the sstables' size, and it stuck there for a couple of hours. On Mon, 13 Jun 2011 at 22:55 +0900, Terje Marthinussen wrote: That most likely happened just because after scrub you had new files and got over the 4-file minimum limit. https://issues.apache.org/jira/browse/CASSANDRA-2697 is the bug report.
Re: insufficient space to compact even the two smallest files, aborting
Hi All. I set the compaction threshold to minimum 2, maximum 2 and tried to run compact, but it's not doing anything. There are over 69 sstables now, read performance is horrible, and it's taking an insane amount of space. Maybe I don't quite get how the new per-bucket stuff works, but I think this is not normal behaviour. On Mon, 13 Jun 2011 at 10:32 -0500, Jonathan Ellis wrote: As Terje already said in this thread, the threshold is per bucket (group of similarly sized sstables), not per CF. 2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com: I was already way over the minimum. There were 12 sstables. Also, is there any reason why scrub got stuck? I did not see anything in the logs. Via JMX I saw that the scrubbed bytes were equal to one of the sstables' size, and it stuck there for a couple of hours. On Mon, 13 Jun 2011 at 22:55 +0900, Terje Marthinussen wrote: That most likely happened just because after scrub you had new files and got over the 4-file minimum limit. https://issues.apache.org/jira/browse/CASSANDRA-2697 is the bug report.
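The per-bucket behaviour Jonathan and Terje describe can be illustrated with a simplified sketch. This is not Cassandra's actual code (real size-tiered bucketing has additional rules), but it shows the key point: sstables only count against min_compaction_threshold within their own bucket of similarly sized files, so 12 sstables of wildly different sizes can still leave nothing eligible for a minor compaction:

```python
# Simplified size-tiered bucketing: sstables within ~50% of a bucket's
# average size share a bucket; a minor compaction only fires for a bucket
# holding at least min_threshold files. Each bucket is a list of sizes.
def eligible_buckets(sizes, min_threshold=4):
    buckets = []
    for size in sorted(sizes):
        for b in buckets:
            avg = sum(b) / len(b)
            if 0.5 * avg <= size <= 1.5 * avg:
                b.append(size)
                break
        else:
            buckets.append([size])  # no similar-sized bucket: start a new one
    return [b for b in buckets if len(b) >= min_threshold]

# 12 sstables, but the sizes are spread so widely that every file lands in
# its own bucket and no bucket reaches the threshold of 4:
sizes_mb = [1, 2, 5, 12, 30, 70, 160, 400, 900, 2000, 4500, 10000]
print(eligible_buckets(sizes_mb))  # [] -> no minor compaction triggered

# Four similarly sized files, by contrast, form one eligible bucket:
print(eligible_buckets([100, 110, 120, 130]))  # [[100, 110, 120, 130]]
```

This is why the sstable count alone ("way over the minimum, there were 12") is not enough: the distribution of their sizes determines whether anything compacts.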
Re: Cassandra Statistics and Metrics
This is what I use: http://code.google.com/p/simple-cassandra-monitoring/ Disclaimer: I wrote it myself, don't expect too much :P On Thu, 16 Jun 2011 at 19:35 +0300, Viktor Jevdokimov wrote: There's a possibility to use a command-line JMX client with the standard Zabbix agent to request JMX counters without incorporating zapcat into Cassandra or another Java app. I'm investigating this feature right now and will post results when finished. 2011/6/15 Viktor Jevdokimov vjevdoki...@gmail.com http://www.kjkoster.org/zapcat/Zapcat_JMX_Zabbix_Bridge.html 2011/6/14 Marcos Ortiz mlor...@uci.cu Where can I find the source code? On 6/14/2011 10:13 AM, Viktor Jevdokimov wrote: We're using the open-source monitoring solution Zabbix from http://www.zabbix.com/ with zapcat - not only for Cassandra but for the whole system. As the MX4J tools plugin is supported by Cassandra, support for zapcat in Cassandra by default would be welcome - we have to use a wrapper to start the zapcat agent. 2011/6/14 Marcos Ortiz mlor...@uci.cu Regards to all. My team and I here at the University are working on a generic solution for monitoring and capacity planning for open-source databases, and one of the NoSQL databases we chose to support is Cassandra. Where can I find all the metrics and statistics of Cassandra? I'm thinking, for example: - Available space - Number of CFs and all kinds of metrics We are using for this development: Python + Django + Twisted + Orbited + jQuery. The idea is to build a Comet-based web application on top of these technologies. Any advice is welcome -- Marcos Luís Ortíz Valmaseda Software Engineer (UCI) http://marcosluis2186.posterous.com http://twitter.com/marcosluis2186
Re: insufficient space to compact even the two smallest files, aborting
Hi All. I found a way to be able to compact: I have to call scrub on the column family. Then scrub gets stuck forever. I restart the node, and voila! I can compact again without any message about not having enough space. This looks like a bug to me. What info would be needed to file a report? This is on 0.8, upgraded from 0.7.5
Re: insufficient space to compact even the two smallest files, aborting
I was already way over the minimum. There were 12 sstables. Also, is there any reason why scrub got stuck? I did not see anything in the logs. Via JMX I saw that the scrubbed bytes were equal to one of the sstables' size, and it stuck there for a couple of hours. On Mon, 13 Jun 2011 at 22:55 +0900, Terje Marthinussen wrote: That most likely happened just because after scrub you had new files and got over the 4-file minimum limit. https://issues.apache.org/jira/browse/CASSANDRA-2697 is the bug report.
Re: Retrieving a column from a fat row vs retrieving a single row
I think I will follow the advice of better balancing and I will split the index into several pieces. Thanks everybody for your input!
Re: insufficient space to compact even the two smallest files, aborting
Hi Terje, There are 12 SSTables, so I don't think that's the problem. I will try anyway and see what happens. On Fri, 10 Jun 2011 at 20:21 +0900, Terje Marthinussen wrote: It's a bug in the 0.8.0 release version. Cassandra splits the sstables depending on size and tries to find (by default) at least 4 files of similar size. If it cannot find 4 files of similar size, it logs that message in 0.8.0. You can try reducing the minimum required files for compaction and it will work.
Re: insufficient space to compact even the two smallest files, aborting
On Fri, 10 Jun 2011 at 20:21 +0900, Terje Marthinussen wrote: It's a bug in the 0.8.0 release version. Cassandra splits the sstables depending on size and tries to find (by default) at least 4 files of similar size. If it cannot find 4 files of similar size, it logs that message in 0.8.0. You can try reducing the minimum required files for compaction and it will work. Terje Hi Terje, There are 12 SSTables, so I don't think that's the problem. I will try anyway and see what happens.
Re: insufficient space to compact even the two smallest files, aborting
On Fri, 10 Jun 2011 at 23:40 +0900, Terje Marthinussen wrote: Yes, which is perfectly fine for a short time if all you want is to compact down to one file for some reason. I run min_compaction_threshold = 2 on one system here with SSDs. No problems with the more aggressive disk utilization on the SSDs from the extra compactions; reducing disk space is much more important. Note that this is a threshold per bucket of similarly sized sstables, not the total number of sstables, so a threshold of 2 will not give you one big file. Terje Cassandra refuses to do a major compaction no matter what I do. There are 110 GB free, and all the sstables I want to compact amount to 15 GB, and the same message keeps popping up.
Re: Retrieving a column from a fat row vs retrieving a single row
On Thu, 9 Jun 2011 at 13:28 +0200, Richard Low wrote: Remember also that partitioning is done by rows, not columns. So large rows are stored on a single host. This means they can't be load balanced, and all requests to that row will hit one host. Having separate rows will allow load balancing of I/Os. Yeah, but if I have RF=3 then there are three nodes that can answer the request, right?
Re: Data directories
I'm actually using it on a couple of nodes, but it's slower than directly accessing the data on an SSD. On Thu, 9 Jun 2011 at 11:10 -0400, Chris Burroughs wrote: On 06/08/2011 05:54 AM, Héctor Izquierdo Seliva wrote: Is there a way to control which sstables go to which data directory? I have a fast but space-limited SSD and a much slower RAID, and I'd like to put latency-sensitive data on the SSD and leave the other data on the RAID. Is this possible? If not, how well does Cassandra play with symlinks? Another option would be to use the SSD as a block-level cache with something like flashcache https://github.com/facebook/flashcache/.
Data directories
Hi, Is there a way to control which sstables go to which data directory? I have a fast but space-limited SSD and a much slower RAID, and I'd like to put latency-sensitive data on the SSD and leave the other data on the RAID. Is this possible? If not, how well does Cassandra play with symlinks?
Re: Data directories
On Wed, 8 Jun 2011 at 08:42 -0500, Jonathan Ellis wrote: No. https://issues.apache.org/jira/browse/CASSANDRA-2749 is open to track this, but nobody is working on it to my knowledge. Cassandra is fine with symlinks at the data-directory level, but I don't think that helps you, since you really want to move the sstables themselves. (Cassandra is NOT fine with symlinked sstable files, or with any moving around of sstable files while it is running.) I was planning on creating another keyspace and moving the slow sstables there. Of course, everything would be done while the node is stopped. Thanks for your help
Retrieving a column from a fat row vs retrieving a single row
Hi, I have an index I use to translate ids. I usually only read one column at a time, and it's becoming a bottleneck. I could rewrite the application to read a bunch at a time, but it would make the application logic much harder, as it would involve buffering incoming data. As far as I know, to read a single column Cassandra will deserialize a bunch of them and then pick the correct one (64 KB of data, right?). Would it be faster to have a row for each id I want to translate? This would make the key cache less effective, but the amount of data read should be smaller. Thanks!
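A back-of-envelope model of the trade-off in the question above: the 64 KB figure is the default column-index read granularity the poster refers to, while the per-row overhead here is an assumed, illustrative number (real costs depend on key size, caching, and sstable layout):

```python
# Rough model: reading one small column from a wide row deserializes a whole
# 64 KB index block, while a dedicated small row only costs its own bytes.
# ROW_OVERHEAD is a made-up per-row key/metadata figure, not a Cassandra
# constant; treat the ratio as an order-of-magnitude estimate only.
COLUMN_INDEX_BLOCK = 64 * 1024  # bytes deserialized per wide-row column read
ROW_OVERHEAD = 100              # assumed per-row key/metadata overhead

def bytes_read_wide_row(column_size):
    # must deserialize the 64 KB block containing the column
    return max(COLUMN_INDEX_BLOCK, column_size)

def bytes_read_single_row(column_size):
    return ROW_OVERHEAD + column_size

col = 64  # a small id-translation column
print(bytes_read_wide_row(col))                               # 65536
print(bytes_read_single_row(col))                             # 164
print(bytes_read_wide_row(col) // bytes_read_single_row(col)) # 399
```

The model supports the intuition in the question: for tiny columns read one at a time, per-id rows deserialize far less data per read, at the cost of a less effective key cache (many keys instead of one).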
Concurrent Mark Sweep taking 12 seconds
Hi everyone. I see in the logs that Concurrent Mark Sweep is taking 12 seconds to do its work. Is this normal? There is no stop-the-world GC; it just takes 12 seconds. Configuration: 0.7.5, 8 GB heap, 16 GB machines, 7 x 64 MB memtables.
Re: Index interval tuning
On Wed, 11 May 2011 at 14:24 +1200, aaron morton wrote: What version, and what were the values for RecentBloomFilterFalsePositives and BloomFilterFalsePositives ? The bloom filter metrics are updated in SSTableReader.getPosition(); the only slightly odd thing I can see is that we do not count a key cache hit as a true positive for the bloom filter. If there were a lot of key cache hits and a few false positives, the ratio would be wrong. I'll ask around; does not seem to apply to Hector's case though. Cheers 0.7.5, and I am no longer using the key cache. I get the bloom filter stats via JMX. BloomFilterFalsePositiveRatio is always stuck at 1.0. RecentBloomFilterFalsePositiveRatio fluctuates from 0 to 1.0 with no intermediate values. As for the index interval setting, I changed it from 128 to 256 and memory consumption was just a tad lower, but read performance was worse by a few ms, so not much to gain there. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 11 May 2011, at 10:38, Chris Burroughs wrote: On 05/10/2011 02:12 PM, Peter Schuller wrote: That reminds me, my false positive ratio is stuck at 1.0, so I guess bloom filters aren't doing a lot for me. That sounds unlikely unless you're hitting some edge case like reading a particular row that happened to be a collision, and only that row. This is from JMX stats on the column family store? (From JMX) I also see BloomFilterFalseRatio stuck at 1.0 on my production nodes. The only values that RecentBloomFilterFalseRatio had over the past several minutes were 0.0 and 1.0. While I can't prove that isn't accurate, it is very suspicious. The code looked reasonable until I got to SSTableReader, which was too complicated to just glance through.
Re: Index interval tuning
Sorry aaron, here are the values you requested: RecentBloomFilterFalsePositives = 5; BloomFilterFalsePositives = 385260. Uptime of the node is three and a half days, more or less. On Wed, 11-05-2011 at 22:05 +1200, aaron morton wrote: What are the values for RecentBloomFilterFalsePositives and BloomFilterFalsePositives, the non-ratio ones? - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 11 May 2011, at 19:53, Héctor Izquierdo Seliva wrote: On Wed, 11-05-2011 at 14:24 +1200, aaron morton wrote: What version, and what were the values for RecentBloomFilterFalsePositives and BloomFilterFalsePositives? The bloom filter metrics are updated in SSTableReader.getPosition(); the only slightly odd thing I can see is that we do not count a key cache hit as a true positive for the bloom filter. If there were a lot of key cache hits and a few false positives the ratio would be wrong. I'll ask around; it does not seem to apply to Héctor's case though. Cheers 0.7.5, and I am no longer using the key cache. I get the bloom filter stats via JMX. BloomFilterFalsePositiveRatio is always stuck at 1.0. RecentBloomFilterFalsePositiveRatio fluctuates between 0 and 1.0 with no intermediate values. As for the index interval setting, I changed it from 128 to 256 and memory consumption was just a tad lower, but read performance was worse by a few ms, so not much to gain there. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 11 May 2011, at 10:38, Chris Burroughs wrote: On 05/10/2011 02:12 PM, Peter Schuller wrote: That reminds me, my false positive ratio is stuck at 1.0, so I guess bloom filters aren't doing a lot for me. That sounds unlikely unless you're hitting some edge case like reading a particular row that happened to be a collision, and only that row. This is from JMX stats on the column family store? (From JMX) I also see BloomFilterFalseRatio stuck at 1.0 on my production nodes.
The only values that RecentBloomFilterFalseRatio had over the past several minutes were 0.0 and 1.0. While I can't prove that isn't accurate, it is very suspicious. The code looked reasonable until I got to SSTableReader, which was too complicated to just glance through.
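The ratio exposed over JMX is falsePositives / (falsePositives + truePositives), and Aaron's note explains the stuck 1.0: if key cache hits never bump the true-positive counter, the denominator collapses to the false positives alone, so any nonzero false-positive count reads as exactly 1.0. A quick shell illustration (not Cassandra's actual code; the numbers are hypothetical):

```shell
#!/bin/sh
# Illustration of the false-positive ratio computation: fp / (fp + tp).
# If key-cache hits never increment tp, the ratio is 1.0 whenever fp > 0,
# which matches the observed behaviour.
bf_ratio() {
    fp=$1; tp=$2
    if [ $((fp + tp)) -eq 0 ]; then
        echo "0.000"
    else
        awk "BEGIN { printf \"%.3f\\n\", $fp / ($fp + $tp) }"
    fi
}

bf_ratio 5 0      # tp never counted: prints 1.000
bf_ratio 5 99995  # tp counted: prints 0.000 at three decimals
```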
Index interval tuning
Hi everyone. I have a few sstables with around 500 million keys, and memory usage has grown a lot, I suppose because of the indexes. These sstables are comprised of skinny rows, but a lot of them. Would tuning the index interval make memory usage go down? And what would the performance hit be? I had to up the heap from 5GB to 8GB and tune the memtable thresholds way lower than what I was using with less data. I'm running 0.7.5 in a 6-machine cluster with RF=3. HW is quad core Intel machines with 16GB RAM and md raid0 on three SATA disks. Thanks all for your time!
Re: Index interval tuning
On Mon, 09-05-2011 at 17:58 +0200, Peter Schuller wrote: I have a few sstables with around 500 million keys, and memory usage has grown a lot, I suppose because of the indexes. These sstables are comprised of skinny rows, but a lot of them. Would tuning the index interval make memory usage go down? And what would the performance hit be? Assuming no row caching, and assuming you're talking about heap usage and not the virtual size of the process in top, the primary two things that grow with row count are (1) bloom filters for sstables and (2) the sampled index keys. Bloom filters are of a certain size to achieve a sufficiently small false positive rate. That target rate could be increased to allow smaller bloom filters, but that is not exposed as a configuration option and would require code changes. No row cache and no key cache. I've tried with both, but the keys being read are constantly changing, and I didn't see hit ratios beyond 0.8%. That reminds me, my false positive ratio is stuck at 1.0, so I guess bloom filters aren't doing a lot for me. For key sampling, the primary performance penalty should be CPU and maybe some disk. On average, when looking up a key in an sstable index file, you'll read sampling interval/2 entries and deserialize them before finding the one you're after. Increasing the sampling interval will thus increase the amount of deserialization taking place, as well as make the average range of data span additional pages on disk. The impact on disk is difficult to judge and likely depends a lot on I/O scheduling and other details. So the only thing I can do is test it and see how it goes. To make the change effective, should I do anything beyond changing the value in cassandra.yaml and restarting the node? I'll try first with 256 and see what happens.
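For what it's worth, in 0.7.x the sampling interval is a global setting in cassandra.yaml, and index samples are rebuilt when sstables are opened at startup, so editing the value and doing a rolling restart should be all that's needed (my understanding; worth verifying on a single node first):

```
# cassandra.yaml -- sample every 256th index entry instead of the default 128
# (roughly half the index sample memory, more deserialization per index lookup)
index_interval: 256
```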
Re: Problems recovering a dead node
I'm sorry but I can't provide more detailed info as I have restarted the node. After that, the number of pending tasks started at 40 and rapidly went down as compactions finished. After that, the ring looks ok, with all the nodes having about the same amount of data. There were no errors in the node log, only info messages. Should I run repair again just in case? The next time I have to recover a node, is there a safer/faster way of doing it? My guess about the number of pending tasks and sstables is that during repair it seemed to ask for ranges of the same column family a lot of times, thus yielding a lot of tiny sstables. This caused minor compactions to pile up. I have read about never-ending repairs, or repairs done a lot of times over small amounts of data, on the mailing lists. Could this be what happened? Thanks! On Wed, 04-05-2011 at 21:02 +1200, aaron morton wrote: Certainly sounds a bit sick. The first error looks like it happens when the index file points to the wrong place in the data file for the SSTable. The second one happens when the index file is corrupted. Those should be problems nodetool scrub can fix. The disk space may be dead space to cassandra compaction or some other streaming failure. You can check how much it considers to be live (in use) space using nodetool cfstats. This will also tell you how many sstables are live. Having a lot of dead SSTables is not necessarily a bad thing. What are the pending tasks? What is nodetool tpstats showing? And what does nodetool ring show from one of the other nodes? I'm assuming there are no errors in the logs on the node. What are the most recent INFO messages? Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 4 May 2011, at 17:54, Héctor Izquierdo Seliva wrote: Hi Aaron It has no data files whatsoever. The upgrade path is 0.7.4 -> 0.7.5. It turns out the initial problem was the sw raid failing silently because of another faulty disk.
Now that the storage is working, I brought up the node again, same IP, same token, and tried doing nodetool repair. All adjacent nodes have finished the streaming session, and now the node has a total of 248 GB of data. Is this normal when the load per node is about 18GB? Also there are 1245 pending tasks. It's been compacting or rebuilding sstables for the last 8 hours non stop. There are 2057 sstables in the data folder. Should I have done things differently or is this the normal behaviour? Thanks! On Wed, 04-05-2011 at 07:54 +1200, aaron morton wrote: When you say it's clean, does that mean the node has no data files? After you replaced the disk, what process did you use to recover? Also, what version are you running and what's the recent upgrade history? Cheers Aaron On 3 May 2011, at 23:09, Héctor Izquierdo Seliva wrote: Hi everyone. One of the nodes in my 6 node cluster died with disk failures. I have replaced the disks, and it's clean. It has the same configuration (same IP, same token). When I try to restart the node it starts to throw mmap underflow exceptions till it closes again. I tried setting io to standard, but it still fails. It gives errors about two decorated keys being different, and the EOFException. Here is an excerpt of the log http://pastebin.com/ZXW1wY6T I can provide more info if needed. I'm at a loss here so any help is appreciated. Thanks all for your time Héctor Izquierdo
Re: Problems recovering a dead node
On Wed, 04-05-2011 at 21:02 +1200, aaron morton wrote: Certainly sounds a bit sick. The first error looks like it happens when the index file points to the wrong place in the data file for the SSTable. The second one happens when the index file is corrupted. Those should be problems nodetool scrub can fix. The disk space may be dead space to cassandra compaction or some other streaming failure. You can check how much it considers to be live (in use) space using nodetool cfstats. This will also tell you how many sstables are live. Having a lot of dead SSTables is not necessarily a bad thing. What are the pending tasks? What is nodetool tpstats showing? And what does nodetool ring show from one of the other nodes? I'm assuming there are no errors in the logs on the node. What are the most recent INFO messages? Hope that helps. In the end I have had to run repair again, as I was getting old data back. It seems I'm having the same problem again. Here is my cfstats: http://pastebin.com/B9eD3b4R I have 796 sstables for a total of 108GB (and counting) on my data folder. Almost all of them come from streaming. 694 pending operations.
Here is my ring info:

10.20.13.75  Up  Normal  16.99 GB  16.67%  28356863910078205288614550619314017621
10.20.13.76  Up  Normal  26.76 GB  16.67%  56713727820156410577229101238628035242
10.20.13.77  Up  Normal  28.23 GB  16.67%  85070591730234615865843651857942052863
10.20.13.78  Up  Normal  29.19 GB  16.67%  113427455640312821154458202477256070484
10.20.13.79  Up  Normal  27.71 GB  16.67%  141784319550391026443072753096570088105
10.20.13.80  Up  Normal  25.36 GB  16.67%  170141183460469231731687303715884105727

And here is the output of tpstats:

Pool Name              Active  Pending  Completed
ReadStage                   0        0    6943016
RequestResponseStage        0        0   15011243
MutationStage               0        0     964296
ReadRepairStage             0        0    4064197
GossipStage                 0        0      59499
AntiEntropyStage            0        0         77
MigrationStage              0        0          0
MemtablePostFlusher         0        0         14
StreamStage                 0        0          0
FlushWriter                 0        0         14
FILEUTILS-DELETE-POOL       0        0          2
MiscStage                   0        0         83
FlushSorter                 0        0          0
InternalResponseStage       0        0          0
HintedHandoff               0        0          6
Problems recovering a dead node
Hi everyone. One of the nodes in my 6 node cluster died with disk failures. I have replaced the disks, and it's clean. It has the same configuration (same ip, same token). When I try to restart the node it starts to throw mmap underflow exceptions till it closes again. I tried setting io to standard, but it still fails. It gives errors about two decorated keys being different, and the EOFException. Here is an excerpt of the log http://pastebin.com/ZXW1wY6T I can provide more info if needed. I'm at a loss here so any help is appreciated. Thanks all for your time Héctor Izquierdo
Re: Problems recovering a dead node
Hi Aaron It has no data files whatsoever. The upgrade path is 0.7.4 -> 0.7.5. It turns out the initial problem was the sw raid failing silently because of another faulty disk. Now that the storage is working, I brought up the node again, same IP, same token, and tried doing nodetool repair. All adjacent nodes have finished the streaming session, and now the node has a total of 248 GB of data. Is this normal when the load per node is about 18GB? Also there are 1245 pending tasks. It's been compacting or rebuilding sstables for the last 8 hours non stop. There are 2057 sstables in the data folder. Should I have done things differently or is this the normal behaviour? Thanks! On Wed, 04-05-2011 at 07:54 +1200, aaron morton wrote: When you say it's clean, does that mean the node has no data files? After you replaced the disk, what process did you use to recover? Also, what version are you running and what's the recent upgrade history? Cheers Aaron On 3 May 2011, at 23:09, Héctor Izquierdo Seliva wrote: Hi everyone. One of the nodes in my 6 node cluster died with disk failures. I have replaced the disks, and it's clean. It has the same configuration (same IP, same token). When I try to restart the node it starts to throw mmap underflow exceptions till it closes again. I tried setting io to standard, but it still fails. It gives errors about two decorated keys being different, and the EOFException. Here is an excerpt of the log http://pastebin.com/ZXW1wY6T I can provide more info if needed. I'm at a loss here so any help is appreciated. Thanks all for your time Héctor Izquierdo
Re: Tombstones and memtable_operations
On Wed, 20-04-2011 at 23:00 +1200, aaron morton wrote: Looks like a bug, I've added a patch here https://issues.apache.org/jira/browse/CASSANDRA-2519 Aaron That was fast! Thanks Aaron
Re: How to warm up a cold node
Shouldn't the dynamic snitch take into account response times and send a slow node fewer requests? It seems that at node startup only a handful of requests arrive at the node and it keeps up well, but there's a moment where there's more than it can handle with a cold cache and it starts dropping messages like crazy. Could it be that the transition from slow node to ok node is too steep? On Fri, 15-04-2011 at 16:19 +0200, Peter Schuller wrote: Hi everyone, is there any recommended procedure to warm up a node before bringing it up? Currently the only out-of-the-box support for warming up caches is that implied by the key cache and row cache, which will pre-heat on start-up. Indexes will be indirectly preheated by index sampling, to the extent that the operating system retains them in page cache. If you're wanting to pre-heat sstables there's currently no way to do that (but it's a useful feature to have). Pragmatically, you can script something that e.g. does cat path/to/keyspace/* > /dev/null or similar. But that only works if the total database size fits reasonably well in page cache. Pre-heating sstables on a per-cf basis on start-up would be a nice feature to have.
Tombstones and memtable_operations
Hi everyone. I've configured in one of my column families memtable_operations = 0.02 and started deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed to work or have I done something wrong? Thanks!
Re: Tombstones and memtable_operations
Ok, I've read about gc grace seconds, but I'm not sure I understand it fully. Until gc grace seconds have passed, and there is a compaction, the tombstones live in memory? I have to delete 100 million rows and my insert rate is very low, so I don't have a lot of compactions. What should I do in this case? Lower the major compaction threshold and memtable_operations to some very low number? Thanks On Tue, 19-04-2011 at 17:36 +0200, Héctor Izquierdo Seliva wrote: Hi everyone. I've configured in one of my column families memtable_operations = 0.02 and started deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed to work or have I done something wrong? Thanks!
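For what it's worth: once flushed, tombstones live in the sstables on disk, not on the heap; they are only purged by a compaction that runs after gc grace seconds have elapsed. For a one-off bulk delete, one option is to temporarily lower the grace period so compactions can drop the tombstones sooner. Something like the following in cassandra-cli (attribute name as I recall it in 0.7 -- check help update column family; first -- and only safe if no node stays down longer than the new value):

```
update column family mycolumnfamily with gc_grace = 86400;
```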
Re: How to warm up a cold node
On Wed, 20-04-2011 at 07:59 +1200, aaron morton wrote: The dynamic snitch only reduces the chance that a node is used in a read operation; it depends on the RF, the CL for the operation, the partitioner and possibly the network topology. Dropping read messages is ok, so long as your operation completes at the requested CL. Are you using either a key_cache or a row_cache? If so, have you enabled the background save for those as part of the column definition? If this is about getting the OS caches warmed up, you should see the pending count on the READ stage back up and the io stats start to show longer queues. In that case Peter's suggestion below is prob the best option. Another hacky way to warm things would be to use get_count() on rows the app thinks it may need to use. This will cause all the columns to be read from disk, but not sent over the network. Aaron The rows that need to be read change over time and they are only read for short amounts of time, so it's a bit tricky. I'll try to figure out a way to use your suggestions. In another thread I asked if it would be feasible to somehow store what parts of the sstables are hot on shutdown and re-read those parts of the files on startup. Could it be done in a similar way to the work that's being done on page migrations? What do you think? Thanks for your time! On 20 Apr 2011, at 00:41, Héctor Izquierdo Seliva wrote: Shouldn't the dynamic snitch take into account response times and send a slow node fewer requests? It seems that at node startup only a handful of requests arrive at the node and it keeps up well, but there's a moment where there's more than it can handle with a cold cache and it starts dropping messages like crazy. Could it be that the transition from slow node to ok node is too steep? On Fri, 15-04-2011 at 16:19 +0200, Peter Schuller wrote: Hi everyone, is there any recommended procedure to warm up a node before bringing it up?
Currently the only out-of-the-box support for warming up caches is that implied by the key cache and row cache, which will pre-heat on start-up. Indexes will be indirectly preheated by index sampling, to the extent that the operating system retains them in page cache. If you're wanting to pre-heat sstables there's currently no way to do that (but it's a useful feature to have). Pragmatically, you can script something that e.g. does cat path/to/keyspace/* > /dev/null or similar. But that only works if the total database size fits reasonably well in page cache. Pre-heating sstables on a per-cf basis on start-up would be a nice feature to have.
Re: Tombstones and memtable_operations
On Wed, 20-04-2011 at 08:16 +1200, aaron morton wrote: I think there may be an issue here: we are counting the number of columns in the operation, but when deleting an entire row we do not have a column count. Can you let us know what version you are using and how you are doing the delete? Thanks Aaron I'm using 0.7.4. I have a file with all the row keys I have to delete (around 100 million) and I just go through the file and issue deletes through Pelops. Should I manually issue flushes with a cron every x time? On 20 Apr 2011, at 04:21, Héctor Izquierdo Seliva wrote: Ok, I've read about gc grace seconds, but I'm not sure I understand it fully. Until gc grace seconds have passed, and there is a compaction, the tombstones live in memory? I have to delete 100 million rows and my insert rate is very low, so I don't have a lot of compactions. What should I do in this case? Lower the major compaction threshold and memtable_operations to some very low number? Thanks On Tue, 19-04-2011 at 17:36 +0200, Héctor Izquierdo Seliva wrote: Hi everyone. I've configured in one of my column families memtable_operations = 0.02 and started deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed to work or have I done something wrong? Thanks!
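If periodic flushes via cron turn out to be the stopgap, a crontab entry along these lines would do it (a sketch; the nodetool path, 30-minute interval, and keyspace/CF names are placeholders):

```
# flush the column family every 30 minutes while the bulk delete runs
*/30 * * * *  /opt/cassandra/bin/nodetool -h localhost flush mykeyspace mycolumnfamily
```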
Re: Tombstones and memtable_operations
On Tue, 19-04-2011 at 23:33 +0300, shimi wrote: You can use memtable_flush_after_mins instead of the cron Shimi Good point! I'll try that. Wouldn't it be better to count a delete as a one-column operation so it contributes to flush by operations? 2011/4/19 Héctor Izquierdo Seliva izquie...@strands.com On Wed, 20-04-2011 at 08:16 +1200, aaron morton wrote: I think there may be an issue here: we are counting the number of columns in the operation, but when deleting an entire row we do not have a column count. Can you let us know what version you are using and how you are doing the delete? Thanks Aaron I'm using 0.7.4. I have a file with all the row keys I have to delete (around 100 million) and I just go through the file and issue deletes through Pelops. Should I manually issue flushes with a cron every x time? On 20 Apr 2011, at 04:21, Héctor Izquierdo Seliva wrote: Ok, I've read about gc grace seconds, but I'm not sure I understand it fully. Until gc grace seconds have passed, and there is a compaction, the tombstones live in memory? I have to delete 100 million rows and my insert rate is very low, so I don't have a lot of compactions. What should I do in this case? Lower the major compaction threshold and memtable_operations to some very low number? Thanks On Tue, 19-04-2011 at 17:36 +0200, Héctor Izquierdo Seliva wrote: Hi everyone. I've configured in one of my column families memtable_operations = 0.02 and started deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed to work or have I done something wrong? Thanks!
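Shimi's suggestion is a per-CF setting, set the same way memtable_operations was. In the 0.7 CLI I believe the attribute is spelled memtable_flush_after (value in minutes) -- check help update column family; for the exact name in your version:

```
update column family mycolumnfamily with memtable_flush_after = 60;
```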
Re: Tombstones and memtable_operations
I posted it a couple of messages back, but here it is again: I'm using 0.7.4. I have a file with all the row keys I have to delete (around 100 million) and I just go through the file and issue deletes through Pelops. Should I manually issue flushes with a cron every x time?
Re: RE: batch_mutate failed: out of sequence response
Thanks Dan for fixing that! Is the change integrated in the latest maven snapshot? On Tue, 19-04-2011 at 10:48 +1000, Dan Washusen wrote: An example scenario (that is now fixed in Pelops): 1. Attempt to write a column with a null value 2. Cassandra throws a TProtocolException which renders the connection useless for future operations 3. Pelops returns the corrupt connection to the pool 4. A second read operation is attempted with the corrupt connection and Cassandra throws an ApplicationException A Pelops test case for this can be found here: https://github.com/s7/scale7-pelops/blob/3fe7584a24bb4b62b01897a814ef62415bd2fe43/src/test/java/org/scale7/cassandra/pelops/MutatorIntegrationTest.java#L262 Cheers, -- Dan Washusen On Tuesday, 19 April 2011 at 10:28 AM, Jonathan Ellis wrote: Any idea what's causing the original TPE? On Mon, Apr 18, 2011 at 6:22 PM, Dan Washusen d...@reactive.org wrote: It turns out that once a TProtocolException is thrown from Cassandra the connection is useless for future operations. Pelops was closing connections when it detected TimedOutException, TTransportException and UnavailableException but not TProtocolException. We have now changed Pelops to close connections in all cases *except* NotFoundException. Cheers, -- Dan Washusen On Friday, 8 April 2011 at 7:28 AM, Dan Washusen wrote: Pelops uses a single connection per operation from a pool that is backed by Apache Commons Pool (assuming you're using Cassandra 0.7). I'm not saying it's perfect but it's NOT sharing a connection over multiple threads. Dan Hendry mentioned that he sees these errors. Is he also using Pelops? From his comment about retrying I'd assume not... -- Dan Washusen On Thursday, 7 April 2011 at 7:39 PM, Héctor Izquierdo Seliva wrote: On Wed, 06-04-2011 at 21:04 -0500, Jonathan Ellis wrote: out of sequence response is thrift's way of saying I got a response for request Y when I expected request X.
my money is on using a single connection from multiple threads. don't do that. I'm not using thrift directly, and my application is single-threaded, so I guess this is Pelops' fault somehow. Since I managed to tame memory consumption the problem has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server when the client assumed it had timed out? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
How to warm up a cold node
Hi everyone, is there any recommended procedure to warm up a node before bringing it up? Thanks!
Re: How to warm up a cold node
How difficult do you think this could be? I would be interested in developing this if it's feasible. On Fri, 15-04-2011 at 16:19 +0200, Peter Schuller wrote: Hi everyone, is there any recommended procedure to warm up a node before bringing it up? Currently the only out-of-the-box support for warming up caches is that implied by the key cache and row cache, which will pre-heat on start-up. Indexes will be indirectly preheated by index sampling, to the extent that the operating system retains them in page cache. If you're wanting to pre-heat sstables there's currently no way to do that (but it's a useful feature to have). Pragmatically, you can script something that e.g. does cat path/to/keyspace/* > /dev/null or similar. But that only works if the total database size fits reasonably well in page cache. Pre-heating sstables on a per-cf basis on start-up would be a nice feature to have.
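In the meantime, Peter's cat trick can be wrapped in a small script to run before starting the node (a sketch; the data directory path is an assumption -- point it at your data_file_directories, and remember it only helps if the data roughly fits in RAM):

```shell
#!/bin/sh
# Pre-heat sstables by pulling them through the OS page cache before the
# node starts serving reads.
warm_sstables() {
    dir="$1"
    for f in "$dir"/*/*-Data.db; do
        [ -e "$f" ] || continue      # glob matched nothing; skip
        cat "$f" > /dev/null         # read the whole file into page cache
    done
}

# default path is an assumption -- adjust for your installation
warm_sstables "${DATA_DIR:-/var/lib/cassandra/data}"
```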
Re: Strange readRepairChance in server logs
Thanks Aaron! On Tue, 12-04-2011 at 23:52 +1200, aaron morton wrote: Bug in the CLI, created / fixed https://issues.apache.org/jira/browse/CASSANDRA-2458 use 70 for now. Thanks Aaron On 12 Apr 2011, at 20:46, Héctor Izquierdo Seliva wrote: Hi everyone. I've changed the read repair chance of one of my column families from cassandra-cli with the following entry: update column family cf with read_repair_chance = 0.7 I expected to see in the server log readRepairChance=0.7 Instead I saw this readRepairChance=0.006999, Should I use read_repair_chance = 70 instead of 0.7?
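Until a build with the CASSANDRA-2458 fix is deployed, Aaron's workaround means feeding the CLI the percentage rather than the probability:

```
update column family cf with read_repair_chance = 70;
```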
Cassandra monitoring tool
Hi everyone. Looking for ways to monitor cassandra with zabbix I could not find anything that was really usable, till I found mention of a nice class by smeet. I have based my modification upon his work and now I give it back to the community. Here's the project url: http://code.google.com/p/simple-cassandra-monitoring/ It lets you get statistics for any Keyspace/ColumnFamily you want. To start it, just build the jar and launch it using your cassandra installation's lib folder as the classpath. The first parameter is the node host name. The second parameter is a comma separated list of KS:CF values. For example: java -cp blablabla localhost ks1:cf1,ks1:cf2. Then point curl to http://localhost:9090/ks1/cf1 and some basic stats will be displayed. You can also point to http://localhost:9090/nodeinfo to get some info about the server. If you have any suggestion or improvement you would like to see, please contact me and I will be glad to work on it. Right now it's a bit rough, but it gets the job done. Thanks for your time!
Re: Cassandra monitoring tool
On Tue, 12-04-2011 at 21:24 +0500, Ali Ahsan wrote: Thanks for sharing this info. I am getting the following error; can you please be more specific about how to run this?

java -cp /home/ali/apache-cassandra-0.6.3/lib/simple-cassandra-monitoring-1.0.jar 127.0.0.1 ks1:cf1,ks1:cf2
Exception in thread main java.lang.NoClassDefFoundError: 127/0/0/1
Caused by: java.lang.ClassNotFoundException: 127.0.0.1
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: 127.0.0.1. Program will exit.

OR

java -jar /home/ali/apache-cassandra-0.6.3/lib/simple-cassandra-monitoring-1.0.jar localhost ks1:cf1,ks1:cf2
Failed to load Main-Class manifest attribute from /home/ali/apache-cassandra-0.6.3/lib/simple-cassandra-monitoring-1.0.jar

Hi Ali. You should run it like this: java -cp /home/ali/apache-cassandra-0.6.3/lib/* com.google.code.scm.CassandraMonitoring localhost ks1:cf1,ks2:cf2,etc I forgot to mention it has been coded against 0.7.x, and I'm not sure it will work on 0.6.x. I'll try to add support for both 0.6.x and the new 0.8.x version as soon as possible. On 04/12/2011 07:26 PM, Héctor Izquierdo Seliva wrote: Hi everyone. Looking for ways to monitor cassandra with zabbix I could not find anything that was really usable, till I found mention of a nice class by smeet. I have based my modification upon his work and now I give it back to the community. Here's the project url: http://code.google.com/p/simple-cassandra-monitoring/ It lets you get statistics for any Keyspace/ColumnFamily you want. To start it, just build the jar and launch it using your cassandra installation's lib folder as the classpath. The first parameter is the node host name.
The second parameter is a comma separated list of KS:CF values. For example: java -cp blablabla localhost ks1:cf1,ks1:cf2. Then point curl to http://localhost:9090/ks1/cf1 and some basic stats will be displayed. You can also point to http://localhost:9090/nodeinfo to get some info about the server. If you have any suggestion or improvement you would like to see, please contact me and I will be glad to work on it. Right now it's a bit rough, but it gets the job done. Thanks for your time!
Re: Cassandra monitoring tool
I'm not sure. Are you running it on the same host as the cassandra node? On Tue, 12-04-2011 at 22:54 +0500, Ali Ahsan wrote: On 04/12/2011 10:42 PM, Héctor Izquierdo Seliva wrote: I forgot to mention it has been coded against 0.7.x, and I'm not sure it will work on 0.6.x. I'll try to add support for both 0.6.x and the new 0.8.x version as soon as possible. I think this error is because of 0.6.3?

Exception in thread main java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.io.EOFException]
        at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:342)
        at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:267)
        at com.google.code.scm.CassandraMonitoring.start(CassandraMonitoring.java:58)
        at com.google.code.scm.CassandraMonitoring.main(CassandraMonitoring.java:190)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.io.EOFException]
        at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:118)
        at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:203)
        at javax.naming.InitialContext.lookup(InitialContext.java:409)
        at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1902)
        at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1871)
        at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:276)
        ... 3 more
Caused by: java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.io.EOFException
        at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:304)
        at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
        at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:340)
        at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
        at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:114)
        ... 8 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:246)
        ... 12 more
Re: RE: batch_mutate failed: out of sequence response
El mié, 06-04-2011 a las 21:04 -0500, Jonathan Ellis escribió: out of sequence response is thrift's way of saying I got a response for request Y when I expected request X. my money is on using a single connection from multiple threads. don't do that. I'm not using thrift directly, and my application is single-threaded, so I guess this is Pelops' fault somehow. Since I managed to tame memory consumption the problem has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server when the client assumed it had timed out?
Re: Disable Swap? batch_mutate failed: out of sequence response
I took a look at vmstats, and there was no swap. Also, our monitoring tools showed no swap being used at all. It's running with mlockall and all that. 8GB heap on a 16GB machine. El mar, 05-04-2011 a las 21:24 +0200, Peter Schuller escribió: Would you recommend to disable system swap as a rule? I'm running on Debian 64bit and am seeing light swapping: I'm not Jonathan, but *yes*. I would go so far as to say that disabling swap is a good rule of thumb for *most* production systems that serve latency sensitive traffic. For a machine dedicated to Cassandra, you definitely want to disable swap. There's just nothing to be gained really, but lots to lose. There is nothing that you *want* swapped out. All the memory you use needs to be in memory. You don't want the heap swapped out, you don't want the off-heap jvm malloc stuff swapped out, you don't want stacks swapped out, etc. As soon as you start swapping you very quickly run into poor and unreliable performance. Particularly during GC.
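For completeness, disabling swap on a dedicated Linux box usually comes down to the following (run as root; these are generic ops commands, not Cassandra-specific, and the sysctl line is the softer alternative if you must keep a swap partition configured):

```shell
# Turn off all swap devices immediately (lasts until reboot)
swapoff -a

# Make it permanent by commenting out the swap entries in /etc/fstab.

# Softer alternative: keep swap configured but tell the kernel to avoid it
echo "vm.swappiness = 0" >> /etc/sysctl.conf
sysctl -p
```

Combined with JNA/mlockall (which pins the heap), this removes the main sources of GC-time swap stalls.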
Re: RE: batch_mutate failed: out of sequence response
El mié, 06-04-2011 a las 09:06 +1000, Dan Washusen escribió: Pelops raises a RuntimeException? Can you provide more info please? org.scale7.cassandra.pelops.exceptions.ApplicationException: batch_mutate failed: out of sequence response -- Dan Washusen Make big files fly visit digitalpigeon.com On Tuesday, 5 April 2011 at 11:43 PM, Héctor Izquierdo Seliva wrote: El mar, 05-04-2011 a las 09:35 -0400, Dan Hendry escribió: I too have seen the out of sequence response problem. My solution has just been to retry and it seems to work. None of my mutations are THAT large (< 200 columns). The only related information I could find points to a thrift/ubuntu bug of some kind (http://markmail.org/message/xc3tskhhvsf5awz7). What OS are you running? Dan Hi Dan. I'm running on Debian stable and cassandra 0.7.4. I have rows with up to 1000 columns. I have changed the way I was doing the batch mutates to never be bigger than 100 columns at a time. I hope this will work, otherwise the move is going to take too long. The problem is aggravated by Pelops not retrying automatically and instead raising a RuntimeException. I'll try to add a retry if this doesn't work. Thanks for your response! Héctor -Original Message- From: Héctor Izquierdo Seliva [mailto:izquie...@strands.com] Sent: April-05-11 8:30 To: user@cassandra.apache.org Subject: batch_mutate failed: out of sequence response Hi everyone. I'm having trouble while inserting big amounts of data into cassandra. I'm getting this exception: batch_mutate failed: out of sequence response I'm guessing it's due to very big mutates. I have made the batch mutates smaller and it seems to be behaving. Can somebody shed some light? Thanks! No virus found in this incoming message. Checked by AVG - www.avg.com Version: 9.0.894 / Virus Database: 271.1.1/3551 - Release Date: 04/05/11 02:34:00
Re: Disable Swap? batch_mutate failed: out of sequence response
El mié, 06-04-2011 a las 09:18 +0200, Héctor Izquierdo Seliva escribió: I took a look at vmstats, and there was no swap. Also, our monitoring tools showed no swap being used at all. It's running with mlockall and all that. 8GB heap on a 16GB machine I tried disabling swap completely, and the problem is still appearing. El mar, 05-04-2011 a las 21:24 +0200, Peter Schuller escribió: Would you recommend to disable system swap as a rule? I'm running on Debian 64bit and am seeing light swapping: I'm not Jonathan, but *yes*. I would go so far as to say that disabling swap is a good rule of thumb for *most* production systems that serve latency sensitive traffic. For a machine dedicated to Cassandra, you definitely want to disable swap. There's just nothing to be gained really, but lots to lose. There is nothing that you *want* swapped out. All the memory you use needs to be in memory. You don't want the heap swapped out, you don't want the off-heap jvm malloc stuff swapped out, you don't want stacks swapped out, etc. As soon as you start swapping you very quickly run into poor and unreliable performance. Particularly during GC.
batch_mutate failed: out of sequence response
Hi everyone. I'm having trouble while inserting big amounts of data into cassandra. I'm getting this exception: batch_mutate failed: out of sequence response I'm guessing it's due to very big mutates. I have made the batch mutates smaller and it seems to be behaving. Can somebody shed some light? Thanks!
RE: batch_mutate failed: out of sequence response
El mar, 05-04-2011 a las 09:35 -0400, Dan Hendry escribió: I too have seen the out of sequence response problem. My solution has just been to retry and it seems to work. None of my mutations are THAT large (< 200 columns). The only related information I could find points to a thrift/ubuntu bug of some kind (http://markmail.org/message/xc3tskhhvsf5awz7). What OS are you running? Dan Hi Dan. I'm running on Debian stable and cassandra 0.7.4. I have rows with up to 1000 columns. I have changed the way I was doing the batch mutates to never be bigger than 100 columns at a time. I hope this will work, otherwise the move is going to take too long. The problem is aggravated by Pelops not retrying automatically and instead raising a RuntimeException. I'll try to add a retry if this doesn't work. Thanks for your response! Héctor -Original Message- From: Héctor Izquierdo Seliva [mailto:izquie...@strands.com] Sent: April-05-11 8:30 To: user@cassandra.apache.org Subject: batch_mutate failed: out of sequence response Hi everyone. I'm having trouble while inserting big amounts of data into cassandra. I'm getting this exception: batch_mutate failed: out of sequence response I'm guessing it's due to very big mutates. I have made the batch mutates smaller and it seems to be behaving. Can somebody shed some light? Thanks!
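Until the client library retries on its own, a thin wrapper like the following is one way to add the retry Dan describes. This is a sketch, not Pelops API: the RuntimeException catch stands in for whatever your client surfaces (e.g. Pelops' ApplicationException), and the attempt count and backoff are arbitrary placeholders:

```java
import java.util.concurrent.Callable;

public class Retry {
    // Run op, retrying up to `attempts` times with linear backoff.
    public static <T> T withRetries(Callable<T> op, int attempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (RuntimeException e) { // stand-in for the client's exception type
                last = e;
                Thread.sleep(backoffMs * (i + 1));
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        // Toy demonstration: fails twice, succeeds on the third attempt.
        final int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("out of sequence response");
            return "ok";
        }, 5, 1);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In real use the Callable body would be the batch_mutate call; keeping mutations small, as discussed above, reduces how often the retry fires at all.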
RE: batch_mutate failed: out of sequence response
I'm still running into problems. Now I don't write more than 100 columns at a time, and I'm having lots of stop-the-world GC pauses. I'm writing into three column families, with memtable_operations = 0.3 and memtable_throughput = 64. Is any of this wrong? -Original Message- From: Héctor Izquierdo Seliva [mailto:izquie...@strands.com] Sent: April-05-11 8:30 To: user@cassandra.apache.org Subject: batch_mutate failed: out of sequence response Hi everyone. I'm having trouble while inserting big amounts of data into cassandra. I'm getting this exception: batch_mutate failed: out of sequence response I'm guessing it's due to very big mutates. I have made the batch mutates smaller and it seems to be behaving. Can somebody shed some light? Thanks!
RE: batch_mutate failed: out of sequence response
Update with more info: I'm still running into problems. Now I don't write more than 100 columns at a time, and I'm having lots of stop-the-world GC pauses. I'm writing into three column families, with memtable_operations = 0.3 and memtable_throughput = 64. There is now swapping, and full GCs are taking around 5 seconds. I'm running cassandra with a heap of 8 GB. Should I tune this somehow? Is any of this wrong? -Original Message- From: Héctor Izquierdo Seliva [mailto:izquie...@strands.com] Sent: April-05-11 8:30 To: user@cassandra.apache.org Subject: batch_mutate failed: out of sequence response Hi everyone. I'm having trouble while inserting big amounts of data into cassandra. I'm getting this exception: batch_mutate failed: out of sequence response I'm guessing it's due to very big mutates. I have made the batch mutates smaller and it seems to be behaving. Can somebody shed some light? Thanks!
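For reference, those per-column-family memtable thresholds can be adjusted from cassandra-cli in 0.7 (syntax from memory of the 0.7 cli - confirm with `help update column family;`); the values below are just the ones quoted above, not a recommendation:

```
update column family mycolumnfamily
    with memtable_operations = 0.3
    and memtable_throughput = 64;
```

With three column families at memtable_throughput = 64 MB each, peak memtable pressure alone can approach 200 MB before flush overhead, which is worth keeping in mind when sizing an 8 GB heap.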
Re: How to use NetworkTopologyStrategy
Thanks! I totally overlooked that. El lun, 21-02-2011 a las 08:14 +1300, Aaron Morton escribió: The best examples I know of are in the internal cli help, and conf/cassandra.yaml Aaron On 19/02/2011, at 12:51 AM, Héctor Izquierdo Seliva izquie...@strands.com wrote: Hi! Can somebody give me some hints about how to configure a keyspace with NetworkTopologyStrategy via cassandra-cli? Or what is the preferred method to do so? Thanks!
Replicate changes from DC1 to DC2, but not from DC2 to DC1
Hi all. Is there a way (besides changing the code) to replicate data from a Data center 1 to a Data center 2, but not the other way around? I need to have a preproduction environment with production data, and ideally with only a fraction of the data (for example, by key prefixes). I have poked around StorageProxy and I can make writes in DC2 not replicate to DC1, and as long as I use DC_QUORUM it stays that way, but it looks...dangerous. I could do a full key scan but it would take too long. Has anybody done something similar? Thanks!
millions of columns in a row vs millions of rows with one column
Hi Everyone. I'm testing performance differences of millions of columns in a row vs millions of rows. So far it seems wide rows perform better in terms of reads, but there can be potentially hundreds of millions of columns in a row. Is this going to be a problem? Should I go with individual rows? I run 6 nodes with 0.7.2 and a RF=3. Thanks for your help!
Re: Replicate changes from DC1 to DC2, but not from DC2 to DC1
El mar, 22-02-2011 a las 08:46 +1300, Aaron Morton escribió: Take a look at the NetworkTopologyStrategy and/or the RackInferringSnitch together they decide where to place replicas. It's probably not a great idea to muck around with this stuff though. How about a hadoop job to pull out the data you want? It would be a full scan but in parallel. I looked at that, but correct me if I'm wrong, schema changes are distributed to all nodes, and they all have to agree on a version, so I can't have a keyspace A in DC1 with NetworkTopologyStrategy with options = [{DC1:1,DC2:1}] and the same keyspace in DC2 with options [{DC2:1, DC1:0}]. Is that correct? Aaron On 22/02/2011, at 3:10 AM, Héctor Izquierdo Seliva izquie...@strands.com wrote: Hi all. Is there a way (besides changing the code) to replicate data from a Data center 1 to a Data center 2, but not the other way around? I need to have a preproduction environment with production data, and ideally with only a fraction of the data (for example, by key prefixes). I have poked around StorageProxy and I can make writes in DC2 not replicate to DC1, and as long as I use DC_QUORUM it stays that way, but it looks...dangerous. I could do a full key scan but it would take too long. Has anybody done something similar? Thanks!
Re: millions of columns in a row vs millions of rows with one column
El mar, 22-02-2011 a las 08:49 +1300, Aaron Morton escribió: My preference is to go with more rows as it distributes load better. But the best design is the one that supports your read patterns. See http://wiki.apache.org/cassandra/LargeDataSetConsiderations for background. Aaron Those rows are distributed among the three replicas, so my thought was that I could get away with it and have a more or less balanced cluster. Anyway, the columns I read are not contiguous, so the effect on I/O is the same as having individual rows, right? Cassandra still has to seek to the position of the columns within the row. How much space does the key cache use per row? This would make the number of rows increase by a big factor. On 22/02/2011, at 3:56 AM, Héctor Izquierdo Seliva izquie...@strands.com wrote: Hi Everyone. I'm testing performance differences of millions of columns in a row vs millions of rows. So far it seems wide rows perform better in terms of reads, but there can be potentially hundreds of millions of columns in a row. Is this going to be a problem? Should I go with individual rows? I run 6 nodes with 0.7.2 and a RF=3. Thanks for your help!
How to use NetworkTopologyStrategy
Hi! Can somebody give me some hints about how to configure a keyspace with NetworkTopologyStrategy via cassandra-cli? Or what is the preferred method to do so? Thanks!
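For anyone landing here from search, the cassandra-cli incantation (0.7/0.8-era syntax, matching the `[{DC1:1,DC2:1}]` options format used elsewhere in this thread archive) looks roughly like this; the keyspace and datacenter names are illustrative and must match what your snitch reports:

```
create keyspace MyKeyspace
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = [{DC1:2, DC2:1}];
```

With NetworkTopologyStrategy the per-datacenter counts in strategy_options replace the single replication_factor; the internal cli help (`help create keyspace;`) has the authoritative syntax for your exact version.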
Question about fat rows
Hi everyone. I have a question about data modeling in my application. I have to store the items of a customer, and I can do it in one fat row per customer where the column name is the id and the value a json serialized object, or one entry per item with the same layout. This data is updated almost every day, sometimes several times per day. My question is, which scheme will give me better read performance? I was hoping to save on keys so I could cache all the keys in this CF, but I'm worried about read performance with heavily updated fat rows. Any help or hints would be appreciated. Thanks!
Re: Cassandra 0.7.0rc1 issue with command-cli
Try ending the lines with ;

Regards

El vie, 26-11-2010 a las 21:25 +1100, jasonmp...@gmail.com escribió: Hi, So I had this working perfectly with beta 3 and now it fails. Basically what I do is as follows:

1) Extract new rc1 tarball.

2) Prepare location based on instructions in Readme.txt:

sudo rm -r /var/log/cassandra
sudo rm -r /var/lib/cassandra
sudo mkdir -p /var/log/cassandra
sudo chown -R `whoami` /var/log/cassandra
sudo mkdir -p /var/lib/cassandra
sudo chown -R `whoami` /var/lib/cassandra

3) Then run cassandra:

[devel...@localhost apache-cassandra-0.7.0-rc1]$ bin/cassandra -f
INFO 21:23:41,750 Heap size: 1060569088/1061617664
INFO 21:23:41,755 JNA not found. Native methods will be disabled.
INFO 21:23:41,767 Loading settings from file:/opt/apache-cassandra-0.7.0-rc1/conf/cassandra.yaml
INFO 21:23:41,942 DiskAccessMode 'auto' determined to be standard, indexAccessMode is standard
INFO 21:23:42,055 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1290767022055.log
INFO 21:23:42,129 read 0 from saved key cache
INFO 21:23:42,132 read 0 from saved key cache
INFO 21:23:42,138 read 0 from saved key cache
INFO 21:23:42,142 read 0 from saved key cache
INFO 21:23:42,143 read 0 from saved key cache
INFO 21:23:42,147 loading row cache for LocationInfo of system
INFO 21:23:42,164 completed loading (16 ms; 0 keys) row cache for LocationInfo of system
INFO 21:23:42,164 loading row cache for HintsColumnFamily of system
INFO 21:23:42,165 completed loading (1 ms; 0 keys) row cache for HintsColumnFamily of system
INFO 21:23:42,165 loading row cache for Migrations of system
INFO 21:23:42,166 completed loading (1 ms; 0 keys) row cache for Migrations of system
INFO 21:23:42,168 loading row cache for Schema of system
INFO 21:23:42,168 completed loading (0 ms; 0 keys) row cache for Schema of system
INFO 21:23:42,168 loading row cache for IndexInfo of system
INFO 21:23:42,169 completed loading (1 ms; 0 keys) row cache for IndexInfo of system
INFO 21:23:42,257 Couldn't detect any schema definitions in local storage.
INFO 21:23:42,258 Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
INFO 21:23:42,260 No commitlog files found; skipping replay
INFO 21:23:42,301 Upgrading to 0.7. Purging hints if there are any. Old hints will be snapshotted.
INFO 21:23:42,306 Cassandra version: 0.7.0-rc1
INFO 21:23:42,306 Thrift API version: 19.4.0
INFO 21:23:42,320 Loading persisted ring state
INFO 21:23:42,338 Starting up server gossip
INFO 21:23:42,365 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1290767022055.log', position=700)
INFO 21:23:42,367 Enqueuing flush of Memtable-LocationInfo@14361585(227 bytes, 4 operations)
INFO 21:23:42,367 Writing Memtable-LocationInfo@14361585(227 bytes, 4 operations)
INFO 21:23:42,796 Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-1-Data.db (473 bytes)
WARN 21:23:42,861 Generated random token 124937963426514930245885291999748186719. Random tokens will result in an unbalanced ring; see http://wiki.apache.org/cassandra/Operations
INFO 21:23:42,863 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1290767022055.log', position=996)
INFO 21:23:42,863 Enqueuing flush of Memtable-LocationInfo@17243268(53 bytes, 2 operations)
INFO 21:23:42,864 Writing Memtable-LocationInfo@17243268(53 bytes, 2 operations)
INFO 21:23:43,277 Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-2-Data.db (301 bytes)
INFO 21:23:43,282 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 21:23:43,347 Binding thrift service to localhost/127.0.0.1:9160
INFO 21:23:43,350 Using TFramedTransport with a max frame size of 15728640 bytes.
INFO 21:23:43,353 Listening for thrift clients...

4) start the command line client:

[devel...@localhost apache-cassandra-0.7.0-rc1]$ bin/cassandra-cli
Welcome to cassandra CLI. Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
[default@unknown] connect localhost/9160

And as soon as I try and use the connect localhost/9160 it stalls. I did this EXACT same procedure with beta3 and no issues. I am running on centos 5.5 with java 6. Any ideas?
Re: Cassandra 0.7.0rc1 issue with command-cli
Yes, I know what you mean. I bashed my head a few times against the keyboard :) El vie, 26-11-2010 a las 21:37 +1100, Jason Pell escribió: It works perfectly, thanks so much, I was finding it a little frustrating. On Fri, Nov 26, 2010 at 9:36 PM, Jason Pell ja...@pellcorp.com wrote: no way - well I certainly feel stupid! Is this new, it worked without it on beta 3? 2010/11/26 Héctor Izquierdo Seliva izquie...@strands.com: Try ending the lines with ; Regards El vie, 26-11-2010 a las 21:25 +1100, jasonmp...@gmail.com escribió: Hi, So I had this working perfectly with beta 3 and now it fails. Basically what I do is follows: 1) Extract new rc1 tarball. 2) Prepare location based on instructions in Readme.txt: sudo rm -r /var/log/cassandra sudo rm -r /var/lib/cassandra sudo mkdir -p /var/log/cassandra sudo chown -R `whoami` /var/log/cassandra sudo mkdir -p /var/lib/cassandra sudo chown -R `whoami` /var/lib/cassandra 3) Then run cassandra [devel...@localhost apache-cassandra-0.7.0-rc1]$ bin/cassandra -f INFO 21:23:41,750 Heap size: 1060569088/1061617664 INFO 21:23:41,755 JNA not found. Native methods will be disabled. 
INFO 21:23:41,767 Loading settings from file:/opt/apache-cassandra-0.7.0-rc1/conf/cassandra.yaml INFO 21:23:41,942 DiskAccessMode 'auto' determined to be standard, indexAccessMode is standard INFO 21:23:42,055 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1290767022055.log INFO 21:23:42,129 read 0 from saved key cache INFO 21:23:42,132 read 0 from saved key cache INFO 21:23:42,138 read 0 from saved key cache INFO 21:23:42,142 read 0 from saved key cache INFO 21:23:42,143 read 0 from saved key cache INFO 21:23:42,147 loading row cache for LocationInfo of system INFO 21:23:42,164 completed loading (16 ms; 0 keys) row cache for LocationInfo of system INFO 21:23:42,164 loading row cache for HintsColumnFamily of system INFO 21:23:42,165 completed loading (1 ms; 0 keys) row cache for HintsColumnFamily of system INFO 21:23:42,165 loading row cache for Migrations of system INFO 21:23:42,166 completed loading (1 ms; 0 keys) row cache for Migrations of system INFO 21:23:42,168 loading row cache for Schema of system INFO 21:23:42,168 completed loading (0 ms; 0 keys) row cache for Schema of system INFO 21:23:42,168 loading row cache for IndexInfo of system INFO 21:23:42,169 completed loading (1 ms; 0 keys) row cache for IndexInfo of system INFO 21:23:42,257 Couldn't detect any schema definitions in local storage. INFO 21:23:42,258 Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml(). INFO 21:23:42,260 No commitlog files found; skipping replay INFO 21:23:42,301 Upgrading to 0.7. Purging hints if there are any. Old hints will be snapshotted. 
INFO 21:23:42,306 Cassandra version: 0.7.0-rc1 INFO 21:23:42,306 Thrift API version: 19.4.0 INFO 21:23:42,320 Loading persisted ring state INFO 21:23:42,338 Starting up server gossip INFO 21:23:42,365 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1290767022055.log', position=700) INFO 21:23:42,367 Enqueuing flush of memtable-locationi...@14361585(227 bytes, 4 operations) INFO 21:23:42,367 Writing memtable-locationi...@14361585(227 bytes, 4 operations) INFO 21:23:42,796 Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-1-Data.db (473 bytes) WARN 21:23:42,861 Generated random token 124937963426514930245885291999748186719. Random tokens will result in an unbalanced ring; see http://wiki.apache.org/cassandra/Operations INFO 21:23:42,863 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1290767022055.log', position=996) INFO 21:23:42,863 Enqueuing flush of memtable-locationi...@17243268(53 bytes, 2 operations) INFO 21:23:42,864 Writing memtable-locationi...@17243268(53 bytes, 2 operations) INFO 21:23:43,277 Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-2-Data.db (301 bytes) INFO 21:23:43,282 Will not load MX4J, mx4j-tools.jar is not in the classpath INFO 21:23:43,347 Binding thrift service to localhost/127.0.0.1:9160 INFO 21:23:43,350 Using TFramedTransport with a max frame size of 15728640 bytes. INFO 21:23:43,353 Listening for thrift clients... 4) start the command line client: [devel...@localhost apache-cassandra-0.7.0-rc1]$ bin/cassandra-cli Welcome to cassandra CLI. Type 'help' or '?' for help. Type 'quit' or 'exit' to quit. [defa...@unknown] connect localhost/9160 And as soon as I try and use the connect localhost/9160 it stalls I did this EXACT same procedure with beta3 and no issues. I am running on centos 5.5 with java 6 Any ideas?
Re: cassandra-cli no command working - mac osx
That happened to me too. Try with a ; at the end of the line. El jue, 25-11-2010 a las 17:22 +0000, Marcin escribió: Hi guys, I am having weird problem, cassandra is working but can't get cassandra-cli to work. When I run a command - any command, even help - and hit enter, I am not getting any response. Any ideas? P.S. Running Cassandra 0.7.0 RC1 cheers, /Marcin
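In other words, from 0.7 RC1 onwards the cli only executes a statement once it sees a terminating semicolon, so a session should look something like this (keyspace name is just an example):

```
connect localhost/9160;
use MyKeyspace;
describe keyspace;
```

Without the `;`, the cli simply keeps buffering input and appears to hang.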
Wide rows or tons of rows?
Hi everyone. I'm sure this question or similar has come up before, but I can't find a clear answer. I have to store an unknown number of items in cassandra, which can vary from a few hundreds to a few millions per customer. I read that in cassandra wide rows are better than a lot of rows, but then I face two problems. First, column distribution. The only way I can think of distributing items among a given set of rows is hashing the item id to a row id, and then using the item id as the column name. In this way, I can distribute data among a few rows evenly, but if there are only a few items it's equivalent to a row per item plus more overhead, and if there are millions of items then the rows are too big, and I have to turn off row cache. Does anybody know a way around this? The second issue is that in my benchmarks, once the data is mmapped, one item per row performs faster than wide rows by a significant margin. Is this how it is supposed to be? I can give additional data if needed. English is not my first language so I apologize beforehand if some of this doesn't make sense. Thanks for your time
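The hashing scheme described above can be sketched as follows. The bucket count and key format are assumptions for illustration, and note that the bucket count must be fixed up front: changing it later remaps every item to a different row.

```java
public class Buckets {
    // Assumed bucket count; tune to your expected item volume.
    static final int NUM_BUCKETS = 64;

    // Map an item id to one of NUM_BUCKETS row keys; the item id itself
    // then becomes the column name inside that row.
    static String rowKeyFor(String itemId) {
        // Mask the sign bit: Math.abs(Integer.MIN_VALUE) is still negative.
        int bucket = (itemId.hashCode() & 0x7fffffff) % NUM_BUCKETS;
        return "items-" + bucket;
    }

    public static void main(String[] args) {
        System.out.println(rowKeyFor("item-12345"));
        // The same id always lands in the same row:
        System.out.println(rowKeyFor("item-12345").equals(rowKeyFor("item-12345")));
    }
}
```

This illustrates the trade-off in the question: with few items the buckets degenerate to roughly one item per row, and with millions of items each bucket row grows too large for the row cache.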
Re: Wide rows or tons of rows?
El lun, 11-10-2010 a las 11:08 -0400, Edward Capriolo escribió: Inlined: 2010/10/11 Héctor Izquierdo Seliva izquie...@strands.com: Hi everyone. I'm sure this question or similar has come up before, but I can't find a clear answer. I have to store an unknown number of items in cassandra, which can vary from a few hundreds to a few millions per customer. I read that in cassandra wide rows are better than a lot of rows, but then I face two problems. First, column distribution. The only way I can think of distributing items among a given set of rows is hashing the item id to a row id, and then using the item id as the column name. In this way, I can distribute data among a few rows evenly, but if there are only a few items it's equivalent to a row per item plus more overhead, and if there are millions of items then the rows are too big, and I have to turn off row cache. Does anybody know a way around this? The second issue is that in my benchmarks, once the data is mmapped, one item per row performs faster than wide rows by a significant margin. Is this how it is supposed to be? I can give additional data if needed. English is not my first language so I apologize beforehand if some of this doesn't make sense. Thanks for your time If you have wide rows RowCache is a problem. IMHO RowCache is only viable in situations where you have a fixed amount of data and thus will get a high hit rate. I was running a large row cache for some time and I found it unpredictable. It causes memory pressure on the JVM from moving things in and out of memory, and if the hit rate is low, taking a key and all its columns in and out repeatedly ends up being counterproductive for disk utilization. Suggest KeyCache in most situations (there is a ticket open for a fractional row cache). I saw the same behavior. It's a pity there is no column cache. That would be awesome. Another factor to consider is if you have many rows and many columns you end up with large(r) indexes.
In our case we have start-up times slightly longer than we would like because the process of sampling indexes during start up is intensive. If I could do it all over again I might serialize more into single columns rather than exploding data across multiple rows and columns. If you always need to look up the entire row do not break it down by columns. So it might be better to store a JSON-serialized version then? I was using SuperColumns to store item info, but a simple string might give me the option to do some compression. memory mapping. There are different dynamics depending on data size relative to memory size. You may have something like ~ 40GB of data and 10GB index, 32GB RAM a node; this system is not going to respond the same way as, say, 200GB data and 25GB indexes. Also it is very workload dependent. We have a 6 node cluster with 16 GB RAM each, although the whole dataset is expected to be around 100GB per machine. Which indexes are more expensive, row or column indexes? Hope this helps, Edward It does!