Partly, I guess this may be a side effect of multithreaded compaction? Before running out of space completely, I do see a few of these:

WARN [CompactionExecutor:448] 2011-05-02 01:08:10,480 CompactionManager.java (line 516) insufficient space to compact all requested files SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12858-Data.db'), SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12851-Data.db'), SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12864-Data.db')
INFO [CompactionExecutor:448] 2011-05-02 01:08:10,481 StorageService.java (line 2066) requesting GC to free disk space
In this case, there were 24 threads asking whether there was enough free disk space. Most of those checks probably succeeded, but together they could in theory have requested 24x the available space, since I do not think there is any global accounting of how much disk space already-started compactions will need.

Of course, no matter how much checking is done in advance, we could still run out of disk, so I guess there is also a need to check whether disk space is about to run out while a compaction runs, so it can be halted/aborted. Unfortunately, that needs global coordination so we do not stop all compaction threads at once...

After reducing to 6 compaction threads in 0.8 beta2, the data has compacted just fine without any disk space issues. Another problem you may hit, once you have a lot of sstables containing updates (that is, duplicates) of the same data, is that massively concurrent compaction with nproc threads can also duplicate all the duplicates on a large scale.

Yes, this argues in favour of multithreaded compaction, since it should normally keep the number of sstables at a sane level and avoid such problems, but it is unfortunately just a kludge for the real problem, which is to segment the sstables somehow on keyspace so we can reduce the disk requirements and recover from scenarios where disk usage goes above 50%.
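To illustrate the kind of global accounting I mean, here is a minimal sketch. The class and method names (DiskReservation, tryReserve) are hypothetical, not anything in Cassandra: each compaction thread would atomically reserve its estimated output size before starting, so 24 threads cannot all claim the same free bytes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a global disk-space reservation pool. A compaction
// calls tryReserve() with its estimated output size before it starts writing,
// and release() when it finishes or aborts. Not Cassandra API.
public class DiskReservation {
    private final long capacity;                 // usable bytes on the data volume
    private final AtomicLong reserved = new AtomicLong(0);

    public DiskReservation(long capacity) {
        this.capacity = capacity;
    }

    /** Atomically reserve estimated output bytes; false if it would overcommit. */
    public boolean tryReserve(long estimatedBytes) {
        while (true) {
            long cur = reserved.get();
            if (cur + estimatedBytes > capacity)
                return false;                    // caller should compact fewer sstables
            if (reserved.compareAndSet(cur, cur + estimatedBytes))
                return true;                     // CAS ensures no double-counting
        }
    }

    /** Release the reservation once the compaction finishes or aborts. */
    public void release(long estimatedBytes) {
        reserved.addAndGet(-estimatedBytes);
    }

    public long available() {
        return capacity - reserved.get();
    }
}
```

With a pool like this, two threads each asking for 60% of the disk would see the first request succeed and the second fail, instead of both passing an independent free-space check.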
Regards,
Terje

On Wed, May 4, 2011 at 3:33 PM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:

> Well, I just did not look at these logs very well at all last night.
>
> First out-of-disk message:
>
> ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,027 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[CompactionExecutor:387,1,main]
> java.io.IOException: No space left on device
>
> Then finally the last one:
>
> ERROR [FlushWriter:128] 2011-05-02 01:51:06,112 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[FlushWriter:128,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 554962 bytes
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.RuntimeException: Insufficient disk space to flush 554962 bytes
>         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:597)
>         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2100)
>         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239)
>         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:50)
>         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:263)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
> INFO [CompactionExecutor:451] 2011-05-02 01:51:06,113 StorageService.java (line 2066) requesting GC to free disk space
> [lots of sstables deleted]
>
> After this it starts running again (although not fine, it seems).
>
> So the disk seems to have been full for 35 minutes due to un-deleted sstables.
>
> Terje
>
> On Wed, May 4, 2011 at 6:34 AM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:
>
>> Hm...
>> peculiar.
>>
>> Post flush is not involved in compactions, right?
>>
>> May 2nd:
>> 01:06 - out of disk
>> 01:51 - starts a mix of major and minor compactions on different column families
>> It then starts a few extra minor compactions over the day, but given that there are more than 1000 sstables and we are talking about 3 minor compactions started, that is not normal, I think.
>> May 3rd: 1 minor compaction started.
>>
>> When I checked today, there was a bunch of tmp files on the disk with last-modified times from 01:something on May 2nd, and 200GB of empty disk...
>>
>> Definitely no compaction going on.
>> Guess I will add some debug logging and see if I get lucky and run out of disk again.
>>
>> Terje
>>
>> On Wed, May 4, 2011 at 5:06 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>>> Compaction does, but flush didn't until
>>> https://issues.apache.org/jira/browse/CASSANDRA-2404
>>>
>>> On Tue, May 3, 2011 at 2:26 PM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:
>>> > Yes, I realize that.
>>> > I am a bit curious why it ran out of disk, or rather, why I have 200GB of empty disk now, but unfortunately it seems we may not have had monitoring enabled on this node to tell me what happened in terms of disk usage.
>>> > I also thought that compaction was supposed to resume (try again with less data) if it fails?
>>> > Terje
>>> >
>>> > On Wed, May 4, 2011 at 3:50 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> >>
>>> >> The post flusher is responsible for updating the commitlog header after a flush; each task waits for a specific flush to complete, then does its thing.
>>> >>
>>> >> So when you had a flush catastrophically fail, its corresponding post-flush task will be stuck.
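[Editorial aside: the post-flusher behaviour described above — one thread, each task waiting for its flush — can be modelled with a short sketch. The class and method names here are hypothetical illustrations, not Cassandra's actual code: if the head task waits on a flush that died without signalling, every later task counts as pending.]

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the MemtablePostFlusher backlog: a single-threaded
// executor whose head task blocks forever on a flush that failed without
// completing, so later tasks pile up in the queue as "pending".
public class PostFlushBacklog {

    static int simulateBacklog(int laterFlushes) throws InterruptedException {
        // Single-threaded stage, like the post flusher.
        ThreadPoolExecutor postFlusher = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch failedFlush = new CountDownLatch(1); // never counted down

        // Head task: waits on a flush that hit "Insufficient disk space"
        // and never signalled completion, so await() blocks forever.
        postFlusher.submit(() -> {
            failedFlush.await();
            return null;
        });

        // Later flushes may succeed, but their post-flush tasks queue behind it.
        for (int i = 0; i < laterFlushes; i++)
            postFlusher.submit(() -> null);

        Thread.sleep(200);   // let the worker pick up (and block on) the head task
        int pending = postFlusher.getQueue().size();
        postFlusher.shutdownNow();
        return pending;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("MemtablePostFlusher pending: " + simulateBacklog(190));
    }
}
```

With 190 later flushes queued behind one stuck task, the queue size matches the 1 active / 190 pending shape reported by tpstats.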
>>> >>
>>> >> On Tue, May 3, 2011 at 1:20 PM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:
>>> >> > Just some very tiny amount of writes in the background here (some hints spooled up on another node slowly coming in).
>>> >> > No new data.
>>> >> >
>>> >> > I thought there were no exceptions, but I did not look far enough back in the log at first.
>>> >> > Going back a bit further now, however, I see that about 50 hours ago:
>>> >> >
>>> >> > ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,027 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[CompactionExecutor:387,1,main]
>>> >> > java.io.IOException: No space left on device
>>> >> >         at java.io.RandomAccessFile.writeBytes(Native Method)
>>> >> >         at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:160)
>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:225)
>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:356)
>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:335)
>>> >> >         at org.apache.cassandra.io.PrecompactedRow.write(PrecompactedRow.java:102)
>>> >> >         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:130)
>>> >> >         at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:566)
>>> >> >         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:146)
>>> >> >         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:112)
>>> >> >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >> >         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >> >         at java.lang.Thread.run(Thread.java:662)
>>> >> > [followed by a few more of those...]
>>> >> >
>>> >> > and then a bunch of these:
>>> >> >
>>> >> > ERROR [FlushWriter:123] 2011-05-02 01:21:12,690 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[FlushWriter:123,5,main]
>>> >> > java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 40009184 bytes
>>> >> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >> >         at java.lang.Thread.run(Thread.java:662)
>>> >> > Caused by: java.lang.RuntimeException: Insufficient disk space to flush 40009184 bytes
>>> >> >         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:597)
>>> >> >         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2100)
>>> >> >         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239)
>>> >> >         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:50)
>>> >> >         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:263)
>>> >> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>> >> >         ... 3 more
>>> >> >
>>> >> > Seems like compactions stopped after this (there is still a bunch of tmp tables from when those errors were generated), and I can only suspect the post flusher may have stopped at the same time.
>>> >> > There is 890GB of disk for data; sstables are currently using 604GB (139GB of which is old tmp tables from when it ran out of disk), and "ring" tells me the load on the node is 313GB.
>>> >> > Terje
>>> >> >
>>> >> > On Wed, May 4, 2011 at 3:02 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> >> >>
>>> >> >> ... and are there any exceptions in the log?
>>> >> >>
>>> >> >> On Tue, May 3, 2011 at 1:01 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> >> >> > Does it resolve down to 0 eventually if you stop doing writes?
>>> >> >> >
>>> >> >> > On Tue, May 3, 2011 at 12:56 PM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:
>>> >> >> >> Cassandra 0.8 beta trunk from about 1 week ago:
>>> >> >> >>
>>> >> >> >> Pool Name                    Active   Pending      Completed
>>> >> >> >> ReadStage                         0         0              5
>>> >> >> >> RequestResponseStage              0         0          87129
>>> >> >> >> MutationStage                     0         0         187298
>>> >> >> >> ReadRepairStage                   0         0              0
>>> >> >> >> ReplicateOnWriteStage             0         0              0
>>> >> >> >> GossipStage                       0         0        1353524
>>> >> >> >> AntiEntropyStage                  0         0              0
>>> >> >> >> MigrationStage                    0         0             10
>>> >> >> >> MemtablePostFlusher               1       190            108
>>> >> >> >> StreamStage                       0         0              0
>>> >> >> >> FlushWriter                       0         0            302
>>> >> >> >> FILEUTILS-DELETE-POOL             0         0             26
>>> >> >> >> MiscStage                         0         0              0
>>> >> >> >> FlushSorter                       0         0              0
>>> >> >> >> InternalResponseStage             0         0              0
>>> >> >> >> HintedHandoff                     1         4              7
>>> >> >> >>
>>> >> >> >> Anyone with nice theories about the pending value on the memtable post flusher?
>>> >> >> >> Regards,
>>> >> >> >> Terje
>>> >> >> >
>>> >> >> > --
>>> >> >> > Jonathan Ellis
>>> >> >> > Project Chair, Apache Cassandra
>>> >> >> > co-founder of DataStax, the source for professional Cassandra support
>>> >> >> > http://www.datastax.com