Partly, I guess this may be a side effect of multithreaded compaction?

Before running out of space completely, I do see a few of these:
 WARN [CompactionExecutor:448] 2011-05-02 01:08:10,480
CompactionManager.java (line 516) insufficient space to compact all
requested files
SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12858-Data.db'),
SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12851-Data.db'),
SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12864-Data.db')
 INFO [CompactionExecutor:448] 2011-05-02 01:08:10,481 StorageService.java
(line 2066) requesting GC to free disk space

In this case, there would have been 24 threads asking whether there was
enough free disk space.

Most of those checks probably succeeded, but in theory they could have
requested 24x the available space, since I do not think there is any global
accounting of how much disk space will be needed by the compactions that
have already started?
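
What I am imagining is some kind of shared reservation of estimated
compaction output, so a thread only starts once the space is not already
promised to someone else. A rough sketch of the idea (nothing like this
exists in Cassandra as far as I know; the class and method names are made
up):

// Sketch only: compactions claim their estimated output size from a shared
// pool before starting, so 24 threads cannot all be promised the same bytes.
import java.io.File;
import java.util.concurrent.atomic.AtomicLong;

public class DiskReservation
{
    private static final AtomicLong reserved = new AtomicLong(0);

    // Try to claim estimatedBytes on dataDir; returns false if that space is
    // already promised to compactions/flushes that have started.
    public static boolean tryReserve(File dataDir, long estimatedBytes)
    {
        while (true)
        {
            long current = reserved.get();
            if (dataDir.getUsableSpace() - current < estimatedBytes)
                return false;
            if (reserved.compareAndSet(current, current + estimatedBytes))
                return true;
        }
    }

    // Called when the compaction finishes or aborts. In practice the
    // reservation should also shrink as output actually hits the disk,
    // since getUsableSpace() already reflects the written bytes.
    public static void release(long estimatedBytes)
    {
        reserved.addAndGet(-estimatedBytes);
    }
}

Each compaction would then call tryReserve() with its estimated output size
before starting, and release() in a finally block when it is done.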

Of course, regardless of how much checking is done up front, we could still
run out of disk, so I guess there is also a need to check whether disk space
is about to run out while a compaction is running, so it can be
halted/aborted. Unfortunately, that needs global coordination so that we do
not end up stopping all compaction threads at once....
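
Something along these lines is the kind of coordination I am thinking of.
Again just a sketch: none of these classes exist in Cassandra, and the
free-space floor is arbitrary. Each compaction would poll it every so many
MB written, and only the one with the largest remaining output gets told to
abort when space gets tight, instead of every thread bailing out at once:

// Sketch only: a shared watcher that picks a single victim compaction to
// abort when free disk space drops below a threshold.
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CompactionSpaceWatcher
{
    // estimated bytes still to be written, per compaction thread
    private static final Map<Long, Long> remainingByThread =
        new ConcurrentHashMap<Long, Long>();
    private static final long MIN_FREE_BYTES = 10L * 1024 * 1024 * 1024; // arbitrary 10GB floor

    // called periodically from the compaction write loop
    public static void updateRemaining(long remainingBytes)
    {
        remainingByThread.put(Thread.currentThread().getId(), remainingBytes);
    }

    // called when a compaction finishes or aborts
    public static void finished()
    {
        remainingByThread.remove(Thread.currentThread().getId());
    }

    // true if *this* compaction should abort; only the one with the most
    // work left is picked as the victim, so we do not stop everything.
    public static boolean shouldAbort(File dataDir)
    {
        if (dataDir.getUsableSpace() > MIN_FREE_BYTES)
            return false;
        long victim = -1;
        long max = -1;
        for (Map.Entry<Long, Long> e : remainingByThread.entrySet())
        {
            if (e.getValue() > max)
            {
                max = e.getValue();
                victim = e.getKey();
            }
        }
        return victim == Thread.currentThread().getId();
    }
}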

After reducing to 6 compaction threads in 0.8 beta2, the data has compacted
just fine without any disk space issues. So I guess another problem you may
hit, once you have a lot of sstables containing updates (that is, duplicates)
of the same data, is that massively concurrent compaction with nproc threads
can also duplicate all of those duplicates concurrently, on a large scale.

Yes, this argues in favour of multithreaded compaction, as it should normally
help keep the number of sstables at a sane level and avoid such problems, but
it is unfortunately just a kludge for the real problem, which is to segment
the sstables somehow, per keyspace, so we can bring down the disk
requirements and recover from scenarios where disk usage gets above 50%.
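
To put rough numbers on it, using the node below as an example: with about
890GB of disk and a single column family occupying 600GB+, a full compaction
of that CF could in the worst case need close to another 600GB while the old
and new sstables coexist, so once usage is much past the 50% mark there is
no longer any way to compact yourself out of trouble.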

Regards,
Terje



On Wed, May 4, 2011 at 3:33 PM, Terje Marthinussen
<tmarthinus...@gmail.com> wrote:

> Well, I just did not look at these logs very well at all last night.
> First out-of-disk message:
> ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,027
> AbstractCassandraDaemon.java (line 112) Fatal exception in thread
> Thread[CompactionExecutor:387,1,main]
> java.io.IOException: No space left on device
>
> Then, finally, the last one:
> ERROR [FlushWriter:128] 2011-05-02 01:51:06,112
> AbstractCassandraDaemon.java (line 112) Fatal exception in thread
> Thread[FlushWriter:128,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk
> space to flush 554962 bytes
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.RuntimeException: Insufficient disk space to flush
> 554962 bytes
>         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:597)
>         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2100)
>         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239)
>         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:50)
>         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:263)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
>  INFO [CompactionExecutor:451] 2011-05-02 01:51:06,113 StorageService.java
> (line 2066) requesting GC to free disk space
> [lots of sstables deleted]
>
> After this it starts running again (although not entirely fine, it seems).
>
> So the disk seems to have been full for 35 minutes due to undeleted
> sstables.
>
> Terje
>
>> On Wed, May 4, 2011 at 6:34 AM, Terje Marthinussen
>> <tmarthinus...@gmail.com> wrote:
>
>> Hm... peculiar.
>>
>> Post flush is not involved in compactions, right?
>>
>> May 2nd
>> 01:06 - Out of disk
>> 01:51 - Starts a mix of major and minor compactions on different column
>> families
>> It then starts a few extra minor compactions over the day, but given that
>> there are more than 1000 sstables and we are talking about 3 minor
>> compactions started, that is not normal, I think.
>> May 3rd: 1 minor compaction started.
>>
>> When I checked today, there were a bunch of tmp files on the disk with
>> last-modified times from 01:something on May 2nd, and 200GB of empty disk...
>>
>> Definitely no compaction going on.
>> Guess I will add some debug logging and see if I get lucky and run out of
>> disk again.
>>
>> Terje
>>
>> On Wed, May 4, 2011 at 5:06 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>>> Compaction does, but flush didn't until
>>> https://issues.apache.org/jira/browse/CASSANDRA-2404
>>>
>>> On Tue, May 3, 2011 at 2:26 PM, Terje Marthinussen
>>> <tmarthinus...@gmail.com> wrote:
>>> > Yes, I realize that.
>>> > I am a bit curious why it ran out of disk, or rather, why I have 200GB
>>> > empty disk now, but unfortunately it seems like we may not have had
>>> > monitoring enabled on this node to tell me what happened in terms of
>>> > disk usage.
>>> > I also thought that compaction was supposed to resume (try again with
>>> > less data) if it fails?
>>> > Terje
>>> >
>>> > On Wed, May 4, 2011 at 3:50 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> >>
>>> >> post flusher is responsible for updating commitlog header after a
>>> >> flush; each task waits for a specific flush to complete, then does its
>>> >> thing.
>>> >>
>>> >> so when you had a flush catastrophically fail, its corresponding
>>> >> post-flush task will be stuck.
>>> >>
>>> >> On Tue, May 3, 2011 at 1:20 PM, Terje Marthinussen
>>> >> <tmarthinus...@gmail.com> wrote:
>>> >> > Just some very tiny amount of writes in the background here (some
>>> >> > hints spooled up on another node slowly coming in).
>>> >> > No new data.
>>> >> >
>>> >> > I thought there were no exceptions, but I did not look far enough
>>> >> > back in the log at first.
>>> >> > Going back a bit further now, however, I see that about 50 hours ago:
>>> >> > ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,027
>>> >> > AbstractCassandraDaemon.java (line 112) Fatal exception in thread
>>> >> > Thread[CompactionExecutor:387,1,main]
>>> >> > java.io.IOException: No space left on device
>>> >> >         at java.io.RandomAccessFile.writeBytes(Native Method)
>>> >> >         at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:160)
>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:225)
>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:356)
>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:335)
>>> >> >         at org.apache.cassandra.io.PrecompactedRow.write(PrecompactedRow.java:102)
>>> >> >         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:130)
>>> >> >         at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:566)
>>> >> >         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:146)
>>> >> >         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:112)
>>> >> >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >> >         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >> >         at java.lang.Thread.run(Thread.java:662)
>>> >> > [followed by a few more of those...]
>>> >> > and then a bunch of these:
>>> >> > ERROR [FlushWriter:123] 2011-05-02 01:21:12,690
>>> >> > AbstractCassandraDaemon.java (line 112) Fatal exception in thread
>>> >> > Thread[FlushWriter:123,5,main]
>>> >> > java.lang.RuntimeException: java.lang.RuntimeException: Insufficient
>>> >> > disk space to flush 40009184 bytes
>>> >> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >> >         at java.lang.Thread.run(Thread.java:662)
>>> >> > Caused by: java.lang.RuntimeException: Insufficient disk space to flush 40009184 bytes
>>> >> >         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:597)
>>> >> >         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2100)
>>> >> >         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239)
>>> >> >         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:50)
>>> >> >         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:263)
>>> >> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>> >> >         ... 3 more
>>> >> > Seems like compactions stopped after this (a bunch of tmp tables are
>>> >> > still there from when those errors were generated), and I can only
>>> >> > suspect the post flusher may have stopped at the same time.
>>> >> > There is 890GB of disk for data, sstables are currently using 604G
>>> >> > (139GB is old tmp tables from when it ran out of disk), and "ring"
>>> >> > tells me the load on the node is 313GB.
>>> >> > Terje
>>> >> >
>>> >> >
>>> >> > On Wed, May 4, 2011 at 3:02 AM, Jonathan Ellis <jbel...@gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> ... and are there any exceptions in the log?
>>> >> >>
>>> >> >> On Tue, May 3, 2011 at 1:01 PM, Jonathan Ellis <jbel...@gmail.com>
>>> >> >> wrote:
>>> >> >> > Does it resolve down to 0 eventually if you stop doing writes?
>>> >> >> >
>>> >> >> > On Tue, May 3, 2011 at 12:56 PM, Terje Marthinussen
>>> >> >> > <tmarthinus...@gmail.com> wrote:
>>> >> >> >> Cassandra 0.8 beta trunk from about 1 week ago:
>>> >> >> >> Pool Name                    Active   Pending      Completed
>>> >> >> >> ReadStage                         0         0              5
>>> >> >> >> RequestResponseStage              0         0          87129
>>> >> >> >> MutationStage                     0         0         187298
>>> >> >> >> ReadRepairStage                   0         0              0
>>> >> >> >> ReplicateOnWriteStage             0         0              0
>>> >> >> >> GossipStage                       0         0        1353524
>>> >> >> >> AntiEntropyStage                  0         0              0
>>> >> >> >> MigrationStage                    0         0             10
>>> >> >> >> MemtablePostFlusher               1       190            108
>>> >> >> >> StreamStage                       0         0              0
>>> >> >> >> FlushWriter                       0         0            302
>>> >> >> >> FILEUTILS-DELETE-POOL             0         0             26
>>> >> >> >> MiscStage                         0         0              0
>>> >> >> >> FlushSorter                       0         0              0
>>> >> >> >> InternalResponseStage             0         0              0
>>> >> >> >> HintedHandoff                     1         4              7
>>> >> >> >>
>>> >> >> >> Anyone with nice theories about the pending value on the memtable
>>> >> >> >> post flusher?
>>> >> >> >> Regards,
>>> >> >> >> Terje
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > Jonathan Ellis
>>> >> >> > Project Chair, Apache Cassandra
>>> >> >> > co-founder of DataStax, the source for professional Cassandra support
>>> >> >> > http://www.datastax.com
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Jonathan Ellis
>>> >> >> Project Chair, Apache Cassandra
>>> >> >> co-founder of DataStax, the source for professional Cassandra support
>>> >> >> http://www.datastax.com
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Jonathan Ellis
>>> >> Project Chair, Apache Cassandra
>>> >> co-founder of DataStax, the source for professional Cassandra support
>>> >> http://www.datastax.com
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>>
>>
>
