Akashnil created HBASE-6361:
-------------------------------

             Summary: Change the compaction queue to a round robin scheduler
                 Key: HBASE-6361
                 URL: https://issues.apache.org/jira/browse/HBASE-6361
             Project: HBase
          Issue Type: Improvement
            Reporter: Akashnil


Currently the compaction requests are submitted to the minor/major compaction 
queue of a region-server from every column-family/region belonging to it. The 
queue processes those requests in FIFO order (First in First out). We want to 
make a lazy scheduler in place of the current one. The idea of lazy scheduling 
is that, it is always better to make a decision (compaction selection) later if 
the decision is relevant later only. Currently, if the queue grows large, 
currently generated requests are not processed until all the preceding requests 
are executed. Rather than that, we can postpone the compaction selection until 
the queue is empty when we will have more information (new flush files will 
have affected the state) to make a better decision.

Removing the queue, we propose to implement a round-robin scheduler. All the 
column families in their regions will be visited in sequence periodically. In 
each visit, if the column family generates a valid compaction request, the 
request is executed before moving to the next one. We do not plan to change the 
current compaction algorithm for now. We expect that it will automatically make 
a better decision when doing just-in-time selection due to the new change. How 
do we know that? Let us consider an example.

Note that the presently existing compaction queue is only relevant as a buffer, 
when the flushes out-pace the compactions for a period of time, or a relatively 
large compaction consumes time to complete, the queue accumulates requests. 
Suppose such a scenario has occurred. Suppose min-files for compaction = 4. For 
an active column-family, new compaction requests, each of size 4 will be added 
to the queue continuously until the queue starts processing them.

Now consider a round-robin scheduler. The effect of a bottle-neck due to the IO 
rate of compaction results in a longer latency to visit the same column family 
again. By this time suppose there are 16 new flush files in this column family. 
The compaction selection algorithm will select a compaction request of size 16, 
as opposed to 4 compaction requests of size 4 that would have been generated in 
the previous case.

A compaction request with 16 flush files is more IOPs-efficient than the same 
set of files being compacted 4 at a time. This is because both consume the same 
total amount of reads, total writes, and IOPs/sec while producing a file of 
size 16 compared to 4 files of size 4. So we obtained a free compaction from 
those 4*4->16 without paying for it. In case of the queue, those smaller files 
would have consumed more IOPs to become bigger later.

In case of uniform steady-state load this change should not make a difference, 
because the compaction queue would have been empty anyway. However in case of 
bursty load, it automatically adapts itself to consume less IOPs in times of 
high flush rate. This negative feedback should mainly improve 
faliure-resistence of the system. In case something goes wrong, monitoring 
should still give feedback, not in the form of queue size, but the number of 
files in each compaction, which will go up when the bottle-neck occurs. If 
there is no important down-sides, this should be a very good change since this 
should apply to all use-cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to