Duo Zhang created HBASE-14790:
---------------------------------

             Summary: Implement a new DFSOutputStream for logging WAL only
                 Key: HBASE-14790
                 URL: https://issues.apache.org/jira/browse/HBASE-14790
             Project: HBase
          Issue Type: Improvement
            Reporter: Duo Zhang


The original {{DFSOutputStream}} is very powerful and aims to serve all 
purposes. But in fact, we do not need most of its features if we only want to 
write the WAL. For example, we do not need pipeline recovery, since we can just 
close the old logger and open a new one. We also do not need to write 
multiple blocks, since we can open a new logger if the old file grows too 
large.
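
To make the idea concrete, here is a minimal sketch of the "close the old logger 
and open a new one" policy. It is only an illustration under stated assumptions: 
{{LogFile}}, {{LogFileFactory}}, {{RollingWalWriter}} and {{InMemoryLogFile}} are 
hypothetical names, not existing HBase or HDFS classes, and the in-memory file 
only stands in for a single-block output stream.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-block log file; stands in for the proposed stream. */
interface LogFile {
    void append(byte[] entry) throws IOException;
    long length();
    void close();
}

/** Hypothetical factory that opens a fresh log file. */
interface LogFileFactory {
    LogFile create() throws IOException;
}

/** In-memory stand-in used only to exercise the rolling logic. */
final class InMemoryLogFile implements LogFile {
    private final List<byte[]> entries = new ArrayList<>();
    private long length = 0;
    boolean failNextAppend = false; // set to simulate a write failure
    boolean closed = false;

    @Override
    public void append(byte[] entry) throws IOException {
        if (failNextAppend) {
            throw new IOException("simulated write failure");
        }
        entries.add(entry);
        length += entry.length;
    }

    @Override
    public long length() { return length; }

    @Override
    public void close() { closed = true; }
}

/** Rolls to a new file instead of recovering the pipeline or adding blocks. */
final class RollingWalWriter {
    private final LogFileFactory factory;
    private final long maxFileSize;
    private LogFile current;
    private int rolls = 0;

    RollingWalWriter(LogFileFactory factory, long maxFileSize) throws IOException {
        this.factory = factory;
        this.maxFileSize = maxFileSize;
        this.current = factory.create();
    }

    void append(byte[] entry) throws IOException {
        // Size limit: open a new file rather than starting another block.
        if (current.length() + entry.length > maxFileSize) {
            roll();
        }
        try {
            current.append(entry);
        } catch (IOException e) {
            // No pipeline recovery: abandon the file and retry once on a new one.
            // Simplification: a real WAL would also have to replay entries that
            // were written to the old file but never acknowledged.
            roll();
            current.append(entry);
        }
    }

    private void roll() throws IOException {
        current.close();
        current = factory.create();
        rolls++;
    }

    int rollCount() { return rolls; }
}
```

The writer never tries to repair a file in place; both the failure case and the 
size limit go through the same {{roll()}} path, which is the simplification 
argued for above.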

Most importantly, the complicated logic of the original {{DFSOutputStream}} 
makes it hard to handle all the corner cases needed to avoid data loss or data 
inconsistency (such as HBASE-14004). The same complexity also forces us to use 
some magical tricks to improve performance. For example, we need to use 
multiple threads to call {{hflush}} when logging, and we currently use 5 
threads. But why 5 and not 10 or 100?

So I propose that we implement our own {{DFSOutputStream}} for writing the 
WAL, both for correctness and for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
