@Ted, Sure, I've opened HBASE-14457 <https://issues.apache.org/jira/browse/HBASE-14457> as an umbrella for all work done on multiwal; please allow me to share the details there, in the form of documents rather than emails. :-)
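For anyone who'd like to try multiwal while the JIRA work is ongoing, here is a minimal hbase-site.xml sketch; the property names are the ones documented in the HBase reference guide for the 1.x line, and the group count is only an illustrative value:

```xml
<!-- Switch the WAL provider to multiwal (multiple WALs per region server). -->
<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>
<!-- Group regions into a bounded number of WAL groups. -->
<property>
  <name>hbase.wal.regiongrouping.strategy</name>
  <value>bounded</value>
</property>
<!-- Number of WAL groups per region server (illustrative value, default is 2). -->
<property>
  <name>hbase.wal.regiongrouping.numgroups</name>
  <value>4</value>
</property>
```

Restart the region servers after changing these, as WAL provider settings are read at startup.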
@Jingcheng and @Vlad, any comments/sharing from you will be warmly welcomed in the JIRA. :-)

Best Regards,
Yu

On 21 September 2015 at 19:35, Ted Yu <[email protected]> wrote:
> Thanks for sharing, Yu.
>
> Images didn't go through. Can you use a third party site for sharing?
>
> Cheers
>
> On Sep 20, 2015, at 11:09 PM, Yu Li <[email protected]> wrote:
> >
> > Hi Vlad,
> >
> > >> the existing write performance is more than adequate (avg load per RS usually less than 1MB/sec)
> > We have some different user scenarios and I'd like to share them with you. We are using hbase to store data for building a search index, and features like pv/uv of each online item will be recorded, so the write load can reach as high as 10MB/s per RS (below is a screenshot of the ganglia metrics data). OTOH, as a database I think the online write performance of HBase is as important as read; bulkload is for offline use and cannot resolve all problems.
> >
> > Another advantage of using multiple wals is that we could do user/business level isolation on the wal. For example you could use one namespace per business and one wal group per namespace, and then replicate only the data for the business in need.
> >
> > Regarding compaction IO, as I mentioned before, we could use tiered storage to prevent compaction from affecting wal sync. This way we've observed an obvious improvement in the avg mutate RT, from 0.5ms to 0.3ms on our online cluster, FYI.
> >
> > Best Regards,
> > Yu
> >
> > On 19 September 2015 at 00:55, Vladimir Rodionov <[email protected]> wrote:
> >> Hi, Jingcheng
> >>
> >> You postpone compaction until your test completes by setting the number of blocking stores to 120. That is kind of cheating :)
> >> As I said previously, in the long run, compaction rules the world - not the number of wal files. In a real production setting, the existing write performance is more than adequate (avg load per RS usually less than 1MB/sec). Multiwal probably has its value if someone needs to load a large volume of data quickly, but ... why not use bulk load instead?
> >>
> >> Thanks for letting us know that beefy servers with 8 SSDs can sustain such a huge load.
> >>
> >> -Vlad
> >>
> >> On Thu, Sep 17, 2015 at 10:30 PM, Jingcheng Du <[email protected]> wrote:
> >> > More information for the test.
> >> > I use ycsb 0.3.0 for the test.
> >> > The command line is "./ycsb load hbase-10 -P ../workloads/workload -threads 200 -p columnfamily=family -p clientbuffering=true -s > workload.dat"
> >> > The workload is as follows (the data size is slightly less than 1TB):
> >> > fieldcount=5
> >> > fieldlength=200
> >> > recordcount=1000000000
> >> > maxexecutiontime=86400
> >> >
> >> > --
> >> > View this message in context:
> >> > http://apache-hbase.679495.n3.nabble.com/Multiwal-performance-with-HBase-1-x-tp4074403p4074731.html
> >> > Sent from the HBase User mailing list archive at Nabble.com.
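As a quick sanity check on the workload numbers quoted above: the raw value bytes come to 5 fields × 200 bytes × 1e9 records = 1e12 bytes, about 0.91 TiB, which matches the "slightly less than 1TB" figure (ignoring row keys and HBase storage overhead). A one-off sketch:

```python
# Rough size of the YCSB workload quoted above: value bytes only,
# ignoring row keys and HBase storage overhead.
fieldcount = 5
fieldlength = 200            # bytes per field
recordcount = 1_000_000_000

total_bytes = fieldcount * fieldlength * recordcount
print(f"{total_bytes:,} bytes = {total_bytes / 2**40:.2f} TiB")
# → 1,000,000,000,000 bytes = 0.91 TiB
```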
