Jon, we had a fair few long pauses. Our test tool measured request latency, and a lot of requests took much longer than they should have. Unfortunately we didn't hold onto our logs from the PerformanceEvaluation runs.
Also, I would note that PerformanceEvaluation internally disables autoFlush, so it does not run into the issues I have described. I would recommend running some code that has autoFlush set to true to test this problem. We've moved our environment back to 0.20.2 as we start testing before using it in production, so unfortunately we can't run any more tests on it, sorry :/

On Tue, Apr 3, 2012 at 9:21 AM, Jonathan Hsieh <[email protected]> wrote:
> Juhani,
>
> Have you looked at any of the logs from your perf runs? Can you try
> running HBase's performance evaluation with debug comments on? I'd like
> to know if what I'm seeing is the same as you.
>
> I've started running some of these and have encountered what seems to be
> networking code issues (SocketTimeoutExceptions, a bunch of delayedAcks in
> ganglia, and 4x-5x degradation in writes from 0.90 runs to 0.92 runs).
>
> == cmd lines:
> hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
> hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 1
>
> == in log4j.properties
> log4j.logger.org.apache.hadoop.hbase=DEBUG
>
> Jon.
>
> On Thu, Mar 29, 2012 at 12:05 AM, Juhani Connolly <[email protected]> wrote:
>> On Thu, Mar 29, 2012 at 1:10 PM, Stack <[email protected]> wrote:
>> > On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly
>> > <[email protected]> wrote:
>> >> Since we haven't heard anything on expected throughput we're
>> >> downgrading our hdfs back to 0.20.2. I'd be curious to hear how
>> >> other people do with 0.23 and the throughput they're getting.
>> >
>> > We don't have much experience running on 0.23, I think it's fair to
>> > say. It works, but not much more than that can be said. The sync code
>> > path is different in 0.23 than in 0.20.2 and has had less scrutiny
>> > (when you say 0.20.2, you mean CDH? Which CDH?). I think it's good to
>> > go back.
>>
>> Thanks for the info on 0.23.
>> I suspect that the change in sync you mentioned may well have
>> something to do with this, since decreasing the frequency of appends
>> through the use of a moderately sized writeBuffer at the client end
>> pays huge dividends (as, of course, does removing the appends
>> altogether by disabling WAL writes). High counts of ungrouped writes
>> (whether grouping is done by group puts, delayed client flushing or
>> delayed WAL flushing) seem to suffer pretty badly under 0.23. We'll be
>> moving back to 0.20.2 as it seems to be much better tested and
>> stressed, likely to the CDH distro (3u3).
>>
>> > Regards numbers, it's hard to compare workloads but if it helps,
>> > looking at our frontend now, it's relatively idle, doing between
>> > 100-500k hits on 30 machines that are smaller than yours, with less
>> > memory, 10k regions, and a workload that is mostly increments
>> > (read-mostly-from-block-cache/modify/write).
>>
>> Thanks... It's nice to have a frame of reference to compare against.
>>
>> > Yes, the errors are relatively few but poke around more if you can.
>> > Why are there errors at all?
>> > St.Ack
>>
>> I'm not sure. As has been said, it's likely unrelated; I'm going to
>> try to figure it out.
>>
>> Thanks,
>> Juhani
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // [email protected]
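To illustrate the writeBuffer point discussed in the thread above: the sketch below is NOT the HBase client API (the class name, fields, and all numbers are made up for illustration) but shows conceptually how turning autoFlush off and buffering puts client-side groups many small writes into far fewer round trips, which is the effect that "pays huge dividends" in the tests described.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: demonstrates how client-side write buffering
// (autoFlush off + a moderate writeBuffer) reduces how many batched
// "RPCs"/appends the server sees, compared to one per put.
public class BufferedPutter {
    private final List<byte[]> buffer = new ArrayList<>();
    private final long writeBufferSize; // flush threshold in bytes
    private final boolean autoFlush;    // true = flush after every put
    private long bufferedBytes = 0;
    int flushCount = 0;                 // number of "RPCs" made

    BufferedPutter(boolean autoFlush, long writeBufferSize) {
        this.autoFlush = autoFlush;
        this.writeBufferSize = writeBufferSize;
    }

    void put(byte[] row) {
        buffer.add(row);
        bufferedBytes += row.length;
        if (autoFlush || bufferedBytes >= writeBufferSize) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) return;
        // In a real client, the buffered puts would be sent as one
        // batched request here.
        flushCount++;
        buffer.clear();
        bufferedBytes = 0;
    }

    public static void main(String[] args) {
        BufferedPutter eager = new BufferedPutter(true, 8192);
        for (int i = 0; i < 1000; i++) eager.put(new byte[100]);
        eager.flush();
        System.out.println("autoFlush on:  " + eager.flushCount);  // 1000

        BufferedPutter grouped = new BufferedPutter(false, 8192);
        for (int i = 0; i < 1000; i++) grouped.put(new byte[100]);
        grouped.flush();
        System.out.println("autoFlush off: " + grouped.flushCount); // 13
    }
}
```

With autoFlush on, 1000 puts cost 1000 flushes; with an 8 KB buffer and 100-byte rows, the same 1000 puts group into only 13 flushes, which is why the ungrouped-write workloads above suffered the most.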
