Juhani, have you looked at any of the logs from your perf runs? Can you try running HBase's PerformanceEvaluation with debug logging on? I'd like to know whether what I'm seeing matches what you're seeing.
I've started running some of these and have encountered what seem to be networking code issues (SocketTimeoutExceptions, a bunch of delayedAcks in ganglia, and a 4x-5x degradation in writes from 0.90 runs to 0.92 runs).

== cmd lines:
hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 1

== in log4j.properties:
log4j.logger.org.apache.hadoop.hbase=DEBUG

Jon.

On Thu, Mar 29, 2012 at 12:05 AM, Juhani Connolly <[email protected]> wrote:
> On Thu, Mar 29, 2012 at 1:10 PM, Stack <[email protected]> wrote:
> > On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly
> > <[email protected]> wrote:
> >> Since we haven't heard anything on expected throughput we're
> >> downgrading our hdfs back to 0.20.2. I'd be curious to hear how
> >> other people do with 0.23 and the throughput they're getting.
> >>
> >
> > We don't have much experience running on 0.23, I think it's fair to
> > say. It works, but not much more than that can be said. The sync code
> > path is different in 0.23 than in 0.20.2 and has had less scrutiny
> > (when you say 0.20.2, do you mean CDH? Which CDH?). I think it's good
> > to go back.
>
> Thanks for the info on 0.23. I suspect that the change in sync you
> mentioned may well have something to do with this, since decreasing
> the frequency of appends through the use of a moderately sized
> writeBuffer at the client end pays huge dividends (as, of course, does
> removing the appends altogether by disabling WAL writes). High counts
> of ungrouped writes (whether the grouping is by batched puts, delayed
> client flushing, or delayed WAL flushing) seem to suffer pretty badly
> under 0.23. We'll be moving back to 0.20.2 as it seems to be much
> better tested and stressed, likely to the CDH distro (3u3).
> > Regards numbers, it's hard to compare workloads, but if it helps:
> > looking at our frontend now, it's relatively idle, doing between
> > 100-500k hits on 30 machines that are smaller than yours, with less
> > memory, 10k regions, and a workload that is mostly increments
> > (read-mostly-from-block-cache/modify/write).
>
> Thanks... It's nice to have a frame of reference to compare against.
>
> > Yes, the errors are relatively few, but poke around more if you can.
> > Why are there errors at all?
> >
> > St.Ack
>
> I'm not sure. As has been said, it's likely unrelated; going to try
> and figure it out.
>
> Thanks,
> Juhani

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [email protected]
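[Editor's note: to illustrate the grouping effect discussed in the thread, here is a minimal toy model. It is NOT the HBase client API and makes no HBase calls; the actual 0.90-era client knobs being discussed are HTable.setAutoFlush(false), HTable.setWriteBufferSize(), and HTable.flushCommits(). The sketch just counts how many grouped "WAL appends" result from flushing puts one at a time versus through a client-side write buffer.]

```java
// Toy model of client-side write buffering: each flush of the buffer is
// assumed to produce one grouped WAL append on the server side.
public class WriteBufferSketch {

    // Count the appends generated by `puts` puts when the client buffers
    // `bufferSize` puts per flush (bufferSize == 1 models autoflush).
    static int appendsFor(int puts, int bufferSize) {
        int appends = 0;
        int buffered = 0;
        for (int i = 0; i < puts; i++) {
            buffered++;
            if (buffered == bufferSize) {
                appends++;      // buffer full: one grouped append
                buffered = 0;
            }
        }
        if (buffered > 0) {
            appends++;          // final flush of any leftover puts
        }
        return appends;
    }

    public static void main(String[] args) {
        // Autoflush: every put is its own append.
        System.out.println("autoflush appends: " + appendsFor(10000, 1));   // 10000
        // A 100-put buffer cuts the append count by 100x.
        System.out.println("buffered appends:  " + appendsFor(10000, 100)); // 100
    }
}
```

Under the thread's hypothesis, it is exactly this 100x difference in append/sync frequency that makes the buffered workload far less sensitive to a slower sync path in 0.23.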
