Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Gaurav Sharma
Folks, my apologies if this has been discussed here before, but can someone please shed some light on how Hypertable is claiming up to 900% higher throughput on random reads and up to 1000% on sequential reads in their performance evaluation vs HBase (modeled after the perf-eval test in section 7

Re: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Ryan Rawson
So if that is the case, I'm not sure how that is a fair test. One system reads from RAM, the other from disk. The results are as expected. Why not test one system with SSDs and the other without? It's really hard to avoid an apples/oranges comparison. Even if you are doing the same workloads on 2

Re: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Ryan Rawson
Purtell has more, but he told me it no longer crashes, but there are minor pauses between 50-250 ms. From 1.6_23. Still not usable in a latency-sensitive prod setting. Maybe in other settings? -ryan On Wed, Dec 15, 2010 at 11:31 AM, Ted Dunning tdunn...@maprtech.com wrote: Does anybody have a recent

Re: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Gaurav Sharma
Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have given them a further advantage, but as you said, not much is known about the test source code. On Wed, Dec 15, 2010 at 2:22 PM, Ryan Rawson ryano...@gmail.com wrote: So if that is the case, I'm not sure how that is a

Re: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Todd Lipcon
On Wed, Dec 15, 2010 at 11:44 AM, Gaurav Sharma gaurav.gs.sha...@gmail.com wrote: Thanks Ryan and Ted. I also think if they were using tcmalloc, it would have given them a further advantage but as you said, not much is known about the test source code. I think Hypertable does use tcmalloc or

RE: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Chad Walters
Why not run multiple JVMs per machine? Chad -Original Message- From: Ryan Rawson [mailto:ryano...@gmail.com] Sent: Wednesday, December 15, 2010 11:52 AM To: dev@hbase.apache.org Subject: Re: Hypertable claiming up to 900% random-read throughput vs HBase The malloc thing was pointing out

Re: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Ryan Rawson
Why do that? You reduce the cache effectiveness and increase the logistical complexity. As a stopgap maybe, but not as a long-term strategy. Sun just needs to fix their GC. Er, Oracle. -ryan On Wed, Dec 15, 2010 at 11:55 AM, Chad Walters chad.walt...@microsoft.com wrote: Why not run multiple JVMs

RE: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Chad Walters
Sure, but if the tradeoff is being unable to use all the memory effectively and suffering 10x unfavorable benchmark comparisons, then running 2 or more JVMs with a region server per JVM seems like a reasonable stopgap until the GC works better. Chad -Original Message- From: Ryan Rawson

RE: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Vladimir Rodionov
Why not use off-heap memory for this purpose? If it's the block cache (all blocks are of equal size), the alloc/free algorithm is pretty simple - you do not have to re-implement malloc in Java. I think something like an open-source version of Terracotta BigMemory is a good candidate for Apache
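
As a rough sketch of the fixed-size-slot idea (this is not code from HBase or BigMemory; the class and its names are invented for illustration), a single direct ByteBuffer can be carved into equal-sized slots, so "allocation" is just popping a slot index off a free list and "freeing" is pushing it back:

import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of an off-heap cache for equal-sized blocks.
 * One direct ByteBuffer is sliced into fixed-size slots, so the
 * allocator degenerates into a free list of slot indexes.
 */
public class OffHeapBlockCache {
    private final int blockSize;
    private final ByteBuffer arena;                        // lives outside the Java heap
    private final Deque<Integer> freeSlots = new ArrayDeque<Integer>();
    private final Map<String, Integer> index = new HashMap<String, Integer>();

    public OffHeapBlockCache(int blockSize, int maxBlocks) {
        this.blockSize = blockSize;
        this.arena = ByteBuffer.allocateDirect(blockSize * maxBlocks);
        for (int i = 0; i < maxBlocks; i++) {
            freeSlots.push(i);
        }
    }

    /** Copies a block into a free slot; returns false if the cache is full. */
    public synchronized boolean put(String key, byte[] block) {
        if (block.length != blockSize || freeSlots.isEmpty()) {
            return false;
        }
        int slot = freeSlots.pop();
        ByteBuffer dup = arena.duplicate();
        dup.position(slot * blockSize);
        dup.put(block);
        index.put(key, slot);
        return true;
    }

    /** Copies the cached block back onto the heap, or returns null on a miss. */
    public synchronized byte[] get(String key) {
        Integer slot = index.get(key);
        if (slot == null) {
            return null;
        }
        byte[] out = new byte[blockSize];
        ByteBuffer dup = arena.duplicate();
        dup.position(slot * blockSize);
        dup.get(out);
        return out;
    }

    /** Returning the slot to the free list is the whole "free()". */
    public synchronized void evict(String key) {
        Integer slot = index.remove(key);
        if (slot != null) {
            freeSlots.push(slot);
        }
    }
}

The cached payload never touches the GC-managed heap; the JVM-wide ceiling for such direct allocations is set with -XX:MaxDirectMemorySize.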

Re: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Ted Dunning
That isn't really the trade-off. The 10x is on an undocumented benchmark with apples-to-oranges tuning. Moreover, HBase has had massive speedups since then. Being able to set the heap size actually lets me control memory use more precisely, and running a single JVM lets me amortize JVM cost. Java

Re: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Andrew Purtell
Does anybody have a recent report about how G1 is coming along? Not in general, but as it pertains to HBase: I tried it recently with 1.6.0u23 and ran a generic heavy write test without crashing any more, so that is something. But I have not tried stressing it with production workloads. Best

Re: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Andrew Purtell
From: Ryan Rawson ryano...@gmail.com Purtell has more, but he told me it no longer crashes, but there are minor pauses between 50-250 ms. From 1.6_23. That's right. On EC2 m1.xlarge, so that's a big caveat... per-test-iteration variance on EC2 in general is ~20%, and EC2 hardware is 2? generations
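
Not how Andrew collected his numbers - just a hedged sketch of one way to watch cumulative GC time while a test like that runs. The JDK's GarbageCollectorMXBeans report per-collector counts and total milliseconds for the JVM they run inside, so in practice you would embed something like this in the region server or poll it remotely over JMX (on 1.6.0u23, G1 is enabled with -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Prints cumulative GC counts and times for the current JVM every 10 seconds. */
public class GcWatch {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // getCollectionTime() is the total milliseconds spent in this collector so far.
                System.out.printf("%-25s count=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(10000);
        }
    }
}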

Build failed in Hudson: hbase-0.90 #30

2010-12-15 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/hbase-0.90/30/changes Changes: [jdcryans] HBASE-3360 ReplicationLogCleaner is enabled by default in 0.90 -- causes NPE -- [...truncated 2734 lines...] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time

Review Request: hbase-3362 If .META. offline between OPENING and OPENED, then wrong server location in .META. is possible

2010-12-15 Thread stack
--- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1298/ --- Review request for hbase and Jonathan Gray. Summary --- M

Re: Hypertable claiming up to 900% random-read throughput vs HBase

2010-12-15 Thread Ed Kohlwey
Along the lines of Terracotta BigMemory, apparently what they are actually doing is just using the DirectByteBuffer class (see this forum post: http://forums.terracotta.org/forums/posts/list/4304.page), which is basically the same as using malloc - it gives you non-GC access to a giant pool of
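
For anyone who has not played with it, the mechanism Ed is describing is plain java.nio: ByteBuffer.allocateDirect returns memory the collector never copies or compacts, capped by -XX:MaxDirectMemorySize, and data is copied in and out explicitly. A tiny, self-contained illustration (the size and strings here are made up):

import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // 256 MB allocated outside the GC-managed heap; only the small
        // ByteBuffer wrapper object itself lives on the heap.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(256 * 1024 * 1024);

        // Payload bytes are copied in and out explicitly; the GC never scans them.
        byte[] row = "row-00001".getBytes();
        offHeap.put(row);

        offHeap.flip();
        byte[] back = new byte[row.length];
        offHeap.get(back);
        System.out.println(new String(back));
    }
}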

Hudson build is back to normal : hbase-0.90 #32

2010-12-15 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/hbase-0.90/32/changes

Build failed in Hudson: hbase-0.90 #31

2010-12-15 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/hbase-0.90/31/changes Changes: [stack] HBASE-3365 EOFE contacting crashed RS causes Master abort [jdcryans] HBASE-3363 ReplicationSink should batch delete doc fixes for replication -- [...truncated 2739