Re: Cassandra vs HBase

2009-12-05 Thread Matt Revelle
Cassandra performance likely still beats HBase, but according to the Powered 
By page on the HBase wiki it is being used to handle realtime requests by 
StumbleUpon, Meetup, and Streamy 
(http://wiki.apache.org/hadoop/Hbase/PoweredBy).

These two documents contain some performance numbers:
http://static.last.fm/johan/nosql-20090611/hbase_nosql.pdf  (skip to page 22)
http://www.slideshare.net/schubertzhang/hbase-0200-performance-evaluation

Both Cassandra and HBase are useful tech; I just wanted to point out that HBase 
performance has improved over the past year and it can handle realtime requests.

On Dec 5, 2009, at 11:08 PM, Tim Estes wrote:

 Can you link/reference those? I haven't seen random read or write performance 
 numbers published around V0.20 Hbase that are within 5x of Cassandra. I'm 
 very curious about this...
 
 
 On Dec 5, 2009, at 11:05 PM, Matt Revelle mreve...@gmail.com wrote:
 
 On Dec 5, 2009, at 21:45, Joe Stump j...@joestump.net wrote:
 
 
 On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
 
 [Is] HBase used for real timish applications and if so any ideas what the 
 largest deployment is.
 
 I don't know of anyone off the top of my head who's using anything built on 
 top of Hadoop for a real-time environment. Hadoop just wasn't built for 
 that. It was built, like MapReduce, for crunching absurd amounts of data 
 across hundreds of nodes in a reasonable amount of time.
 
 Just my $0.02.
 
 --Joe
 
 
 While Hadoop MapReduce isn't meant for realtime use, HBase can handle it.
 
 Over last summer there were some benchmarks included in HBase/Hadoop 
 presentations that showed, IIRC, performance comparable to Cassandra.
 



Re: Cassandra users survey

2009-11-23 Thread Matt Revelle

On Nov 23, 2009, at 12:27, Ted Zlatanov t...@lifelogs.com wrote:

On Fri, 20 Nov 2009 17:38:39 -0800 Dan Di Spaltro dan.dispal...@gmail.com wrote:


DDS At Cloudkick we are using Cassandra to store monitoring statistics and
DDS running analytics over the data.  I would love to share some ideas
DDS about how we set up our data-model, if anyone is interested.  This
DDS isn't the right thread to do it in, but I think it would be useful to
DDS show how we store billions of points of data in Cassandra (and maybe
DDS get some feedback).

I'd like to see that.  My Cassandra use is also for monitoring and so
far it has been great.  I store status updates in a SuperColumn indexed
by date and each row represents a unique resource.  It's really simple
compared to your setup, I'm sure.

Ted


Hi Dan and Ted,

Are you both using timestamps as row keys?  Would be great to hear more
details.


-Matt
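
For reference, the layout Ted describes (one row per resource, a SuperColumn
per date, status updates inside it) can be sketched roughly as below. This is
only an in-memory illustration using plain Python dictionaries, not Cassandra
client code. The names make_store, record_status and statuses_between are
hypothetical, and the use of full timestamps as sub-column names is an
assumption, since the thread does not say how the updates are named inside
each SuperColumn.

```python
from collections import defaultdict
from datetime import datetime


def make_store():
    """Hypothetical stand-in for a super column family:
    row key -> super column name -> sub-column name -> value."""
    return defaultdict(lambda: defaultdict(dict))


def record_status(store, resource_id, when, status):
    """Store one status update.

    Row key: the resource (as Ted describes).
    Super column name: the date (as Ted describes).
    Sub-column name: the full timestamp (an assumption for this sketch).
    """
    day = when.strftime("%Y-%m-%d")
    store[resource_id][day][when.isoformat()] = status


def statuses_between(store, resource_id, start_day, end_day):
    """Return (day, updates) pairs for one resource, ordered by date.

    ISO dates sort lexicographically, so plain string comparison works.
    """
    row = store.get(resource_id, {})
    return [
        (day, dict(row[day]))
        for day in sorted(row)
        if start_day <= day <= end_day
    ]


store = make_store()
record_status(store, "web-01", datetime(2009, 11, 22, 9, 0), "ok")
record_status(store, "web-01", datetime(2009, 11, 23, 12, 27), "degraded")
record_status(store, "db-01", datetime(2009, 11, 23, 12, 30), "ok")
```

In this sketch the row key is the resource rather than a timestamp, so reading
a date range for one resource touches a single row.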