Benjamin,
Please see below - it sounds like you're taking this a little personally
and I'm not sure why. You've made some errors in your reply.
Colin
+1 315 886 3422 cell
+1 701 212 4314 office
http://blog.cloudeventprocessing.com
http://twitter.com/EventCloudPro
On 7/10/2010 5:21 PM, Benjamin Black wrote:
On Sat, Jul 10, 2010 at 12:22 PM, Colin Clark
<co...@cloudeventprocessing.com> wrote:
Although I'm a fan of Cassandra, there's no way I'd use it today for my tier
1 deployments, because I don't have the resources of Facebook, and even
though Cassandra is open source, that doesn't mean I can fix it when it goes
down. And, because it's open source, there's no one to call to have it
fixed reliably and within production constraints. Cassandra's strength is
its greatest weakness right now.
There are others, however, who do have the skills not just to fix it
when it goes down, but to improve the code in a variety of ways and
contribute that code back to the project. That you do not have those
skills is a good indication you should stick to what you know, not an
indictment of Cassandra (or any other non-SQL store).
I didn't say 'didn't have the skills.' I said 'resources.' Those are
two very different things. While I and my team have nothing to prove to
you, working on Cassandra is completely within our realm of ability and
expertise. Not having the resources means that, relative to our current
focus, we, our customers, and our investors get a bigger bang for each
engineering $ spent by having us focus on different problems. Using a
piece of software isn't just an engineering issue, it has to make
business sense as well. So if I really wanted to use Cassandra in a
mission critical way, I'd have to be able to justify the investment
involved in creating an internal Cassandra team. This is why there's so
much 'flap' over what Twitter and Facebook are or are not using
Cassandra for.
The bloom is starting to come off NoSQL, which is normal - it means that
people & firms are trying to do more with it and most probably realizing
that all of the tools, support, infrastructure, etc. surrounding alternative
solutions aren't such a bad thing. And that the world of NoSQL has to start to
come up with a better mantra than "joins are bad, dude", and "you're just
protecting the status quo." There's a *lot more* big data wrapped up inside
of SQL databases and only a fraction of that in NoSQL - and there are a lot of
reasons for it.
You are, for whatever reason, using the dullest of cliches as if they
were informed opinion. Nobody with actual knowledge of the space says
"joins are bad, dude". What they might say is "When you have
petabytes and low latency requirements, joins are an expensive
proposition". That is clearly a true statement and constructing
indices in a column store to avoid joins is a reasonable decision to
avoid that expense. Is it free? Of course not, nothing is.
Again, I'm a fan of NoSQL, and of Cassandra. When I said, 'the world of
NoSQL,' I was including myself in that world. And, I agree that those
cliches are dull, overused, and ill-informed (anyone who's actually done
anything with a lot of data knows how expensive joins are - with or
without petabytes). But again, this is what business sees when they
listen to Twitter, or subscribe to these mailing lists. This is how
opinions are formed in the minds of analysts and they then influence
their customers. We need to do a better job, and yet again, this is why
understanding what Twitter and Facebook are or are not doing with
Cassandra is important.
For example, do I *really* need Cassandra if MySQL will work for me and I
just want to get up and running quickly without writing a bunch of code? My
team was pushing greater than 20k updates per second into, GASP, Oracle 5
years ago. Sure, it was expensive. But it worked. And it was worth it -
or we wouldn't have spent the $$. What's your data worth if you don't have
your data? zero.
Had you spent any time on the irc channel you would've seen this
advice given repeatedly. If you don't need what Cassandra does, don't
use it. That you have seen 20k updates/sec on really expensive
hardware with a SQL store is neither surprising nor relevant. As you
must realize, though choose to ignore, Cassandra is about more than
just high, per-node write throughput. It is about seamless scale-out
of a single cluster, robustness in the face of node failure and
network partition, etc. Can you do that with a SQL store? Certainly.
Expect to pay 5x in hardware and not be able to operate multi-DC.
It's what folks call a trade-off.
So that's a trade-off? Thanks - maybe Facebook and Twitter missed that
before spending hundreds of thousands of $$ on a project only to later
change course. Include opportunity cost in that, and you're easily in
the millions of wasted $ - or do we call that a 'learning exercise?' I'd
love to hear what Twitter & Facebook's boards (there I am with
that whole pesky 'business' thing again) had to say about that. And I'm
assuming that the same thing might just happen to a tech team that chose
to spend valuable cycles on evaluating/implementing Cassandra only to
change course - they'd have to explain that as well. And then they'd
hear something like, "Dudes, you did what? Even Facebook & Twitter
decided not to use Cassandra that way!" This is not as far fetched as
it sounds. Someone on my advisory board asked me a very similar
question about our use of Cassandra and given the recent news, whether
or not that impacted our plans.
And I'm assuming that if you're going to frantically wave arms with "SQL
costs 5x more and you can't do that multi-DC..." that you've got
something to back that up? 'Cuz Facebook is using a SQL store, they're
using it multi-DC, and they're running on commodity hardware, right?
And then there's support - internal support. Picking a database du-jour is
organizationally expensive. Especially when there's probably one or two
databases that Twitter could have bought off the shelf that would have
solved their problems.
You have no idea what their actual problems are and are merely
engaging in the favorite game of HN and similar venues: armchair
engineering.
Sure I do. But from a business perspective. Their architecture doesn't
scale very well right now. They're running with reduced API limits and
you still get the 'fail whale' more than occasionally. People lose
followers. People lose tweets. Privacy has been compromised. Need I
go on? All of this would make me, as a potential customer of Twitter,
ask a question: "So, what's up with the scalability thing? What happens
if I miss a critical time window with my sponsored Tweets? Do I get
that $ back? I didn't get 'imprints' but the opportunity is gone." But
you're right, from an engineering point of view, I have no idea what
their problems are. I do know that Cassandra was supposed to fix some
of them, and now it's not, and I don't know anything about that from an
engineering point of view either.
Also, I have no idea of what 'HN or similar venues' refers to.
b