Thanks for this update, Zach.
I think it would be really useful for ops teams to jump in here as
well and provide any feedback they can on the options that have
been laid out.
Thanks in advance,
Nicolaas
On 28 Jun 2012, at 20:50, Zach A. Thomas wrote:
Summary
=======
To help improve the performance of both the technology and our team,
we are evaluating the adoption of a new storage subsystem. We have
evaluated a number of solutions and come to some initial
conclusions:
Eliminated from consideration:
* Infinispan
* Neo4J
* SenseiDB
* Voldemort
Still under investigation:
* Cassandra
* MongoDB
* Relational DB / JPA / JDO
We want to emphasize that adopting any new technology will be a
gradual transition, so that we can maintain stability in the
application.
Background
==========
Our Ann Arbor meeting last month was about thinking through
architectural changes that could improve OAE in terms of system
performance and team performance, as well as laying the groundwork
for taking measurements (modeling a production-like data set,
provisioning load testing infrastructure, and the load tests
themselves). For team performance, we agreed that we should strive
to rely more heavily on established low-level infrastructure for
storage. In other words, we should find a storage subsystem and API
that we don't need to maintain ourselves. This is a fertile time for storage
technology, but that also means there are many options to sift
through, each with its own quirks and tradeoffs.
We set the following criteria for our search (in no particular order):
* ease of use for developers (APIs, etc.)
* ease of use for deployers (backups, failover, monitoring, etc.)
* strength of the community
* suitable license (ECL2 compatible)
* proven track record (success stories in applications somewhat like
ours)
* options for queries
* options for scaling
* options for integrity (atomicity, consistency, transactions,
referential integrity)
In the weeks since our meeting, the server devs have explored
various options. I'd like to summarize our progress so far. In the
interests of brevity, we won't include every detail. We invite your
questions and feedback. Note that we're right in the middle of our
investigation, not at the end. Hopefully, we'll get some more time
with OmniTI to talk about what we've learned.
Infinispan -- a successor to JBoss Cache. It is essentially a
caching layer that can persist data through a configurable cache
store. Strengths: highly configurable (transactions, write-behind
and write-through persistence, and so on), a very active community,
and storage agnosticism. Weaknesses: Infinispan showed promise
running inside an OSGi container. However, when we tried to persist
POJO data, it became obvious that we would need another library,
such as Hibernate OGM [6], to make that possible, and nothing we
found seemed mature or well documented enough to build on. Current
Thinking: Infinispan is a fairly mature in-memory data grid and
caching layer, but it is too early to treat it as a full-fledged
domain persistence layer.
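To make the write-through/write-behind distinction concrete, here is a
minimal plain-Java sketch of a cache in front of a backing store. It is
a hypothetical illustration, not Infinispan's API or configuration: the
class name, the in-memory Map standing in for a persistent cache store,
and the explicit flush() (which a real write-behind store would run
asynchronously) are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch, NOT Infinispan's API: a cache in front of a backing
// store that persists either synchronously or later, in a batch.
class WriteModeCacheSketch {
    enum Mode { WRITE_THROUGH, WRITE_BEHIND }

    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> store; // stand-in for a persistent cache store
    private final Queue<Map.Entry<String, String>> pending = new ArrayDeque<>();
    private final Mode mode;

    WriteModeCacheSketch(Map<String, String> store, Mode mode) {
        this.store = store;
        this.mode = mode;
    }

    void put(String key, String value) {
        cache.put(key, value);
        if (mode == Mode.WRITE_THROUGH) {
            store.put(key, value);              // persisted before put() returns
        } else {
            pending.add(Map.entry(key, value)); // persisted later by flush()
        }
    }

    String get(String key) {
        // Serve from memory; on a miss, fall back to the store and cache the result.
        String value = cache.get(key);
        if (value == null) {
            value = store.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // A real write-behind store would drain this queue on a background thread;
    // here it is called explicitly. Returns the number of writes persisted.
    int flush() {
        int written = 0;
        while (!pending.isEmpty()) {
            Map.Entry<String, String> e = pending.poll();
            store.put(e.getKey(), e.getValue());
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        WriteModeCacheSketch behind = new WriteModeCacheSketch(store, Mode.WRITE_BEHIND);
        behind.put("user:1", "{\"name\":\"Ann\"}");
        System.out.println("persisted before flush: " + store.containsKey("user:1")); // false
        behind.flush();
        System.out.println("persisted after flush: " + store.containsKey("user:1"));  // true
    }
}
```

The trade-off the sketch shows: write-through pays the store's latency
on every write, while write-behind returns immediately but can lose
queued writes if the node dies before they are flushed.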
Voldemort -- a project that came out of LinkedIn. It is a
distributed key-value store with sophisticated horizontal scaling,
using a ring topology similar to Cassandra's. Strengths: speed,
elastic scaling, runs in the JVM, supports various forms of
serialization, including JSON and Google's protobuf. Weaknesses: no
support for querying, so we'd have to build our own synchronous
indexing with Lucene, plus all the glue code to keep the store and
the index in step. The community is neither very active nor
diverse. Current Thinking: too much work just to get basic
store-and-find operations.
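To illustrate the kind of glue code the Voldemort option would
require, here is a hypothetical plain-Java sketch: a key-value store
with no query support, plus a hand-maintained inverted index that has
to be updated synchronously on every write. In-memory maps stand in
for both Voldemort and Lucene; the class, method, and field names are
assumptions, not either project's API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: maps stand in for a key-value store (get/put only)
// and for a secondary index of the kind Lucene would provide.
class KvWithIndexSketch {
    private final Map<String, String> store = new HashMap<>();         // key -> value
    private final Map<String, Set<String>> keyTags = new HashMap<>();  // key -> indexed tags
    private final Map<String, Set<String>> tagIndex = new HashMap<>(); // tag -> keys

    // Store and index must change together; skipping either step leaves
    // queries returning stale or missing results.
    void put(String key, String value, Set<String> tags) {
        unindex(key);
        store.put(key, value);
        keyTags.put(key, new HashSet<>(tags));
        for (String tag : tags) {
            tagIndex.computeIfAbsent(tag, t -> new HashSet<>()).add(key);
        }
    }

    void delete(String key) {
        unindex(key);
        store.remove(key);
    }

    // The only lookup a plain key-value store offers.
    String get(String key) {
        return store.get(key);
    }

    // "Find by tag" only works because we maintain tagIndex ourselves.
    Set<String> findByTag(String tag) {
        return new TreeSet<>(tagIndex.getOrDefault(tag, Set.of()));
    }

    // Remove a key's old index entries before re-indexing or deleting it.
    private void unindex(String key) {
        Set<String> old = keyTags.remove(key);
        if (old == null) {
            return;
        }
        for (String tag : old) {
            Set<String> keys = tagIndex.get(tag);
            keys.remove(key);
            if (keys.isEmpty()) {
                tagIndex.remove(tag);
            }
        }
    }

    public static void main(String[] args) {
        KvWithIndexSketch db = new KvWithIndexSketch();
        db.put("doc1", "Syllabus", Set.of("course", "pdf"));
        db.put("doc2", "Notes", Set.of("course"));
        db.put("doc1", "Syllabus v2", Set.of("pdf")); // re-tag: doc1 leaves "course"
        System.out.println(db.findByTag("course"));   // [doc2]
        System.out.println(db.findByTag("pdf"));      // [doc1]
    }
}
```

In a real deployment the store and the index would be separate
systems, so a crash between the two updates could leave them
inconsistent, which is part of the work this option would push onto us.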
MongoDB -- a document-oriented database with a very
developer-friendly API, managed and commercially backed by 10gen.
Strengths: really easy to use. You can store JSON-like (BSON)
documents, which is just what we want to do, and you can create
indexes and run queries much as we're used to doing in the
relational DB world. Huge community, probably the most active in the
NoSQL space. Tools, hosting options, the works. Weaknesses: we've
seen the same story a number of times [1][2][3][4]: everyone loves
MongoDB at first, but it becomes operationally painful in
production. Scaling it is complex. Current Thinking: the pain might
be worth it, but it certainly gives us pause. MongoDB will probably
be much easier to manage once it matures.
Cassandra -- a column family database, borrowing ideas from Google's
BigTable and Amazon's Dynamo. It originated at Facebook, but since
it moved to the Apache Software Foundation it has taken on a life of
its own. Commercially backed by DataStax. Strengths: very good
replication and scaling technology (a ring topology, like
Voldemort's). Supports queries, but you have to plan for them in
your data model. Very strong community. Consistency is tunable per
request. Weaknesses: a steeper learning curve for devs; data
modeling in Cassandra is a different paradigm from the relational
databases we know. Current Thinking: this is attractive for its
power, but it will take work to get everybody up to speed on it. In
a sense, it's the opposite of the MongoDB story: harder to get
started with, but very satisfying in the long term. See [1] for more
on this.
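Per-request tunable consistency rests on simple replica arithmetic:
with replication factor N, a read that waits for R replicas is
guaranteed to include at least one replica that acknowledged a write
which waited for W replicas whenever R + W > N. The sketch below
models that rule in plain Java. The enum mirrors the names of three
of Cassandra's consistency levels (ONE, QUORUM, ALL), but the class
itself is a hypothetical illustration, not Cassandra or client-driver
code.

```java
// Hypothetical model of per-request tunable consistency; the level names
// mirror Cassandra's ONE / QUORUM / ALL, but this is not Cassandra code.
class ConsistencySketch {
    enum Level { ONE, QUORUM, ALL }

    // Number of replica acknowledgements a request waits for.
    static int replicasFor(Level level, int replicationFactor) {
        switch (level) {
            case ONE:    return 1;
            case QUORUM: return replicationFactor / 2 + 1; // strict majority
            case ALL:    return replicationFactor;
            default:     throw new IllegalArgumentException(level.toString());
        }
    }

    // Read and write replica sets are guaranteed to intersect iff R + W > N.
    static boolean readSeesLatestWrite(Level write, Level read, int replicationFactor) {
        int w = replicasFor(write, replicationFactor);
        int r = replicasFor(read, replicationFactor);
        return r + w > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3; // replication factor
        for (Level w : Level.values()) {
            for (Level r : Level.values()) {
                System.out.printf("write=%-6s read=%-6s overlap=%b%n",
                        w, r, readSeesLatestWrite(w, r, n));
            }
        }
    }
}
```

So with a replication factor of 3, QUORUM writes plus QUORUM reads
(2 + 2 > 3) give reads that reflect the latest acknowledged write,
while ONE/ONE (1 + 1) trades that guarantee for lower latency and
higher availability, and the choice can be made request by request.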
JPA/JDO with a relational database -- This is the technology we're
familiar with from past projects: the tried and true relational
model with tables, ORM, and sometimes SQL. Strengths: everyone knows
how this works. There are plenty of tools, plenty of commercial
support, and you can write JOINs! Weaknesses: scaling is mostly
vertical. When you reach the limit of the hardware you can throw at
your database server, you can try sharding, which is notoriously
difficult, or a cache like memcached, but then you're committing to
key-value semantics, so why not just go there in the first place? [5]
JPA via OpenJPA and EclipseLink has been surprisingly hard to get
working in an OSGi runtime; both have trouble with the dynamic
nature of bundles. We're exploring JDO at the moment, but it too
feels like swimming upstream. Current Thinking: this is familiar,
but newer technologies have shown us that one size no longer fits
all.
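One commonly cited reason sharding is hard: with naive modulo routing
(shard = hash(key) mod shardCount), adding a shard changes the home of
most existing keys, so most of the data has to move. The hypothetical
sketch below counts how many of 1,000 keys change shards when going
from 4 to 5 shards; integer keys stand in for key hashes, and it
illustrates the general problem rather than any particular database or
sharding tool.

```java
// Hypothetical illustration of modulo sharding, not any product's scheme.
// Integer keys stand in for the hash of a real key.
class ModuloShardingSketch {
    // Route a key to one of shardCount shards.
    static int shardFor(int key, int shardCount) {
        return Math.floorMod(key, shardCount);
    }

    // How many of the keys 0..keyCount-1 land on a different shard after resharding?
    static int keysMoved(int keyCount, int oldShards, int newShards) {
        int moved = 0;
        for (int key = 0; key < keyCount; key++) {
            if (shardFor(key, oldShards) != shardFor(key, newShards)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int moved = keysMoved(1000, 4, 5);
        System.out.println(moved + " of 1000 keys move"); // 800 of 1000 keys move
    }
}
```

Only keys congruent to 0-3 modulo 20 stay put, so 80% of the data
moves. Ring-based schemes like the ones Voldemort and Cassandra use
are designed so that adding a node moves only the keys in the new
node's range.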
[1] http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
[2] http://w3matter.com/blog/from-postgresql-to-mongodb-back-to-postgresql
[3] http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
[4] http://e1ven.com/2011/11/07/my-experiences-with-mongodb-over-the-last-year-in-production/
[5] http://www.couchbase.com/
[6] http://www.hibernate.org/subprojects/ogm.html
_______________________________________________
oae-dev mailing list
[email protected]
http://collab.sakaiproject.org/mailman/listinfo/oae-dev