Summary
=======

To help improve performance of both the technology and our team, we are 
evaluating adoption of a new storage subsystem.  A number of solutions have 
been evaluated and we have come to some initial conclusions:

Eliminated from consideration:

* Infinispan
* Neo4J
* SenseiDB
* Voldemort

Still under investigation:

* Cassandra
* MongoDB
* Relational DB / JPA / JDO

We want to emphasize that adopting any new technology will be a gradual 
transition, so that we can keep the application stable.

Background
==========

Our Ann Arbor meeting last month was about thinking through architectural 
changes that could improve OAE in terms of system performance and team 
performance, as well as laying the groundwork for taking measurements (modeling 
a production-like data set, provisioning load testing infrastructure, and the 
load tests themselves). For team performance, we agreed that we should strive 
to rely more heavily on established low-level infrastructure for storage. In 
other words, find a storage subsystem and API that we don't need to maintain 
ourselves. This is a fertile time for storage technology, but that also means 
there are many options to sift through, each with its own quirks and tradeoffs.

We set the following criteria for our search (in no particular order):

* ease of use for developers (APIs, etc.)
* ease of use for deployers (backups, failover, monitoring, etc.)
* strength of the community
* suitable license (ECL2 compatible)
* proven track record (success stories in applications somewhat like ours)
* options for queries
* options for scaling
* options for integrity (atomicity, consistency, transactions, referential 
integrity)

In the weeks since our meeting, the server devs have explored various options. 
We'd like to summarize our progress so far. In the interest of brevity, we 
won't include every detail. We invite your questions and feedback. Note that 
we're right in the middle of our investigation, not at the end. Hopefully, 
we'll get some more time with OmniTI to talk about what we've learned.

Infinispan -- the successor to JBoss Cache. It is primarily a caching layer 
that can persist data via a configurable cache store. Strengths: highly 
configurable (transactions, write-behind or write-through persistence), an 
active community, and storage agnosticism. Weaknesses: our experiments showed 
promise for running inside an OSGi container, but persisting POJO data would 
require another library such as Hibernate OGM [6], and nothing we found is 
mature or well documented enough to start building on. Current Thinking: 
Infinispan is a fairly mature memory grid and caching layer, but it seems 
premature to treat it as a full-fledged domain persistence layer.
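
For anyone unfamiliar with the two persistence modes mentioned above, here is a 
rough sketch of the difference. This is not Infinispan's API: the class 
SketchCache, its Mode enum, and the explicit flush() call are all made up for 
illustration, and a plain Map stands in for the cache store.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical cache in front of a backing store; NOT Infinispan's API. */
final class SketchCache {
    enum Mode { WRITE_THROUGH, WRITE_BEHIND }

    private final Mode mode;
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> store;   // stands in for the cache store
    private final Queue<String> dirtyKeys = new ArrayDeque<>();

    SketchCache(Mode mode, Map<String, String> store) {
        this.mode = mode;
        this.store = store;
    }

    void put(String key, String value) {
        memory.put(key, value);
        if (mode == Mode.WRITE_THROUGH) {
            store.put(key, value);   // persisted before put() returns
        } else {
            dirtyKeys.add(key);      // persisted later, when flush() runs
        }
    }

    String get(String key) {
        String cached = memory.get(key);
        return cached != null ? cached : store.get(key);
    }

    /**
     * Drains pending writes and returns how many were written. A real
     * write-behind cache would typically do this on a background thread;
     * here it is an explicit call (an assumption made for clarity).
     */
    int flush() {
        int written = 0;
        String key;
        while ((key = dirtyKeys.poll()) != null) {
            store.put(key, memory.get(key));
            written++;
        }
        return written;
    }
}
```

The tradeoff is the usual one: write-through makes every put() pay the cost of 
the store, while write-behind is faster but can lose whatever is still queued 
if the node dies before a flush.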

Voldemort -- a project that comes out of LinkedIn. It is a distributed 
key-value store with sophisticated horizontal scaling using a ring topology 
similar to Cassandra's. Strengths: speed, elastic scaling, runs in the JVM, 
supports various forms of serialization, including JSON and Google's protobuf. 
Weaknesses: no support for querying, so we'd have to write separate synchronous 
indexing using Lucene, plus all the glue code to make the two work together. 
The community is neither very active nor diverse. Current Thinking: too much 
work just to get basic store-and-find operations.
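
To make the "glue code" concern concrete, here is a minimal sketch in plain 
Java. It assumes nothing about Voldemort's or Lucene's actual APIs: in-memory 
maps stand in for both, and the class IndexedKeyValueStore and its method names 
are hypothetical. It only illustrates the bookkeeping we would own for a single 
indexed field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical glue: a key-value map plus a hand-maintained index on one field. */
final class IndexedKeyValueStore {
    private final Map<String, Map<String, String>> byKey = new HashMap<>();
    // indexed field value -> keys of the records that currently hold that value
    private final Map<String, Set<String>> index = new HashMap<>();
    private final String indexedField;

    IndexedKeyValueStore(String indexedField) {
        this.indexedField = indexedField;
    }

    /** Every write has to update the index in the same step, or finds go stale. */
    void put(String key, Map<String, String> record) {
        Map<String, String> previous = byKey.put(key, new HashMap<>(record));
        if (previous != null) {
            unindex(key, previous.get(indexedField));
        }
        String value = record.get(indexedField);
        if (value != null) {
            index.computeIfAbsent(value, v -> new TreeSet<>()).add(key);
        }
    }

    void remove(String key) {
        Map<String, String> previous = byKey.remove(key);
        if (previous != null) {
            unindex(key, previous.get(indexedField));
        }
    }

    /** The "find" half: keys of all records whose indexed field equals value. */
    Set<String> findKeys(String value) {
        return Collections.unmodifiableSet(
                index.getOrDefault(value, Collections.emptySet()));
    }

    private void unindex(String key, String value) {
        if (value == null) {
            return;
        }
        Set<String> keys = index.get(value);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) {
                index.remove(value);
            }
        }
    }
}
```

Even this toy version has to handle updates and deletes carefully, and it 
ignores the hard parts: keeping the store and the index consistent across 
failures and across multiple nodes.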

MongoDB -- a document-oriented database with a very developer-friendly API. 
Managed and backed commercially by 10gen. Strengths: really easy to use. You 
can store JSON documents, which is just what we want to do, and you can create 
indexes and query like we're used to from the relational DB world. Huge 
community, probably the most active in the NoSQL space. Tools, hosting options, 
the works. Weaknesses: We've seen the same story a number of times 
[1][2][3][4]: everyone loves MongoDB at first, but it becomes operationally 
painful in production. Scaling it is complex. Current Thinking: The pain might 
be worth it, but it certainly gives us pause. This is probably a product that 
is going to be much easier to manage when it matures.

Cassandra -- a column family database, borrowing ideas from Google's Bigtable 
and Amazon's Dynamo. Originated at Facebook, but since it moved to the Apache 
Software Foundation, it has taken on a life of its own. Commercially backed by 
DataStax. 
Strengths: very good replication and scaling technology (a ring topology, like 
Voldemort). Supports queries, but you have to plan for them in your data model. 
Very strong community. Consistency tunable per-request. Weaknesses: steeper 
learning curve for devs. Data modeling in Cassandra is a different paradigm 
from the relational databases we know. Current Thinking: this is attractive for 
its power, but it will take work to get everybody up to speed on it. In a 
sense, it's the opposite of the MongoDB story: harder to get started with, but 
very satisfying in the long term. See [1] for more on this.
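
As a rough illustration of what "consistency tunable per-request" buys us, the 
usual Dynamo-style rule of thumb is that a read is expected to see the latest 
write when the replicas read (R) plus the replicas written (W) exceed the 
replication factor (N). The sketch below is only that arithmetic, not a 
Cassandra client call; ConsistencySketch and its method names are made up for 
this example, and it ignores failure-handling details.

```java
/** Replica-overlap arithmetic for a Dynamo-style store; NOT a Cassandra client API. */
final class ConsistencySketch {
    private ConsistencySketch() {
    }

    /** Smallest majority of the replicas, e.g. 2 of 3 or 3 of 5. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * True when every set of r replicas read must share at least one replica
     * with every set of w replicas written, i.e. when r + w > n.
     */
    static boolean readsOverlapWrites(int n, int w, int r) {
        if (n < 1 || w < 1 || r < 1 || w > n || r > n) {
            throw new IllegalArgumentException("need 1 <= w, r <= n");
        }
        return r + w > n;
    }
}
```

With three replicas, quorum writes and quorum reads (2 + 2 > 3) overlap, while 
writing to one replica and reading from one (1 + 1 <= 3) trades that guarantee 
for lower latency. Being able to make that choice per request is the appeal.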

JPA/JDO with a relational database -- This is the technology we're familiar 
with from projects past. This is the tried and true relational model with 
tables, ORM, and sometimes SQL. Strengths: Everyone knows how this works. There 
are plenty of tools, plenty of commercial support, and you can write JOINs! 
Weaknesses: scaling is mostly vertical. When you reach the limit of the 
hardware you can throw at your database server, you can try sharding, which is 
notoriously difficult, or something like memcached, but then you're committing 
to key-value semantics, so why not just go there in the first place? [5] JPA 
via OpenJPA and EclipseLink has been surprisingly hard to get working in an 
OSGi runtime; both have trouble with the dynamic nature of bundles. We are 
exploring JDO at the moment, but it too feels like swimming upstream. Current 
Thinking: this is familiar, 
but newer technologies have shown us that one size no longer fits all.

[1] http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
[2] http://w3matter.com/blog/from-postgresql-to-mongodb-back-to-postgresql
[3] http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
[4] http://e1ven.com/2011/11/07/my-experiences-with-mongodb-over-the-last-year-in-production/
[5] http://www.couchbase.com/
[6] http://www.hibernate.org/subprojects/ogm.html
