Re: [oae-dev] OAE Storage Investigation Update

Nicolaas Matthijs Fri, 29 Jun 2012 02:29:10 -0700

Thanks for this update, Zach.

I think it would be really useful for ops teams to jump in here as well and provide any feedback they can about the options that have been laid out.


Thanks in advance,
Nicolaas



On 28 Jun 2012, at 20:50, Zach A. Thomas wrote:

Summary
=======
To help improve performance of both the technology and our team, we are evaluating adoption of a new storage subsystem. A number of solutions have been evaluated and we have come to some initial conclusions:
Eliminated from consideration:
* Infinispan
* Neo4J
* SenseiDB
* Voldemort

Still under investigation:
* Cassandra
* MongoDB
* Relational DB / JPA / JDO
We want to emphasize that any technology we adopt will be phased in over time, so that we can maintain stability in the application.
Background
==========
Our Ann Arbor meeting last month was about thinking through architectural changes that could improve OAE in terms of system performance and team performance, as well as laying the groundwork for taking measurements (modeling a production-like data set, provisioning load testing infrastructure, and the load tests themselves). For team performance, we agreed that we should strive to rely more heavily on established low-level infrastructure for storage. In other words, find a storage subsystem and API that we don't need to maintain ourselves. This is a fertile time for storage technology, but that also means there are many options to sift through, each with its own quirks and tradeoffs.
We set the following criteria for our search (in no particular order):

* ease of use for developers (APIs, etc.)
* ease of use for deployers (backups, failover, monitoring, etc.)
* strength of the community
* suitable license (ECL2 compatible)
* proven track record (success stories in applications somewhat like ours)
* options for queries
* options for scaling
* options for integrity (atomicity, consistency, transactions, referential integrity)
In the weeks since our meeting, the server devs have explored various options. I'd like to summarize our progress so far. In the interest of brevity, we won't include every detail. We invite your questions and feedback. Note that we're right in the middle of our investigation, not at the end. Hopefully, we'll get some more time with OmniTI to talk about what we've learned.
Infinispan -- a successor to JBoss Cache. It is simply a caching layer that has the ability to persist via a configurable cache-store. Strengths: high level of configurability for things like transactions and write-behind/write-through persistence, high volume of community activity, and storage agnosticism. Weaknesses: Infinispan showed promise with respect to operating inside an OSGi container. However, when we tried to persist POJO data, it became obvious that another library such as Hibernate OGM [6] would be necessary to make persistence of POJOs through Infinispan possible, and there just doesn't seem to be anything mature enough or well documented enough that we could start building on. Current Thinking: Infinispan, while a pretty mature memory grid and caching layer, is not yet ready to serve as a full-fledged domain persistence layer.
Voldemort -- a project that comes out of LinkedIn. It is a distributed key-value store with sophisticated horizontal scaling using a ring topology similar to Cassandra's. Strengths: speed, elastic scaling, runs in the JVM, supports various forms of serialization, including JSON and Google's protobuf. Weaknesses: no support for querying, so we'd have to write separate synchronous indexing using Lucene, and all the glue code to make them work together. Not an active, diverse community. Current Thinking: too much work to get basic store-and-find operations.
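To make the glue-code concern concrete, here is a minimal sketch of what "store plus separate synchronous index" means in practice. It is an illustration only: the class and method names are hypothetical, a plain dict stands in for the key-value store, and a second dict stands in for a Lucene-style index. Nothing here is Voldemort's or Lucene's actual API.

```python
from collections import defaultdict


class IndexedKeyValueStore:
    """Toy key-value store with a hand-maintained secondary index.

    Hypothetical illustration: dicts stand in for both the key-value
    store and the external index that would have to be kept in sync.
    """

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self._data = {}                 # key -> document
        self._index = defaultdict(set)  # field value -> set of keys

    def put(self, key, document):
        # Drop the stale index entry first, otherwise find() returns
        # keys whose documents no longer match.
        old = self._data.get(key)
        if old is not None:
            self._index[old.get(self.indexed_field)].discard(key)
        self._data[key] = document
        self._index[document.get(self.indexed_field)].add(key)

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        old = self._data.pop(key, None)
        if old is not None:
            self._index[old.get(self.indexed_field)].discard(key)

    def find(self, value):
        # Only the field we chose to index can be queried.
        return sorted(self._index.get(value, ()))


store = IndexedKeyValueStore("owner")
store.put("doc1", {"owner": "zach", "title": "Storage notes"})
store.put("doc2", {"owner": "nico", "title": "UI notes"})
store.put("doc1", {"owner": "nico", "title": "Storage notes"})  # re-index on update
print(store.find("nico"))
```

Every write path (create, update, delete) has to touch both structures, and in a real deployment the two would live in separate systems with separate failure modes, which is the work the paragraph above is wary of.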
MongoDB -- a document-oriented database with a very developer-friendly API. Managed and backed commercially by 10gen. Strengths: really easy to use. You can store JSON documents, which is just what we want to do, and you can create indexes and query like we're used to from the relational DB world. Huge community, probably the most active in the NoSQL space. Tools, hosting options, the works. Weaknesses: We've seen the same story a number of times [1][2][3][4]: everyone loves MongoDB at first, but it becomes operationally painful in production. Scaling it is complex. Current Thinking: The pain might be worth it, but it certainly gives us pause. This is probably a product that is going to be much easier to manage when it matures.
Cassandra -- a column family database, borrowing ideas from Google's BigTable and Amazon's Dynamo. Originated at Facebook, but since it moved to the Apache Foundation, it has taken on a life of its own. Commercially backed by DataStax. Strengths: very good replication and scaling technology (a ring topology, like Voldemort). Supports queries, but you have to plan for them in your data model. Very strong community. Consistency tunable per-request. Weaknesses: steeper learning curve for devs. Data modeling in Cassandra is a different paradigm from the relational databases we know. Current Thinking: this is attractive for its power, but it will take work to get everybody up to speed on it. In a sense, it's the opposite of the MongoDB story: harder to get started with, but very satisfying in the long term. See [1] for more on this.
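As a rough illustration of what "consistency tunable per-request" buys, the sketch below applies the usual quorum rule: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. ONE, QUORUM and ALL are Cassandra consistency level names; the helper functions are hypothetical and are not part of any Cassandra client. The model also ignores node failures, hinted handoff and read repair.

```python
def quorum(replication_factor):
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Number of replicas that must respond at the given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError("unknown consistency level: %r" % (level,))


def read_overlaps_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must intersect (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# With three replicas, compare a few per-request choices.
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, read_overlaps_write(write_level, read_level, 3))
```

The point of the per-request knob is that latency-sensitive operations can choose weaker levels while operations that need to read their own writes can pay for an overlapping pair.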
JPA/JDO with a relational database -- This is the technology we're familiar with from projects past. This is the tried and true relational model with tables, ORM, and sometimes SQL. Strengths: Everyone knows how this works. There are plenty of tools, plenty of commercial support, and you can write JOINs! Weaknesses: vertical scaling. When you reach the limit of the hardware you can throw at your database server, you can try sharding, which is notoriously difficult, or something like memcached, but then you're committing to key-value semantics, so why not just go there in the first place? [5] JPA via OpenJPA and EclipseLink has been surprisingly hard to get working in an OSGi runtime. They have trouble with the dynamic nature of bundles. Exploring JDO at the moment, but it too feels like swimming upstream. Current Thinking: this is familiar, but newer technologies have shown us that one size no longer fits all.
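One reason hand-rolled sharding earns its reputation can be shown with a small sketch. It assumes the simplest possible scheme, hashing the key and taking it modulo the number of shards, which is only one of several sharding strategies and is not attributed to any product discussed here. The function names are hypothetical. Under this scheme, growing from 4 to 5 shards changes the home of most keys (about 80% in expectation for a uniform hash), so adding capacity means moving most of the data.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard using a stable hash and a modulo."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def fraction_moved(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)


keys = ["user:%d" % i for i in range(10000)]
print("fraction of keys moved going from 4 to 5 shards:",
      fraction_moved(keys, 4, 5))
```

Ring-based placement, as used by the stores described above, is designed to keep that fraction much smaller when nodes are added.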
[1] http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
[2] http://w3matter.com/blog/from-postgresql-to-mongodb-back-to-postgresql
[3] http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
[4] http://e1ven.com/2011/11/07/my-experiences-with-mongodb-over-the-last-year-in-production/
[5] http://www.couchbase.com/
[6] http://www.hibernate.org/subprojects/ogm.html

_______________________________________________
oae-dev mailing list
[email protected]
http://collab.sakaiproject.org/mailman/listinfo/oae-dev
