With such an important decision ahead of us, could someone please articulate how the decision will be made? And in what time frame?
Thanks,
L

On Jun 28, 2012, at 3:50 PM, Zach A. Thomas <[email protected]> wrote:

> Summary
> =======
>
> To help improve performance of both the technology and our team, we are
> evaluating adoption of a new storage subsystem. A number of solutions have
> been evaluated and we have come to some initial conclusions:
> Eliminated from consideration
>   Infinispan
>   Neo4j
>   SenseiDB
>   Voldemort
> Still under investigation
>   Cassandra
>   MongoDB
>   Relational DB / JPA / JDO
>
> We want to emphasize that any technology we adopt will be a transition over
> time, so that we can maintain stability in the application.
>
> Background
> ==========
>
> Our Ann Arbor meeting last month was about thinking through architectural
> changes that could improve OAE in terms of system performance and team
> performance, as well as laying the groundwork for taking measurements
> (modeling a production-like data set, provisioning load testing
> infrastructure, and the load tests themselves). For team performance, we
> agreed that we should strive to rely more heavily on established low-level
> infrastructure for storage. In other words, find a storage subsystem and API
> that we don't need to maintain ourselves. This is a fertile time for storage
> technology, but that also means there are many options to sift through, each
> with its own quirks and tradeoffs.
>
> We set the following criteria for our search (in no particular order):
>
> * ease of use for developers (APIs, etc.)
> * ease of use for deployers (backups, failover, monitoring, etc.)
> * strength of the community
> * suitable license (ECL2 compatible)
> * proven track record (success stories in applications somewhat like ours)
> * options for queries
> * options for scaling
> * options for integrity (atomicity, consistency, transactions, referential
>   integrity)
>
> In the weeks since our meeting, the server devs have explored various
> options. I'd like to summarize our progress so far.
> In the interests of brevity, we won't include every detail. We invite your
> questions and feedback. Note that we're right in the middle of our
> investigation, not at the end. Hopefully, we'll get some more time with
> OmniTI to talk about what we've learned.
>
> Infinispan -- Infinispan is a successor to JBoss Cache. It is simply a
> caching layer that has the ability to persist via a configurable cache-store.
> Strengths: high level configurability for things like transactions,
> write-behind/write-through persistence, high volume of community activity,
> and storage agnosticism. Weaknesses: Work with Infinispan showed promise with
> respect to operating inside an OSGi container; however, when trying to
> persist POJO data, it became obvious that another library such as Hibernate
> OGM [6] would be necessary to make persistence of POJOs through Infinispan
> possible, and there just doesn't seem to be anything mature enough, or well
> documented enough, that we could start building off of. Current Thinking:
> Infinispan, while being a pretty mature memory grid and caching layer, seems
> a little premature to start thinking of as a full-fledged domain persistence
> layer.
>
> Voldemort -- a project that comes out of LinkedIn. It is a distributed
> key-value store with sophisticated horizontal scaling using a ring topology
> similar to Cassandra's. Strengths: speed, elastic scaling, runs in the JVM,
> supports various forms of serialization, including JSON and Google's
> protobuf. Weaknesses: no support for querying, so we'd have to write separate
> synchronous indexing using Lucene, and all the glue code to make them work
> together. Not an active, diverse community. Current Thinking: too much work
> to get basic store-and-find operations.
>
> MongoDB -- a document-oriented database with a very developer-friendly API.
> Managed and backed commercially by 10gen. Strengths: really easy to use.
> You can store JSON documents, which is just what we want to do, and you can
> create indexes and query like we're used to from the relational DB world.
> Huge community, probably the most active in the NoSQL space. Tools, hosting
> options, the works. Weaknesses: We've seen the same story a number of times
> [1][2][3][4]: everyone loves MongoDB at first, but it becomes operationally
> painful in production. Scaling it is complex. Current Thinking: The pain
> might be worth it, but it certainly gives us pause. This is probably a
> product that is going to be much easier to manage when it matures.
>
> Cassandra -- a column family database, borrowing ideas from Google BigTable
> and Amazon's Dynamo. Originated at Facebook, but since it moved to the Apache
> Foundation, it has taken on a life of its own. Commercially backed by
> DataStax. Strengths: very good replication and scaling technology (a ring
> topology, like Voldemort). Supports queries, but you have to plan for them in
> your data model. Very strong community. Consistency tunable per-request.
> Weaknesses: steeper learning curve for devs. Data modeling in Cassandra is a
> different paradigm from the relational databases we know. Current Thinking:
> this is attractive for its power, but it will take work to get everybody up
> to speed on it. In a sense, it's the opposite of the MongoDB story: harder to
> get started with, but very satisfying in the long term. See [1] for more on
> this.
>
> JPA/JDO with a relational database -- This is the technology we're familiar
> with from projects past. This is the tried and true relational model with
> tables, ORM, and sometimes SQL. Strengths: Everyone knows how this works.
> There are plenty of tools, plenty of commercial support, and you can write
> JOINs! Weaknesses: vertical scaling.
> When you reach the limit of the hardware you can throw at your database
> server, you can try sharding, which is notoriously difficult, or something
> like memcached, but then you're committing to key-value semantics, so why
> not just go there in the first place? [5] JPA via OpenJPA and EclipseLink
> has been surprisingly hard to get working in an OSGi runtime. They have
> trouble with the dynamic nature of bundles. Exploring JDO at the moment, but
> it too feels like swimming upstream. Current Thinking: this is familiar, but
> newer technologies have shown us that one size no longer fits all.
>
> [1] http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
> [2] http://w3matter.com/blog/from-postgresql-to-mongodb-back-to-postgresql
> [3] http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
> [4] http://e1ven.com/2011/11/07/my-experiences-with-mongodb-over-the-last-year-in-production/
> [5] http://www.couchbase.com/
> [6] http://www.hibernate.org/subprojects/ogm.html
>
> _______________________________________________
> oae-dev mailing list
> [email protected]
> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
