Summary
=======

To help improve performance of both the technology and our team, we are evaluating adoption of a new storage subsystem. A number of solutions have been evaluated and we have come to some initial conclusions:

Eliminated from consideration:
* Infinispan
* Neo4J
* SenseiDB
* Voldemort

Still under investigation:
* Cassandra
* MongoDB
* Relational DB / JPA / JDO
We want to emphasize that any technology we adopt will be a transition over time, so that we can maintain stability in the application.

Background
==========

Our Ann Arbor meeting last month was about thinking through architectural changes that could improve OAE in terms of system performance and team performance, as well as laying the groundwork for taking measurements (modeling a production-like data set, provisioning load-testing infrastructure, and running the load tests themselves).

For team performance, we agreed that we should strive to rely more heavily on established low-level infrastructure for storage. In other words, we want to find a storage subsystem and API that we don't need to maintain ourselves. This is a fertile time for storage technology, but that also means there are many options to sift through, each with its own quirks and tradeoffs. We set the following criteria for our search (in no particular order):

* ease of use for developers (APIs, etc.)
* ease of use for deployers (backups, failover, monitoring, etc.)
* strength of the community
* suitable license (ECL2-compatible)
* proven track record (success stories in applications somewhat like ours)
* options for queries
* options for scaling
* options for integrity (atomicity, consistency, transactions, referential integrity)

In the weeks since our meeting, the server devs have explored various options. I'd like to summarize our progress so far. In the interests of brevity, we won't include every detail, and we invite your questions and feedback. Note that we're in the middle of our investigation, not at the end. Hopefully, we'll get some more time with OmniTI to talk about what we've learned.

Infinispan -- Infinispan is a successor to JBoss Cache. It is essentially a caching layer that can persist data via a configurable cache store.

Strengths: a high level of configurability for things like transactions and write-behind/write-through persistence, a high volume of community activity, and storage agnosticism.
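Since write-through versus write-behind persistence is one of the main knobs mentioned above, here is a minimal Java sketch of the difference. It deliberately does not use Infinispan's API: `BackingStore`, `CountingStore`, `WriteThroughCache` and `WriteBehindCache` are hypothetical names made up for illustration, and the flush that a real cache would run in the background is just a method call here.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of write-through vs. write-behind persistence for a cache.
 * BackingStore and both cache classes are hypothetical; this is NOT Infinispan's API.
 */
public class CacheStoreSketch {

    /** Hypothetical persistent store that a cache writes to. */
    interface BackingStore {
        void store(String key, String value);
    }

    /** In-memory stand-in for a database; counts how many writes it receives. */
    static class CountingStore implements BackingStore {
        final Map<String, String> data = new HashMap<>();
        int writes = 0;

        @Override
        public void store(String key, String value) {
            data.put(key, value);
            writes++;
        }
    }

    /** Write-through: every put is persisted before put() returns. */
    static class WriteThroughCache {
        private final Map<String, String> cache = new HashMap<>();
        private final BackingStore store;

        WriteThroughCache(BackingStore store) { this.store = store; }

        void put(String key, String value) {
            cache.put(key, value);
            store.store(key, value); // synchronous: the caller waits for the store
        }

        String get(String key) { return cache.get(key); }
    }

    /** Write-behind: puts only mark keys dirty; flush() persists them later. */
    static class WriteBehindCache {
        private final Map<String, String> cache = new HashMap<>();
        // Insertion-ordered set, so repeated puts to one key coalesce into one write.
        private final Set<String> dirtyKeys = new LinkedHashSet<>();
        private final BackingStore store;

        WriteBehindCache(BackingStore store) { this.store = store; }

        void put(String key, String value) {
            cache.put(key, value);
            dirtyKeys.add(key); // not yet durable: lost if this node dies before flush()
        }

        String get(String key) { return cache.get(key); }

        /** In a real cache this would run on a timer or background thread. */
        void flush() {
            for (String key : dirtyKeys) {
                store.store(key, cache.get(key));
            }
            dirtyKeys.clear();
        }
    }

    public static void main(String[] args) {
        CountingStore db1 = new CountingStore();
        WriteThroughCache wt = new WriteThroughCache(db1);
        wt.put("user:1", "a");
        wt.put("user:1", "b");
        System.out.println("write-through writes: " + db1.writes);

        CountingStore db2 = new CountingStore();
        WriteBehindCache wb = new WriteBehindCache(db2);
        wb.put("user:1", "a");
        wb.put("user:1", "b");
        System.out.println("write-behind writes before flush: " + db2.writes);
        wb.flush();
        System.out.println("write-behind writes after flush: " + db2.writes);
    }
}
```

Running `main` prints 2 writes for write-through, then 0 before and 1 after the flush for write-behind. The tradeoff is the usual one: write-behind absorbs and coalesces bursts of updates, at the cost of a window in which acknowledged data exists only in memory.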
Weaknesses: Work with Infinispan showed promise with respect to operating inside an OSGi container. However, when we tried to persist POJO data, it became obvious that another library, such as Hibernate OGM [6], would be necessary to persist POJOs through Infinispan, and there just doesn't seem to be anything mature enough or well documented enough for us to start building on.

Current Thinking: Infinispan, while a fairly mature in-memory data grid and caching layer, seems premature to treat as a full-fledged domain persistence layer.

Voldemort -- a project that comes out of LinkedIn. It is a distributed key-value store with sophisticated horizontal scaling, using a ring topology similar to Cassandra's.

Strengths: speed, elastic scaling, runs in the JVM, and supports various forms of serialization, including JSON and Google's Protocol Buffers (protobuf).

Weaknesses: no support for querying, so we'd have to write separate synchronous indexing using Lucene, plus all the glue code to make the two work together. Not an active, diverse community.

Current Thinking: too much work to get basic store-and-find operations.

MongoDB -- a document-oriented database with a very developer-friendly API, managed and commercially backed by 10gen.

Strengths: really easy to use. You can store JSON documents, which is just what we want to do, and you can create indexes and query like we're used to from the relational DB world. Huge community, probably the most active in the NoSQL space. Tools, hosting options, the works.

Weaknesses: We've seen the same story a number of times [1][2][3][4]: everyone loves MongoDB at first, but it becomes operationally painful in production. Scaling it is complex.

Current Thinking: The pain might be worth it, but it certainly gives us pause. This product will probably be much easier to manage once it matures.

Cassandra -- a column family database, borrowing ideas from Google's BigTable and Amazon's Dynamo.
It originated at Facebook, but since moving to the Apache Software Foundation it has taken on a life of its own. It is commercially backed by DataStax.

Strengths: very good replication and scaling technology (a ring topology, like Voldemort's). It supports queries, but you have to plan for them in your data model. Very strong community. Consistency is tunable per request.

Weaknesses: a steeper learning curve for devs. Data modeling in Cassandra is a different paradigm from the relational databases we know.

Current Thinking: this is attractive for its power, but it will take work to get everybody up to speed on it. In a sense, it's the opposite of the MongoDB story: harder to get started with, but very satisfying in the long term. See [1] for more on this.

JPA/JDO with a relational database -- This is the technology we're familiar with from past projects: the tried-and-true relational model with tables, ORM, and sometimes SQL.

Strengths: Everyone knows how this works. There are plenty of tools and plenty of commercial support, and you can write JOINs!

Weaknesses: vertical scaling. When you reach the limit of the hardware you can throw at your database server, you can try sharding, which is notoriously difficult, or something like memcached, but then you're committing to key-value semantics, so why not just go there in the first place? [5] JPA via OpenJPA and EclipseLink has been surprisingly hard to get working in an OSGi runtime; both have trouble with the dynamic nature of bundles. We are exploring JDO at the moment, but it too feels like swimming upstream.

Current Thinking: this is familiar, but newer technologies have shown us that one size no longer fits all.
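To make the contrast between hand-rolled sharding and the ring topology of Voldemort and Cassandra concrete, here is a small Java sketch. It compares naive hash-mod-N sharding with a consistent-hash ring using virtual nodes when a fifth server joins four. This is a generic textbook illustration of the consistent-hashing idea, not Voldemort's or Cassandra's actual partitioner; `ShardingSketch`, `Ring` and `movedWhenAddingServer` are hypothetical names, and the MD5-based hash and 100 virtual nodes per server are arbitrary choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/**
 * Compares naive hash-mod-N sharding with a consistent-hash ring when one
 * server is added. Generic illustration only; all names are hypothetical and
 * this is not Voldemort's or Cassandra's partitioner.
 */
public class ShardingSketch {

    /** Maps a string to a non-negative long using the first 8 bytes of its MD5 digest. */
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h & Long.MAX_VALUE; // clear the sign bit
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available in the JDK
        }
    }

    /** Naive sharding: server index = hash mod number of servers. */
    static int modShard(String key, int servers) {
        return (int) (hash(key) % servers);
    }

    /** Consistent-hash ring; each node owns several "virtual node" tokens. */
    static class Ring {
        private final TreeMap<Long, String> tokens = new TreeMap<>();
        private final int vnodesPerNode;

        Ring(int vnodesPerNode) { this.vnodesPerNode = vnodesPerNode; }

        void addNode(String node) {
            for (int i = 0; i < vnodesPerNode; i++) {
                tokens.put(hash(node + "#" + i), node);
            }
        }

        /** Owner is the first token clockwise from the key's hash, wrapping around. */
        String nodeFor(String key) {
            Map.Entry<Long, String> e = tokens.ceilingEntry(hash(key));
            return (e != null ? e : tokens.firstEntry()).getValue();
        }
    }

    /** Returns {keys moved by mod-N, keys moved by the ring} when going from 4 to 5 servers. */
    static int[] movedWhenAddingServer(int keyCount) {
        Ring ring = new Ring(100);
        for (int n = 0; n < 4; n++) {
            ring.addNode("node" + n);
        }
        String[] before = new String[keyCount];
        for (int i = 0; i < keyCount; i++) {
            before[i] = ring.nodeFor("key" + i);
        }
        ring.addNode("node4");

        int movedMod = 0;
        int movedRing = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key" + i;
            if (modShard(key, 4) != modShard(key, 5)) movedMod++;
            if (!before[i].equals(ring.nodeFor(key))) movedRing++;
        }
        return new int[] {movedMod, movedRing};
    }

    public static void main(String[] args) {
        int keys = 10_000;
        int[] moved = movedWhenAddingServer(keys);
        System.out.printf("mod-N sharding moved %d of %d keys%n", moved[0], keys);
        System.out.printf("consistent-hash ring moved %d of %d keys%n", moved[1], keys);
    }
}
```

With a uniform hash you would expect roughly 80% of keys to move under mod-N (a key stays put only when its hash mod 4 equals its hash mod 5, i.e. 4 of every 20 residues) but only around one fifth under the ring, and those all move to the new server. Each moved key is data copied between database servers, which is one reason ad hoc sharding of a relational database is painful and ring-based stores can add capacity incrementally.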
[1] http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
[2] http://w3matter.com/blog/from-postgresql-to-mongodb-back-to-postgresql
[3] http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
[4] http://e1ven.com/2011/11/07/my-experiences-with-mongodb-over-the-last-year-in-production/
[5] http://www.couchbase.com/
[6] http://www.hibernate.org/subprojects/ogm.html
_______________________________________________
oae-dev mailing list
http://collab.sakaiproject.org/mailman/listinfo/oae-dev
