With such an important decision ahead of us, could someone please articulate 
how the decision will be made?  And in what time frame?

Thanks, L


On Jun 28, 2012, at 3:50 PM, Zach A. Thomas wrote:

> Summary
> =======
> 
> To help improve the performance of both the technology and our team, we are 
> considering adopting a new storage subsystem. We have evaluated a number of 
> solutions and reached some initial conclusions:
> 
> Eliminated from consideration:
> * Infinispan
> * Neo4J
> * SenseiDB
> * Voldemort
> 
> Still under investigation:
> * Cassandra
> * MongoDB
> * Relational DB / JPA / JDO
> 
> We want to emphasize that adopting any new technology will be a gradual 
> transition, so that we can maintain the stability of the application.
> 
> Background
> ==========
> 
> Our Ann Arbor meeting last month was about thinking through architectural 
> changes that could improve OAE in terms of system performance and team 
> performance, as well as laying the groundwork for taking measurements 
> (modeling a production-like data set, provisioning load testing 
> infrastructure, and the load tests themselves). For team performance, we 
> agreed that we should rely more heavily on established, low-level storage 
> infrastructure. In other words, we should find a storage subsystem and API 
> that we don't need to maintain ourselves. This is a fertile time for storage 
> technology, but that also means there are many options to sift through, each 
> with its own quirks and tradeoffs.
> 
> We set the following criteria for our search (in no particular order):
> 
> * ease of use for developers (APIs, etc.)
> * ease of use for deployers (backups, failover, monitoring, etc.)
> * strength of the community
> * suitable license (ECL2 compatible)
> * proven track record (success stories in applications somewhat like ours)
> * options for queries
> * options for scaling
> * options for integrity (atomicity, consistency, transactions, referential 
> integrity)
> 
> In the weeks since our meeting, the server devs have explored various 
> options, and I'd like to summarize our progress so far. In the interest of 
> brevity, we won't include every detail. We invite your questions and 
> feedback. Note that we're in the middle of our investigation, not at the 
> end. Hopefully, we'll get some more time with OmniTI to talk about what 
> we've learned.
> 
> Infinispan -- Infinispan is a successor to JBoss Cache. It is essentially a 
> caching layer that can persist data via a configurable cache store. 
> Strengths: a high level of configurability for things like transactions and 
> write-behind/write-through persistence, a high volume of community activity, 
> and storage agnosticism. Weaknesses: Infinispan showed promise when running 
> inside an OSGi container. However, when we tried to persist POJO data, it 
> became obvious that we would need another library, such as Hibernate OGM [6], 
> to persist POJOs through Infinispan, and nothing available seems mature 
> enough or well documented enough for us to build on. Current Thinking: 
> Infinispan is a fairly mature in-memory data grid and caching layer, but it 
> seems premature to treat it as a full-fledged domain persistence layer.
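To make the write-through/write-behind distinction concrete, here is a minimal, library-free Java sketch. It is not Infinispan's API: `BackingStore`, `MapStore`, and `Cache` are hypothetical names, and an in-memory map stands in for the database. Write-through persists before `put` returns; write-behind returns immediately and persists on a background thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CacheStoreSketch {

    /** Hypothetical persistence interface; stands in for a configurable cache store. */
    interface BackingStore {
        void save(String key, String value);
    }

    /** In-memory stand-in for a real database (demo only). */
    static class MapStore implements BackingStore {
        final Map<String, String> data = new ConcurrentHashMap<>();

        @Override
        public void save(String key, String value) {
            data.put(key, value);
        }
    }

    /** Hypothetical cache with a configurable persistence mode. */
    static class Cache {
        private final Map<String, String> memory = new ConcurrentHashMap<>();
        private final BackingStore store;
        private final boolean writeBehind;
        private final ExecutorService flusher = Executors.newSingleThreadExecutor();

        Cache(BackingStore store, boolean writeBehind) {
            this.store = store;
            this.writeBehind = writeBehind;
        }

        void put(String key, String value) {
            memory.put(key, value);
            if (writeBehind) {
                // Write-behind: return at once; a background thread persists later.
                flusher.submit(() -> store.save(key, value));
            } else {
                // Write-through: persist before returning to the caller.
                store.save(key, value);
            }
        }

        String get(String key) {
            return memory.get(key);
        }

        /** Stops accepting writes and waits up to 5 seconds for queued ones to finish. */
        void close() {
            flusher.shutdown();
            try {
                flusher.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        MapStore throughStore = new MapStore();
        Cache through = new Cache(throughStore, false);
        through.put("user:1", "{\"name\":\"alice\"}");
        // Already persisted when put() returns.
        System.out.println("write-through persisted: " + throughStore.data.containsKey("user:1"));
        through.close();

        MapStore behindStore = new MapStore();
        Cache behind = new Cache(behindStore, true);
        behind.put("user:2", "{\"name\":\"bob\"}");
        behind.close(); // wait for the queued write to be flushed
        System.out.println("write-behind persisted after flush: " + behindStore.data.containsKey("user:2"));
    }
}
```

The trade-off: write-behind lowers write latency, but writes still queued when a node fails can be lost.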
> 
> Voldemort -- a project that comes out of LinkedIn. It is a distributed 
> key-value store with sophisticated horizontal scaling using a ring topology 
> similar to Cassandra's. Strengths: speed, elastic scaling, runs in the JVM, 
> supports various forms of serialization, including JSON and Google's 
> protobuf. Weaknesses: no support for querying, so we would have to maintain a 
> separate, synchronously updated index using Lucene, plus all the glue code 
> to keep the two working together. The community is neither very active nor 
> diverse. Current Thinking: too much work just to get basic store-and-find 
> operations.
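The glue-code burden can be sketched as follows. This is a hypothetical, in-memory Java illustration, not Voldemort's or Lucene's API: a `HashMap` plays the key-value store, a second map plays the separately maintained index, and each record is reduced to a single indexed `tag` field. The point is that every write must update both structures in step, including removing stale index entries when a record changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IndexedKeyValueSketch {

    // Stand-in for the key-value store: get/put by key only, no queries.
    private final Map<String, String> kv = new HashMap<>();

    // Stand-in for a separately maintained index (Lucene in the proposal above):
    // tag value -> keys of the records that carry it.
    private final Map<String, Set<String>> byTag = new HashMap<>();

    /** Glue code: every write must update the store AND the index together. */
    public synchronized void put(String key, String tag) {
        String old = kv.put(key, tag);
        if (old != null) {
            // Forgetting this step leaves stale index entries behind.
            Set<String> keys = byTag.get(old);
            keys.remove(key);
            if (keys.isEmpty()) {
                byTag.remove(old);
            }
        }
        byTag.computeIfAbsent(tag, t -> new TreeSet<>()).add(key);
    }

    public synchronized String get(String key) {
        return kv.get(key);
    }

    /** The "query" the key-value store cannot answer on its own. */
    public synchronized Set<String> findByTag(String tag) {
        return new TreeSet<>(byTag.getOrDefault(tag, Set.of()));
    }

    public static void main(String[] args) {
        IndexedKeyValueSketch store = new IndexedKeyValueSketch();
        store.put("doc:1", "physics");
        store.put("doc:2", "physics");
        store.put("doc:1", "chemistry"); // re-tag doc:1
        System.out.println(store.findByTag("physics"));   // [doc:2]
        System.out.println(store.findByTag("chemistry")); // [doc:1]
    }
}
```

In a real deployment the store and the index live in different systems, so keeping them consistent across failures is harder still than this single-process sketch suggests.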
> 
> MongoDB -- a document-oriented database with a very developer-friendly API. 
> Managed and backed commercially by 10gen. Strengths: really easy to use. You 
> can store JSON documents, which is just what we want to do, and you can 
> create indexes and run queries much as we're used to doing in the relational 
> world. Huge community, probably the most active in the NoSQL space. Tools, 
> hosting options, the works. Weaknesses: We've seen the same story a number 
> of times [1][2][3][4]: everyone loves MongoDB at first, but it becomes 
> operationally painful in production. Scaling it is complex. Current 
> Thinking: the pain might be worth it, but it certainly gives us pause. 
> MongoDB will probably be much easier to manage once it matures.
> 
> Cassandra -- a column-family database, borrowing ideas from Google's BigTable 
> and Amazon's Dynamo. It originated at Facebook, but since moving to the 
> Apache Software Foundation it has taken on a life of its own. Commercially 
> backed by DataStax. Strengths: very good replication and scaling technology 
> (a ring topology, like Voldemort's). Supports queries, but you have to plan 
> for them in your data model. Very strong community. Consistency is tunable 
> per request. Weaknesses: steeper learning curve for devs. Data modeling in 
> Cassandra is a different paradigm from the relational databases we know. 
> Current Thinking: this is attractive for its power, but it will take work 
> to get everybody up to speed on it. In a sense, it's the opposite of the 
> MongoDB story: harder to get started with, but very satisfying in the long 
> term. See [1] for more on this.
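Per-request tunable consistency rests on simple quorum arithmetic: with replication factor N, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to overlap in at least one replica when R + W > N. The sketch below is plain Java with no Cassandra driver (the class and method names are illustrative only); it checks the combinations corresponding to Cassandra's ONE, QUORUM, and ALL levels for N = 3.

```java
public class TunableConsistencySketch {

    /** Replicas needed for a quorum: a strict majority of the replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * A read is guaranteed to see the latest acknowledged write only if the
     * read and write replica sets must overlap: R + W > N.
     */
    static boolean readsSeeLatestWrite(int n, int w, int r) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;          // replication factor
        int q = quorum(n);  // 2
        System.out.printf("write=ONE read=ONE: %b%n", readsSeeLatestWrite(n, 1, 1));       // false
        System.out.printf("write=QUORUM read=QUORUM: %b%n", readsSeeLatestWrite(n, q, q)); // true
        System.out.printf("write=ALL read=ONE: %b%n", readsSeeLatestWrite(n, n, 1));       // true
    }
}
```

Choosing ONE for both trades that guarantee for lower latency and higher availability, which is exactly the knob that per-request tuning exposes.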
> 
> JPA/JDO with a relational database -- This is the technology we're familiar 
> with from past projects: the tried-and-true relational model with tables, 
> ORM, and sometimes SQL. Strengths: Everyone knows how this works. There are 
> plenty of tools, plenty of commercial support, and you can write JOINs! 
> Weaknesses: scaling is mostly vertical. When you reach the limit of the 
> hardware you can throw at your database server, you can try sharding, which 
> is notoriously difficult, or something like memcached, but then you're 
> committing to key-value semantics, so why not just go there in the first 
> place? [5] JPA via OpenJPA and EclipseLink has been surprisingly hard to get 
> working in an OSGi runtime; both have trouble with the dynamic nature of 
> bundles. We are exploring JDO at the moment, but it too feels like swimming 
> upstream. Current Thinking: this is familiar, but newer technologies have 
> shown us that one size no longer fits all.
> 
> [1] http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
> [2] http://w3matter.com/blog/from-postgresql-to-mongodb-back-to-postgresql
> [3] http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
> [4] http://e1ven.com/2011/11/07/my-experiences-with-mongodb-over-the-last-year-in-production/
> [5] http://www.couchbase.com/
> [6] http://www.hibernate.org/subprojects/ogm.html
> 
> _______________________________________________
> oae-dev mailing list
> [email protected]
> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
