Re: [oae-dev] OAE Storage Investigation Update

Steve Swinsburg Fri, 20 Jul 2012 14:12:47 -0700

Is it really worth thinking in terms of scaling to hundreds of millions of 
users anymore? Do we really see a use case for that? 
We are building a learning environment for a university or institution, not the 
next global social network.


I've watched this project right from its earliest beginnings, and this rewrite 
discussion is all too familiar. IMVHO it's time to settle on the most tried 
and true solution for the backend.

Steve



On 21/07/2012, at 6:50, Lance Speelmon <[email protected]> wrote:

> With such an important decision ahead of us, could someone please articulate 
> how the decision will be made?  And in what time frame?
> 
> Thanks, L
> 
> 
> On Jun 28, 2012, at 3:50 PM, Zach A. Thomas <[email protected]> wrote:
> 
>> Summary
>> =======
>> 
>> To help improve performance of both the technology and our team, we are 
>> evaluating adoption of a new storage subsystem.  A number of solutions have 
>> been evaluated and we have come to some initial conclusions:
>> 
>> Eliminated from consideration:
>> * Infinispan
>> * Neo4J
>> * SenseiDB
>> * Voldemort
>> 
>> Still under investigation:
>> * Cassandra
>> * MongoDB
>> * Relational DB / JPA / JDO
>> 
>> We want to emphasize that any technology we adopt will be a transition over 
>> time, so that we can maintain stability in the application.
>> 
>> Background
>> ==========
>> 
>> Our Ann Arbor meeting last month was about thinking through architectural 
>> changes that could improve OAE in terms of system performance and team 
>> performance, as well as laying the groundwork for taking measurements 
>> (modeling a production-like data set, provisioning load testing 
>> infrastructure, and the load tests themselves). For team performance, we 
>> agreed that we should strive to rely more heavily on established low-level 
>> infrastructure for storage. In other words, find a storage subsystem and API 
>> that we don't need to maintain ourselves. This is a fertile time for storage 
>> technology, but that also means there are many options to sift through, each 
>> with its own quirks and tradeoffs.
>> 
>> We set the following criteria for our search (in no particular order):
>> 
>> * ease of use for developers (APIs, etc.)
>> * ease of use for deployers (backups, failover, monitoring, etc.)
>> * strength of the community
>> * suitable license (ECL2 compatible)
>> * proven track record (success stories in applications somewhat like ours)
>> * options for queries
>> * options for scaling
>> * options for integrity (atomicity, consistency, transactions, referential 
>> integrity)
>> 
>> In the weeks since our meeting, the server devs have explored various 
>> options. I'd like to summarize our progress so far. In the interests of 
>> brevity, we won't include every detail. We invite your questions and 
>> feedback. Note that we're right in the middle of our investigation, not at 
>> the end. Hopefully, we'll get some more time with OmniTI to talk about what 
>> we've learned.
>> 
>> Infinispan -- a successor to JBoss Cache. It is simply a caching layer that 
>> has the ability to persist via a configurable cache-store. Strengths: 
>> high-level configurability for things like transactions and 
>> write-behind/write-through persistence, a high volume of community 
>> activity, and storage agnosticism. Weaknesses: Infinispan showed promise 
>> with respect to operating inside an OSGi container, but when we tried to 
>> persist POJO data, it became obvious that another library such as Hibernate 
>> OGM [6] would be necessary to make persistence of POJOs through Infinispan 
>> possible, and there doesn't seem to be anything mature enough or well 
>> documented enough that we could start building on. Current Thinking: 
>> Infinispan, while a fairly mature memory grid and caching layer, seems 
>> premature as a full-fledged domain persistence layer.
>> 
>> Voldemort -- a project that comes out of LinkedIn. It is a distributed 
>> key-value store with sophisticated horizontal scaling using a ring topology 
>> similar to Cassandra's. Strengths: speed, elastic scaling, runs in the JVM, 
>> supports various forms of serialization, including JSON and Google's 
>> protobuf. Weaknesses: no support for querying, so we'd have to write 
>> separate synchronous indexing using Lucene, and all the glue code to make 
>> them work together. Not an active, diverse community. Current Thinking: too 
>> much work to get basic store-and-find operations.
>> 
>> MongoDB -- a document-oriented database with a very developer-friendly API. 
>> Managed and backed commercially by 10gen. Strengths: really easy to use. You 
>> can store JSON documents, which is just what we want to do, and you can 
>> create indexes and query like we're used to from the relational DB world. 
>> Huge community, probably the most active in the NoSQL space. Tools, hosting 
>> options, the works. Weaknesses: We've seen the same story a number of times 
>> [1][2][3][4]: everyone loves MongoDB at first, but it becomes operationally 
>> painful in production. Scaling it is complex. Current Thinking: The pain 
>> might be worth it, but it certainly gives us pause. This is probably a 
>> product that is going to be much easier to manage when it matures.
>> 
>> Cassandra -- a column family database, borrowing ideas from Google BigTable 
>> and Amazon Dynamo. Originated at Facebook, but since it moved to the 
>> Apache Foundation, it has taken on a life of its own. Commercially backed by 
>> DataStax. Strengths: very good replication and scaling technology (a ring 
>> topology, like Voldemort). Supports queries, but you have to plan for them 
>> in your data model. Very strong community. Consistency tunable per-request. 
>> Weaknesses: steeper learning curve for devs. Data modeling in Cassandra is a 
>> different paradigm from the relational databases we know. Current Thinking: 
>> this is attractive for its power, but it will take work to get everybody up 
>> to speed on it. In a sense, it's the opposite of the MongoDB story: harder 
>> to get started with, but very satisfying in the long term. See [1] for more 
>> on this.
>> 
>> JPA/JDO with a relational database -- This is the technology we're familiar 
>> with from projects past. This is the tried and true relational model with 
>> tables, ORM, and sometimes SQL. Strengths: Everyone knows how this works. 
>> There are plenty of tools, plenty of commercial support, and you can write 
>> JOINs! Weaknesses: vertical scaling. When you reach the limit of the 
>> hardware you can throw at your database server, you can try sharding, which 
>> is notoriously difficult, or something like memcached, but then you're 
>> committing to key-value semantics, so why not just go there in the first 
>> place? [5] JPA via OpenJPA and EclipseLink has been surprisingly hard to get 
>> working in an OSGi runtime. They have trouble with the dynamic nature of 
>> bundles. We're exploring JDO at the moment, but it too feels like swimming 
>> upstream. Current Thinking: this is familiar, but newer technologies have 
>> shown us that one size no longer fits all.
>> 
>> [1] http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
>> [2] http://w3matter.com/blog/from-postgresql-to-mongodb-back-to-postgresql
>> [3] http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
>> [4] http://e1ven.com/2011/11/07/my-experiences-with-mongodb-over-the-last-year-in-production/
>> [5] http://www.couchbase.com/
>> [6] http://www.hibernate.org/subprojects/ogm.html
>> 
>> _______________________________________________
>> oae-dev mailing list
>> [email protected]
>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
> 
