Ok, that's fair (you're basically describing a CQRS-style architecture). But I don't see anything in that chain of processing that requires your individual Map and Reduce functions to be written in PHP or Python. The munging you describe would probably be best done outside Riak, either via a custom system, or via something like Hadoop's streaming interface. One thing that might be of interest to you is Disco, which is written primarily in Python with some bits of Erlang, and is an alternative to Hadoop.
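To make the streaming suggestion a bit more concrete, here is a rough sketch of a mapper/reducer pair in the line-oriented, tab-separated style that Hadoop's streaming interface expects. Everything specific in it is an assumption for illustration: the one-JSON-record-per-line input, the "source" and "body" field names, and the `analyze` function, which is only a stand-in for your existing Python analysis code.

```python
import json
from itertools import groupby


def analyze(text):
    # Hypothetical placeholder for the existing analysis code (e.g. NLP).
    # Here it just counts whitespace-separated tokens.
    return len(text.split())


def map_lines(lines):
    """Mapper: one JSON record per line -> 'source<TAB>score' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        post = json.loads(line)
        # "source" and "body" are assumed field names, not a Riak convention.
        yield "%s\t%d" % (post["source"], analyze(post["body"]))


def reduce_lines(sorted_lines):
    """Reducer: lines sorted by key -> 'source<TAB>total' lines."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for source, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (source, sum(int(value) for _, value in group))


def run_locally(lines):
    # Simulates the map -> sort -> reduce pipeline in a single process.
    return list(reduce_lines(sorted(map_lines(lines))))
```

Under a streaming job, the mapper and reducer would each be a small script that feeds `sys.stdin` to the corresponding function and prints the results; `run_locally` is only there so the logic can be tried on a handful of records first. The records themselves would be fetched from Riak by whatever client you already use, and the results written back the same way.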
Sean Cribbs <[email protected]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Mar 15, 2011, at 4:28 AM, Ishwar wrote:

> Sean,
>
> The use-case that we're looking at is a bit more complicated than that.
> Briefly, this is what we want to do.
>
> 1. We get a whole bunch of data, say, blog posts from various sources which
> we index in Solr, and store in Riak in json format.
>
> 2. Once the data is in riak, we need to run a whole bunch of analysis on
> selected groups of records. The scripts to do this analysis are in PHP and
> Python. The idea is to run MapReduce on a batch of records, and update Solr
> with the results of the analysis. On Riak, the results of the analysis will
> be updated on a different bucket, with links to the original record.
>
> 3. At the serving end, it's going to be just key-value pair retrievals, or
> simple MapReduce.
>
> Pre-processing the data is not an option as we won't be running this analysis
> on all the records. It will be run only on a subset of data.
>
> Given these use-case, what do you suggest is the best way to use Riak?
>
> --
> Thanks,
> Ishwar
>
> ----- Original Message -----
>> From: Sean Cribbs <[email protected]>
>> To: Ishwar <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Sent: Monday, March 14, 2011 8:57 PM
>> Subject: Re: Riak n00b questions
>>
>>>> It is not currently, but we are looking into the feasibility of
>>>> supporting other languages. However, I might say that if you're already
>>>> doing Python and PHP, it would be worth your while (and not difficult)
>>>> to learn JavaScript.
>>>
>>> We already have a whole bunch of processing on the data written in Python
>>> and PHP, and porting them to Javascript is (1) very tedious, and (2)
>>> Javascript does not support the required functionality. For example, we
>>> do a bunch of NLP analysis on the data.
>>>
>>> Given these, is it advisable if I expose these processes as webservices
>>> and call them from javascript/erlang?
>>>
>>
>> The other option of course, is to pre-process your data and just insert
>> multiple copies in different formats, which is a pretty common pattern.
>> The tradeoff is whether you want to pay the cost at query time or at
>> write time. If you can pay that cost up-front, reads will likely be
>> key-value or very simple MapReduce and thus very fast.
>>
>> Sean Cribbs <[email protected]>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
