Check out:

http://github.com/kungfooguru/erlastic_search

It's an Erlang ElasticSearch client. You may also want to look at the
pre-/post-commit hooks in Riak because they offer a nice integration point
(as opposed to building the index all at once).

Keep in mind that, regardless, HTTP has a high per-request overhead when
you use it for message passing. We've achieved indexing speeds of about 10K
docs/sec (over HTTP) but we *always* use batching. I don't think ES has a
batch submission endpoint. In this case Solr may be a better fit because
your map phase can build/submit batches and reduce the number of HTTP calls
(batching will no doubt provide better performance than mere parallelism of
single document submissions).
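
To make the batching point concrete, here is a rough sketch in Python (not
Erlang, just for brevity). The helper names `batches` and `solr_add_xml` and
the batch size of 1000 are made up for illustration; the only thing taken
from Solr itself is the `<add><doc><field name="...">` XML update message.
Actually POSTing each payload to the update handler is left out.

```python
from itertools import islice
from xml.sax.saxutils import escape, quoteattr


def batches(docs, size):
    """Yield successive lists of at most `size` docs (hypothetical helper)."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def solr_add_xml(chunk):
    """Render one batch of dicts as a single Solr <add> update message."""
    parts = ["<add>"]
    for doc in chunk:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(
                "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


# Illustrative data only: 2500 tiny documents.
docs = [{"id": i, "title": "doc %d" % i} for i in range(2500)]

# One request body per batch, so 3 HTTP calls here instead of 2500.
payloads = [solr_add_xml(chunk) for chunk in batches(docs, 1000)]
```

The per-request cost (connection handling, headers, parsing) is then paid
once per batch rather than once per document.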

Of course you'll have to partition the data yourself which can be quite a
pain when you want to grow/shrink the Solr cluster (i.e., no elasticity).
Adding a batch API to ES shouldn't be too difficult. You might want to hit
Shay up and see what he says.
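
On the partitioning pain: a quick sketch of why resizing hurts, assuming the
naive scheme of hashing the document key modulo the number of shards. That
scheme is my assumption for illustration, not something Solr prescribes, and
`shard_for` is a made-up helper.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key modulo the shard count (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Illustrative keys only.
keys = ["doc-%d" % i for i in range(10000)]

# Count how many keys land on a different shard when growing from 4 to 5.
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
fraction_moved = moved / len(keys)
```

With a uniform hash you would expect roughly four out of five keys to change
shards when going from 4 to 5, which means reindexing most of the data just
to add one node.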

Of course you could sit tight and wait for Riak Search.

-Eric


On Fri, Apr 16, 2010 at 9:06 AM, Kevin Smith <[email protected]> wrote:

> The only restrictions on map/reduce functions are a) they must return lists
> and b) the entire job must execute before the timeout period elapses (60
> seconds is the default).  Javascript functions have the additional
> restriction of not being able to call back into Erlang code due to the
> current state of Erlang/Javascript integration.
>
> The easiest way to do this would be to write your function in Erlang and
> use either httpc (packaged with Erlang) or ibrowse (
> http://github.com/cmullaparthi/ibrowse) to make the HTTP calls. If you're
> comfortable with Erlang & OTP I'd recommend making a separate OTP
> application to handle the HTTP calls and provide an API for your map/reduce
> functions to use. This design moves the HTTP calls out of the query flow and
> prevents a hanging HTTP call from timing out a query.
>
> --Kevin
> On Apr 15, 2010, at 11:20 PM, Colin Surprenant wrote:
>
> > I'll rephrase my question:
> >
> > Is it possible to call external http services from a map function in
> > JavaScript and/or Erlang? Any comments/pointers appreciated.
> >
> > Thanks,
> > Colin
> >
> > On Thu, Apr 15, 2010 at 12:44 PM, Colin Surprenant <[email protected]> wrote:
> >> Hi,
> >>
> >> I am trying to figure out what the fastest way would be to send a
> >> mapreduce result set for indexing into a search engine system like
> >> elasticsearch.
> >>
> >> Of course, the trivial way to do it would be to simply gather the
> >> result set and push it back into the indexer using their http/rest
> >> api.
> >>
> >> Now, elasticsearch is distributed by nature and will allow parallel
> >> queries for document insertion for indexing. One way to leverage this
> >> would be to actually directly push a document from within a map
> >> function into the indexer using their rest api. This would completely
> >> distribute the index creation process and leverage the parallelism of
> >> elasticsearch.
> >>
> >> Would this be possible?
> >>
> >> Is this something I could do using the JavaScript mapreduce, and/or
> >> Erlang?
> >>
> >> Thanks,
> >> Colin
> >>
> >
> > _______________________________________________
> > riak-users mailing list
> > [email protected]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
