On Fri, Apr 16, 2010 at 10:04 AM, Colin Surprenant <[email protected]> wrote:
> I definitely have to do some experimentation with this idea and see if
> adding such triggers directly from map functions can have a
> significant impact on the data post-processing throughput out of the
> mapreduce framework. Another option, in line with your separate OTP
> application suggestion, would be to use intermediate queuing
> (rabbitmq, redis, ...) and just queue results which would be picked up
> by another external process in charge of feeding into the
> elasticsearch indexer. The process can then be tuned independently to
> parallelize document inserts and optimize this for your specific
> elasticsearch cloud characteristics.
>
> I think this approach could be more efficient, in the case of very
> large result sets, than doing a simple result set aggregation and
> re-feeding. There is also the option of chunked/streaming result sets
> to consider. In the same line of thought, I could just set up a
> listener on the result stream and feed it back into the intermediate
> queuing.

We're prototyping an ingest framework using RabbitMQ and RabbitHub
(PubSubHubbub & Webhooks) for a client. The idea is to allow internal
applications to publish documents and then have various back-ends
subscribe to those syndications via Webhooks using RabbitHub. We've seen
promising results, but we don't yet use this to "bootstrap" any systems.

The problems we're trying to solve are to reduce point-to-point
communication (decouple things), eliminate batch-oriented behavior in
favor of near-real-time trickle feeding, and provide some durability
when a backend is offline for any reason (apps just keep feeding).

Check out RabbitHub, which provides a general implementation of
PubSubHubbub (more than just Atom). By adding ES as a subscriber, you can
use a Webhook to have any new documents published to the broker
automatically published to ES as well. Very cool stuff:

http://github.com/tonyg/rabbithub

Regards,
-Eric
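
For illustration, the intermediate-queuing idea discussed above could be sketched roughly as follows. This is only a sketch under stated assumptions, not anyone's actual setup: an in-process `queue.Queue` stands in for the broker (RabbitMQ, Redis, ...), and `feed_indexer`, `fake_index_batch`, `run_demo`, and `SENTINEL` are hypothetical names invented here. A real consumer would replace `fake_index_batch` with a call into the elasticsearch indexer.

```python
import queue
import threading

# Hypothetical end-of-stream marker; a real broker consumer would
# normally run indefinitely instead of stopping on a sentinel.
SENTINEL = object()


def feed_indexer(results, index_batch, batch_size=100):
    """Drain `results` and hand documents to `index_batch` in batches.

    `index_batch` is an assumed callable taking a list of documents;
    batch_size is the knob you would tune independently of the producer.
    """
    batch = []
    while True:
        doc = results.get()
        if doc is SENTINEL:
            break
        batch.append(doc)
        if len(batch) >= batch_size:
            index_batch(batch)
            batch = []
    if batch:
        index_batch(batch)  # flush the partial final batch


def run_demo(num_docs=250, batch_size=100):
    """Publish num_docs fake results and return the batches the indexer saw."""
    results = queue.Queue()  # stand-in for the intermediate queue
    indexed = []

    def fake_index_batch(batch):
        # Stand-in for the indexing call: just record the batch.
        indexed.append(list(batch))

    consumer = threading.Thread(
        target=feed_indexer, args=(results, fake_index_batch, batch_size)
    )
    consumer.start()

    # Producer side: the map/reduce output just enqueues and moves on.
    for i in range(num_docs):
        results.put({"id": i, "value": i * i})
    results.put(SENTINEL)

    consumer.join()
    return indexed


if __name__ == "__main__":
    print([len(b) for b in run_demo()])
```

The point of the split is that the producer never waits on the indexer: batch size and the number of consumers can be adjusted on the consuming side alone, and with a durable broker in place of the in-memory queue, results would keep accumulating while a backend is offline.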
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
