Re: Flume workflow design

Wolfgang Hoschek Thu, 18 Jul 2013 15:52:51 -0700

Take a look at these options:

- HBase Sinks (send data into HBase):

        http://flume.apache.org/FlumeUserGuide.html#hbasesinks

- Apache Flume Morphline Solr Sink (for heavy-duty ETL processing and ingestion
into Solr):

        http://flume.apache.org/FlumeUserGuide.html#morphlinesolrsink

- Apache Flume MorphlineInterceptor (for light-weight event annotations and 
routing): 

        http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor

- For MapReduce jobs it is typically more straightforward and efficient to send
data directly to the destination, i.e. without going through Flume. For example,
use the MapReduceIndexerTool when going from HDFS into Solr:

        https://github.com/cloudera/search/tree/master/search-mr
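
As a rough sketch of how the first two options could be combined for the
use case described below: a single Avro source replicates each event into two
channels, one drained by an HBase sink and the other by the Morphline Solr
Sink. Note that fan-out in Flume goes through channels, so it is one source
with two channels and one sink per channel, rather than two sinks on one
channel. The agent name, component names, port, table name, column family and
morphline file path here are all made-up placeholders, so adjust them to your
setup and check the property names against the user guide for your Flume
version.

```properties
# Hypothetical agent "a1": one Avro source fanned out to HBase and Solr.
a1.sources  = avroSrc
a1.channels = hbaseCh solrCh
a1.sinks    = hbaseSink solrSink

# Avro source; bind address and port are placeholders.
a1.sources.avroSrc.type = avro
a1.sources.avroSrc.bind = 0.0.0.0
a1.sources.avroSrc.port = 41414
# The replicating selector (the default) copies every event to both channels.
a1.sources.avroSrc.selector.type = replicating
a1.sources.avroSrc.channels = hbaseCh solrCh

# Memory channels keep the sketch short; they lose events if the agent dies.
a1.channels.hbaseCh.type = memory
a1.channels.solrCh.type  = memory

# HBase sink; table and column family names are placeholders.
a1.sinks.hbaseSink.type = hbase
a1.sinks.hbaseSink.table = messages
a1.sinks.hbaseSink.columnFamily = d
a1.sinks.hbaseSink.channel = hbaseCh

# Morphline Solr Sink; the morphline file holds the ETL and Solr loading steps.
a1.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.solrSink.morphlineFile = /etc/flume-ng/conf/morphline.conf
a1.sinks.solrSink.channel = solrCh
```

With a layout like this, the conversion of the object structure would live in
the morphline file rather than in a second Avro hop, and a slow Solr does not
hold up the HBase writes because each sink has its own channel.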

Wolfgang.

On Jul 18, 2013, at 3:37 PM, Flavio Pompermaier wrote:

> Hi to all,
> 
> I'm new to Flume but I'm very excited about it!
> I'd like to use it to gather some data, process the received messages, and
> then index them into Solr.
> Any suggestions about how to do that with Flume?
> I've already tested an Avro source that sends data to HBase,
> but my use case requires those messages to be saved in HBase and also
> processed and then indexed in Solr (obviously I also need to convert the
> object structure).
> I think the first part is quite simple: I just use 2 sinks, one that stores
> in HBase and another one that forwards to another Avro instance, right?
> If messages are sent during a map/reduce job, is the Avro source the best
> option for sending the documents to be indexed to my sink (i.e. the first
> part of the flow, which up to now I have simulated with an Avro source)?
> Best,
> Flavio
> 
> 
>
