Yes, this is absolutely possible, but you need to make sure each Flume event carries the matching keys in its headers (tenant, type, and a timestamp): the %{tenant} and %{type} escapes in hdfs.path are resolved from event headers, and %Y/%m/%d/%H needs a timestamp header. You can populate those headers either with interceptors or through a custom source.
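For example, here is a minimal sketch of what that could look like using the built-in timestamp and regex_extractor interceptors (the source/sink names and the regex below are only assumptions; the regex in particular would have to match however tenant and type actually appear in your message bodies):

# HDFS sink: %{tenant} and %{type} come from event headers,
# %Y/%m/%d/%H comes from the timestamp header
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/data/%{tenant}/%{type}/%Y/%m/%d/%H/

# Interceptors on the source populate those headers
agent1.sources.source1.interceptors = ts extract
agent1.sources.source1.interceptors.ts.type = timestamp
agent1.sources.source1.interceptors.extract.type = regex_extractor
# Hypothetical regex: assumes the body starts with "tenant,type,..."
agent1.sources.source1.interceptors.extract.regex = ^([^,]+),([^,]+),
agent1.sources.source1.interceptors.extract.serializers = s1 s2
agent1.sources.source1.interceptors.extract.serializers.s1.name = tenant
agent1.sources.source1.interceptors.extract.serializers.s2.name = type

New tenants then get their own directories automatically, with no configuration change. If tenant and type are already available as RabbitMQ message properties rather than in the body, a custom source (or custom interceptor) that copies them into the event headers would do the same job without any regex.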
On Wed, Oct 15, 2014 at 7:02 AM, Jean-Philippe Caruana <[email protected]> wrote:
> Hi,
>
> I am new to Flume (and to HDFS), so I hope my question is not stupid.
>
> I have a multi-tenant application (about 100 different customers as for
> now).
> I have 16 different data types.
>
> (In production, we have approx. 15 million messages/day through our
> RabbitMQ)
>
> I want to write to HDFS all my events, separated by tenant, data type,
> and date, like this :
> /data/{tenant}/{data_type}/2014/10/15/file-08.csv
>
> Is it possible with one sink definition ? I don't want to duplicate
> configuration, and new client arrive every week or so
>
> In documentation, I see
> agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/events/%Y/%m/%d/%H/
>
> Is this possible ?
> agent1.sinks.hdfs-sink1.hdfs.path =
> hdfs://server/events/%tenant/%type/%Y/%m/%d/%H/
>
> I want to write to different folder according to my incoming data.
>
> Thanks
>
> --
> Jean-Philippe Caruana
> http://www.barreverte.fr
