Yes, that will work fine. From experience, I can say you should definitely account for the possibility of the 'tenant' and 'data_type' headers being corrupted or missing outright.
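
Roughly, a sketch of what that could look like (untested; I'm guessing at the agent and source names, and taking the 'tenant' and 'data_type' header names from your path example):

# Bucket the sink path on the 'tenant' and 'data_type' event headers
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/data/%{tenant}/%{data_type}/%Y/%m/%d
agent1.sinks.hdfs-sink1.hdfs.filePrefix = file

# Static interceptors as a safety net so a missing header never blocks the channel.
# With preserveExisting = true they only fill in a value when the header is absent.
agent1.sources.source1.interceptors = addTenant addDataType
agent1.sources.source1.interceptors.addTenant.type = static
agent1.sources.source1.interceptors.addTenant.preserveExisting = true
agent1.sources.source1.interceptors.addTenant.key = tenant
agent1.sources.source1.interceptors.addTenant.value = MissingTenant
agent1.sources.source1.interceptors.addDataType.type = static
agent1.sources.source1.interceptors.addDataType.preserveExisting = true
agent1.sources.source1.interceptors.addDataType.key = data_type
agent1.sources.source1.interceptors.addDataType.value = MissingDataType

The %{header} escapes in hdfs.path are what give you the per-tenant/per-type folders; note that the %Y/%m/%d escapes also need a 'timestamp' header on each event (or hdfs.useLocalTimeStamp = true) to resolve.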
At my org we have a similar setup where we auto-bucket on a 'logSubType' header that our application adds to the initial Flume event. To keep channels from blocking if this header goes missing, we have a static interceptor that adds the value 'MissingSubType' if the header does not exist. This setup has worked well for us across dozens of separate log streams for over a year.

Hope that helps,
Paul Chavez

-----Original Message-----
From: Jean-Philippe Caruana [mailto:[email protected]]
Sent: Wednesday, October 15, 2014 7:03 AM
To: [email protected]
Subject: HDFS sink: "clever" routing

Hi,

I am new to Flume (and to HDFS), so I hope my question is not stupid.

I have a multi-tenant application (about 100 different customers as of now) and 16 different data types. (In production, we push roughly 15 million messages/day through our RabbitMQ.)

I want to write all my events to HDFS, separated by tenant, data type, and date, like this:

/data/{tenant}/{data_type}/2014/10/15/file-08.csv

Is this possible with one sink definition? I don't want to duplicate configuration, and new clients arrive every week or so.

In the documentation, I see:

agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/events/%Y/%m/%d/%H/

Is something like this possible?

agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/events/%tenant/%type/%Y/%m/%d/%H/

I want to write to different folders according to my incoming data.

Thanks

--
Jean-Philippe Caruana
http://www.barreverte.fr
