Hi All,

Here is my use case:

I have a spout that reads JSON data from a Kafka topic and emits each JSON
string to Bolt1. Bolt1 enriches the data and, based on a particular field in
the JSON string, splits it into 3 different files. Once any of these files
reaches 25 MB, Bolt1 starts emitting the data from that file to Bolt2, which
is an HDFSBolt. The HDFSBolt stores the data on the file system.
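
To make the buffering step concrete, here is a rough sketch in plain Java (no
Storm classes involved). The names TaggedBuffer, Tagged and fileTag are made up
for illustration, and it keeps records in memory instead of in real files. It
only shows the idea of collecting records per file and, once a size threshold
is reached, releasing them together with a tag saying which file they came from.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaggedBuffer {
    /** One released item: the (hypothetical) file tag plus the record itself. */
    public static final class Tagged {
        public final String fileTag;
        public final String record;

        Tagged(String fileTag, String record) {
            this.fileTag = fileTag;
            this.record = record;
        }
    }

    private final long thresholdBytes;
    private final Map<String, List<String>> buffers = new HashMap<>();
    private final Map<String, Long> sizes = new HashMap<>();

    public TaggedBuffer(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /**
     * Adds a record to the buffer for the given file tag. Returns an empty list
     * while the buffer is below the threshold; once the threshold is reached,
     * returns every buffered record for that tag (in insertion order) and
     * resets that buffer.
     */
    public List<Tagged> add(String fileTag, String record) {
        buffers.computeIfAbsent(fileTag, k -> new ArrayList<>()).add(record);
        long bytes = record.getBytes(StandardCharsets.UTF_8).length;
        long size = sizes.merge(fileTag, bytes, Long::sum);

        List<Tagged> released = new ArrayList<>();
        if (size >= thresholdBytes) {
            for (String r : buffers.remove(fileTag)) {
                released.add(new Tagged(fileTag, r));
            }
            sizes.remove(fileTag);
        }
        return released;
    }
}
```

In my case the threshold would be 25 MB, i.e. new TaggedBuffer(25L * 1024 * 1024),
and each released item would be what Bolt1 emits downstream.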



Now the problem: how does the HDFSBolt know which file Bolt1 is emitting
data from? I need to know whether the data comes from File1, File2, or
File3, because the data from each file has to be saved in a different
directory on HDFS.
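
Put differently, if every emitted record carried a tag naming its source file,
the routing I need on the HDFS side would boil down to a lookup like the one
below. This is only a plain-Java illustration: the class name, the tag values
and the directory paths are all placeholders I made up, not my real layout.

```java
public class OutputDirs {
    /**
     * Maps a (hypothetical) file tag to the HDFS directory its data should
     * land in. The paths are placeholders for illustration only.
     */
    public static String directoryFor(String fileTag) {
        switch (fileTag) {
            case "file1":
                return "/data/file1";
            case "file2":
                return "/data/file2";
            case "file3":
                return "/data/file3";
            default:
                throw new IllegalArgumentException("unknown file tag: " + fileTag);
        }
    }
}
```

What I cannot figure out is how to get that tag from Bolt1 to the HDFSBolt.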



I was wondering whether I can apply a fields grouping on the HDFSBolt, but I
am not sure how to do that while emitting the data from a file. Is there a
way to do it, or is there another solution that would meet this requirement?

Thanks,
Praveen
