Hi all, here is my use case.
I have a spout that reads JSON data from a Kafka topic and emits each JSON string to Bolt1. Bolt1 enriches the data and, based on a particular field in the JSON string, writes it to one of 3 different files. Once any of those files reaches 25 MB, Bolt1 starts emitting the data from that file to Bolt2, which is an HDFSBolt. The HDFSBolt stores the data on the file system.

Now the problem: how does the HDFSBolt know which file Bolt1 is emitting data from? I need to know whether the data comes from File1, File2, or File3, because I have to save it in a different HDFS directory depending on the file.

I was wondering whether I can apply a fields grouping on the HDFSBolt, but I am not sure how to do that while emitting the data from a file. Is there a way to do it, or is there any other solution that would meet this requirement?

Thanks,
Praveen
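
To make the requirement concrete, here is a minimal plain-Java sketch of the routing I have in mind. It does not use any Storm APIs. It only shows the idea of tagging every emitted record with the file it came from and looking up the target directory from that tag. The class name `FileRouting`, the `TaggedRecord` type, and the `/data/file1` style paths are all hypothetical and made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FileRouting {

    // Hypothetical mapping from the source-file tag to an HDFS directory.
    static final Map<String, String> DIRECTORIES = new LinkedHashMap<>();
    static {
        DIRECTORIES.put("File1", "/data/file1");
        DIRECTORIES.put("File2", "/data/file2");
        DIRECTORIES.put("File3", "/data/file3");
    }

    // What Bolt1 would emit: the JSON payload plus the file it was read from.
    static final class TaggedRecord {
        final String sourceFile;
        final String json;

        TaggedRecord(String sourceFile, String json) {
            this.sourceFile = sourceFile;
            this.json = json;
        }
    }

    // What the downstream bolt would do: pick the directory from the tag.
    static String targetDirectory(String sourceFile) {
        String dir = DIRECTORIES.get(sourceFile);
        if (dir == null) {
            throw new IllegalArgumentException("Unknown source file: " + sourceFile);
        }
        return dir;
    }

    // Groups payloads by target directory, preserving arrival order.
    static Map<String, List<String>> groupByDirectory(List<TaggedRecord> records) {
        Map<String, List<String>> byDirectory = new LinkedHashMap<>();
        for (TaggedRecord r : records) {
            byDirectory
                .computeIfAbsent(targetDirectory(r.sourceFile), k -> new ArrayList<>())
                .add(r.json);
        }
        return byDirectory;
    }

    public static void main(String[] args) {
        List<TaggedRecord> emitted = new ArrayList<>();
        emitted.add(new TaggedRecord("File1", "{\"id\":1}"));
        emitted.add(new TaggedRecord("File3", "{\"id\":2}"));
        emitted.add(new TaggedRecord("File1", "{\"id\":3}"));
        System.out.println(groupByDirectory(emitted));
    }
}
```

In the real topology I assume the tag would be an extra field on the tuple that Bolt1 emits, but I do not know how to make the HDFSBolt choose its output directory from that field.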
