I want to add a bit more: I am posting the JSON data using kafka-console-producer.sh, copying the JSON data and pasting it into the console.
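That is likely the cause of the "one line at a time" behavior described below: kafka-console-producer.sh treats every line of input as a separate record, so pretty-printed JSON pasted across several lines arrives as several one-line Kafka messages. A minimal sketch of collapsing the JSON to a single line before pasting it (pure Java; the class name is hypothetical):

```java
// Sketch: collapse pretty-printed JSON to one line so that
// kafka-console-producer.sh (which sends each input line as a
// separate record) produces it as a single Kafka message.
public final class JsonOneLiner {

    // In valid JSON, raw line breaks can only occur between tokens
    // (string values may not contain literal newlines), so stripping
    // each line break plus the indentation after it is safe.
    public static String toSingleLine(String prettyJson) {
        return prettyJson.replaceAll("\\r?\\n\\s*", "").trim();
    }

    public static void main(String[] args) {
        String pretty = "{\n  \"name\": \"praveen\",\n  \"id\": 1\n}";
        System.out.println(toSingleLine(pretty));
        // prints: {"name": "praveen","id": 1}
    }
}
```

With the message on one line, `input.getString(0)` in the spout's consumer should see the whole JSON document in a single tuple.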
On Mon, Jul 4, 2016 at 11:44 AM, praveen reddy <[email protected]> wrote:

> Thanks Naveen for the response; I was on mobile, so I couldn't see the typos.
> Here is my requirement. This is my first POC on Kafka/Storm, so please help
> me if I can design it in a better way.
>
> I need to read JSON data from Kafka, then convert the JSON data to a CSV
> file and save it on HDFS.
>
> This is my initial design, and I am having a lot of issues:
>
> builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
> builder.setBolt("TransformBolt", new TransformationBolt()).shuffleGrouping("kafka-spout");
> builder.setBolt("Savebolt", new SaveBolt()).shuffleGrouping("TransformBolt");
>
> KafkaSpout reads the data from the Kafka topic, TransformationBolt
> converts the JSON to a CSV file, and SaveBolt saves the CSV file.
>
> KafkaSpout was able to read data from the Kafka topic. What I was expecting
> from the spout was the complete JSON data, but I am getting one line at a
> time from the JSON data I sent to the topic.
>
> Here is my transform bolt:
>
> @Override
> public void execute(Tuple input) {
>     String sentence = input.getString(0);
>     collector.emit(new Values(sentence));
>     System.out.println("emitted " + sentence);
> }
>
> I was expecting getString(0) to return the complete JSON data, but I am
> getting only one line at a time.
>
> I am also not sure how to emit a CSV file so that SaveBolt can save it.
>
> Can you please let me know how to get the complete JSON data in a single
> tuple rather than line by line, and how to emit a CSV file from a bolt?
> And if you can help me design this better, it would be really helpful.
>
>
> On Mon, Jul 4, 2016 at 5:59 AM, Navin Ipe <[email protected]> wrote:
>
>> Dear Praveen,
>>
>> The questions aren't silly, but it is rather tough to understand what you
>> are trying to convey. When you say "omit", do you mean "emit"?
>> Bolts can emit data even without having to write to disk. (I think there's
>> a 2 MB limit on the size of the data that can be emitted, because Thrift
>> can't handle more than that.)
>> If you want one bolt to write to disk and then want another bolt to read
>> from disk, that's also possible: the first bolt can just send the second
>> bolt whatever information is necessary to read the file.
>> As far as I know, basic datatypes will automatically get serialized. If
>> you have a more complex class, then serialize it with Serializable.
>>
>> If you could rephrase your question and make it clearer, people here
>> would be able to help you better.
>>
>>
>> On Sat, Jul 2, 2016 at 7:16 AM, praveen reddy <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I am new to Storm and Kafka and working on a POC.
>>>
>>> My requirement is to get a message from Kafka in JSON format, have a
>>> spout read that message, a first bolt convert the JSON message to a
>>> different format like CSV, and a second bolt save it to Hadoop.
>>>
>>> I came up with an initial design where I use KafkaSpout to read the
>>> Kafka topic, one bolt to convert the message to a CSV file, and a next
>>> bolt to save it in Hadoop.
>>>
>>> I have the following questions:
>>> Can the first bolt, which converts the message to a CSV file, omit it?
>>> The file would be saved on disk. Can a file which is saved on disk be
>>> omitted?
>>> How does the second bolt read the file which is saved on disk by the
>>> first bolt?
>>> Do we need to serialize messages ommitted by the spout and/or bolt?
>>>
>>> Sorry if the questions sound silly; this is my first topology with
>>> minimal knowledge of Storm.
>>>
>>> If you can think of a proper design for how to implement my requirement,
>>> please let me know.
>>>
>>> Thanks in advance
>>>
>>> -Praveen
>>
>>
>> --
>> Regards,
>> Navin
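On the "how do I emit a CSV file" question above: the usual Storm pattern is not to pass a file between bolts at all, but to have the transform bolt emit one CSV line per tuple and let the final bolt append lines to a file on HDFS. A minimal, Storm-free sketch of the escaping such a TransformationBolt could use (the class name and field values are hypothetical; parsing the JSON into fields is assumed to happen separately, e.g. with a library like Jackson):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: build one RFC-4180-style CSV line per record. A transform bolt
// could call CsvLine.of(...) in execute() and emit the resulting String;
// the downstream bolt then appends lines to a file instead of receiving
// a whole "file" as a tuple.
public final class CsvLine {

    // Quote a field if it contains a comma, a quote, or a line break,
    // doubling any embedded quotes.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static String of(List<String> fields) {
        return fields.stream().map(CsvLine::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Hypothetical fields pulled out of the parsed JSON.
        System.out.println(CsvLine.of(List.of("praveen", "a,b", "say \"hi\"")));
        // prints: praveen,"a,b","say ""hi"""
    }
}
```

With this shape, the save bolt never needs to re-read anything the transform bolt wrote to disk; it just receives ready-made lines.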
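On the serialization question: Storm serializes tuple values with Kryo, and primitives and Strings work out of the box; a custom class passed between bolts can either be registered with Kryo or, as Navin suggests, made java.io.Serializable so Storm's Java-serialization fallback can handle it. A self-contained sketch of that round trip (the Record class is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: a custom tuple value made Serializable, round-tripped through
// Java serialization the way a transfer between workers would be.
public class Record implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int id;

    Record(String name, int id) { this.name = name; this.id = id; }

    static Record roundTrip(Record r) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(r);
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            return (Record) in.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Record copy = roundTrip(new Record("praveen", 1));
        System.out.println(copy.name + " " + copy.id);  // prints: praveen 1
    }
}
```

If the class is not serializable by either mechanism, the topology fails at tuple-transfer time, not at build time, which can make the error look unrelated to the class itself.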
