Hi,

You can follow this link for instructions on using the HDFS bolt:
http://storm.apache.org/releases/1.0.0/storm-hdfs.html
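Roughly, wiring that bolt into your topology would look like the sketch below (untested here; the filesystem URL, output path, and the sync/rotation values are placeholders you will need to adjust for your cluster):

    import org.apache.storm.hdfs.bolt.HdfsBolt;
    import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
    import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
    import org.apache.storm.hdfs.bolt.format.FileNameFormat;
    import org.apache.storm.hdfs.bolt.format.RecordFormat;
    import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
    import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
    import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
    import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
    import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

    // Write the tuple's fields comma-delimited, i.e. one CSV row per tuple.
    RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter(",");

    // Sync to HDFS every 1000 tuples; rotate files every 5 MB (example values).
    SyncPolicy syncPolicy = new CountSyncPolicy(1000);
    FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);

    // Output directory on HDFS (placeholder path).
    FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/storm/csv/");

    HdfsBolt hdfsBolt = new HdfsBolt()
            .withFsUrl("hdfs://namenode:8020")   // placeholder namenode address
            .withFileNameFormat(fileNameFormat)
            .withRecordFormat(format)
            .withRotationPolicy(rotationPolicy)
            .withSyncPolicy(syncPolicy);

    // Use the HdfsBolt in place of your SaveBolt.
    builder.setBolt("Savebolt", hdfsBolt).shuffleGrouping("TransformBolt");

With that in place, TransformBolt can just emit each CSV row as tuple fields and the HdfsBolt takes care of writing and rotating files on HDFS, so there is no need to stage the file on local disk first.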
Thanks,
Satish.

On Tue, Jul 5, 2016 at 3:02 AM, praveen reddy <[email protected]> wrote:

> Thanks for the response. Can you please help me with how I can emit CSV
> data using a bolt? I was able to read JSON data from Kafka and convert the
> data into a Java object. I created a utility class to convert the Java
> object into a CSV file. Now I want to write that CSV file (which I stored
> on disk) onto HDFS using a bolt. Any link to documentation on how to do it
> would be helpful. I searched Google but couldn't find relevant info.
>
> On Mon, Jul 4, 2016 at 5:22 PM, Harsha Chintalapani <[email protected]> wrote:
>
>> "Bolts can emit data even without having to write to disk (I think
>> there's a 2MB limit to the size of that data that can be emitted, because
>> Thrift can't handle more than that)."
>> There is no such limit. Between workers, Storm uses Netty channels, and
>> communication between components inside the same JVM happens through the
>> disruptor queue. If you need to increase the size of the Netty buffers,
>> take a look at the Netty configs in storm.yaml. We recommend going with
>> the defaults.
>> Thanks,
>> Harsha
>>
>> On Mon, Jul 4, 2016 at 9:59 AM Nathan Leung <[email protected]> wrote:
>>
>>> Double check how you are pushing data into Kafka. You are probably
>>> pushing one line at a time.
>>> On Jul 4, 2016 12:30 PM, "Navin Ipe" <[email protected]> wrote:
>>>
>>>> I haven't worked with Kafka, *so perhaps someone else here would be
>>>> able to help you with it.*
>>>> What I could suggest, though, is to search for how to emit more than
>>>> one sentence using the Kafka spout.
>>>>
>>>> If you still can emit only one sentence, then I'd recommend not using
>>>> a separate SaveBolt. Instead, use fields grouping, where you group tuples
>>>> based on the name of the CSV file, and emit sentences to TransformBolt.
>>>> When TransformBolt has received all the tuples from a CSV, it can save
>>>> to HDFS.
>>>>
>>>> If you still want to use a separate TransformBolt and SaveBolt, then
>>>> use fields grouping as I mentioned above when emitting to both bolts.
>>>> This way, you can have multiple spouts which read from multiple files,
>>>> and whatever they emit will go only to specific bolts.
>>>>
>>>> On Mon, Jul 4, 2016 at 9:21 PM, praveen reddy <[email protected]> wrote:
>>>>
>>>>> I want to add a bit more: I am posting the JSON data using
>>>>> kafka-console-producer.sh, copying the JSON data and pasting it on the
>>>>> console.
>>>>>
>>>>> On Mon, Jul 4, 2016 at 11:44 AM, praveen reddy <[email protected]> wrote:
>>>>>
>>>>>> Thanks Navin for the response. I was on mobile, so I couldn't see
>>>>>> the typos. Here is my requirement. This is my first POC on Kafka/Storm,
>>>>>> so please help me if I can design it in a better way.
>>>>>>
>>>>>> I need to read JSON data from Kafka, then convert the JSON data to a
>>>>>> CSV file and save it on HDFS.
>>>>>>
>>>>>> This is my initial design, and I am having a lot of issues:
>>>>>>
>>>>>> builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
>>>>>> builder.setBolt("TransformBolt", new TransformationBolt()).shuffleGrouping("kafka-spout");
>>>>>> builder.setBolt("Savebolt", new SaveBolt()).shuffleGrouping("TransformBolt");
>>>>>>
>>>>>> KafkaSpout reads the data from the Kafka topic, TransformationBolt
>>>>>> converts the JSON to a CSV file, and SaveBolt saves the CSV file.
>>>>>>
>>>>>> KafkaSpout was able to read data from the Kafka topic.
>>>>>> What I was expecting from the spout was the complete JSON data, but
>>>>>> I am getting one line at a time from the JSON data I sent to the topic.
>>>>>>
>>>>>> Here is my transform bolt:
>>>>>>
>>>>>> @Override
>>>>>> public void execute(Tuple input) {
>>>>>>     String sentence = input.getString(0);
>>>>>>     collector.emit(new Values(sentence));
>>>>>>     System.out.println("emitted " + sentence);
>>>>>> }
>>>>>>
>>>>>> I was expecting getString(0) to return the complete JSON data, but I
>>>>>> am getting only one line at a time.
>>>>>>
>>>>>> I am also not sure how to emit the CSV file so that SaveBolt can save
>>>>>> it.
>>>>>>
>>>>>> Can you please let me know how to get the complete JSON data in a
>>>>>> single request rather than line by line, and how to emit a CSV file
>>>>>> from a bolt? And if you can help me design this in a better way, it
>>>>>> would be really helpful.
>>>>>>
>>>>>> On Mon, Jul 4, 2016 at 5:59 AM, Navin Ipe <[email protected]> wrote:
>>>>>>
>>>>>>> Dear Praveen,
>>>>>>>
>>>>>>> The questions aren't silly, but it is rather tough to understand
>>>>>>> what you are trying to convey. When you say "omit", do you mean "emit"?
>>>>>>> Bolts can emit data even without having to write to disk (I think
>>>>>>> there's a 2MB limit to the size of that data that can be emitted,
>>>>>>> because Thrift can't handle more than that).
>>>>>>> If you want one bolt to write to disk and then want another bolt to
>>>>>>> read from disk, then that's also possible. The first bolt can just
>>>>>>> send the second bolt whatever information is necessary to read the file.
>>>>>>> As far as I know, basic datatypes will automatically get serialized.
>>>>>>> If you have a more complex class, then make it Serializable.
>>>>>>>
>>>>>>> If you could re-phrase your question and make it clearer, people
>>>>>>> here would be able to help you better.
>>>>>>>
>>>>>>> On Sat, Jul 2, 2016 at 7:16 AM, praveen reddy <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I am new to Storm and Kafka and working on a POC.
>>>>>>>>
>>>>>>>> My requirement is to get a message from Kafka in JSON format, with
>>>>>>>> a spout reading that message, the first bolt converting the JSON
>>>>>>>> message to a different format like CSV, and the second bolt saving
>>>>>>>> it to Hadoop.
>>>>>>>>
>>>>>>>> I came up with an initial design where I use KafkaSpout to read the
>>>>>>>> Kafka topic, a bolt to convert it to a CSV file, and the next bolt
>>>>>>>> to save it in Hadoop.
>>>>>>>>
>>>>>>>> I have the following questions:
>>>>>>>> Can the first bolt, which converts the message to a CSV file, emit
>>>>>>>> it? The file would be saved on disk. Can a file which is saved on
>>>>>>>> disk be emitted?
>>>>>>>> How does the second bolt read the file which is saved on disk by
>>>>>>>> the first bolt?
>>>>>>>> Do we need to serialize messages emitted by the spout and/or bolt?
>>>>>>>>
>>>>>>>> Sorry if the questions sound silly; this is my first topology with
>>>>>>>> minimal knowledge of Storm.
>>>>>>>>
>>>>>>>> If you can think of a proper design for how to implement my
>>>>>>>> requirement, can you please let me know?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> -Praveen
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Navin
>>>>
>>>> --
>>>> Regards,
>>>> Navin
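On the earlier question about getting the complete JSON in one tuple: kafka-console-producer.sh sends each line you paste as a separate message, which is why the spout sees one line at a time. One way around that is a small producer that sends the whole document as a single record. A rough, untested sketch (the broker address, topic name, and class name are placeholders; the file path comes from the command line):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class JsonFileProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            // Read the whole JSON file into one String so it becomes one Kafka message.
            String json = new String(Files.readAllBytes(Paths.get(args[0])), "UTF-8");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("json-input", json));   // placeholder topic
            }
        }
    }

Alternatively, strip the newlines from the JSON before pasting it, so the console producer sends it as a single line and therefore a single message.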
