Thanks for the response. Can you please help me with how I can emit CSV data from a bolt? I was able to read JSON data from Kafka and convert it into a Java object, and I created a utility class to convert the Java object into a CSV file. Now I want to write that CSV file (which I stored on disk) to HDFS using a bolt. Any link to documentation on how to do this would be helpful. I searched on Google but couldn't find relevant info.

On Mon, Jul 4, 2016 at 5:22 PM, Harsha Chintalapani <[email protected]> wrote:

> "Bolts can emit data even without having to write to disk (I think
> there's a 2MB limit to the size of that data that can be emitted, because
> Thrift can't handle more than that)."
>
> There is no such limit. Between workers, Storm uses Netty channels, and
> internal JVM component communication happens through the disruptor queue.
> If one needs to increase the size of the buffers for Netty, take a look at
> the netty configs in storm.yaml. We recommend going with the defaults.
>
> Thanks,
> Harsha
>
> On Mon, Jul 4, 2016 at 9:59 AM Nathan Leung <[email protected]> wrote:
>
>> Double check how you are pushing data into Kafka. You are probably
>> pushing one line at a time.
>>
>> On Jul 4, 2016 12:30 PM, "Navin Ipe" <[email protected]>
>> wrote:
>>
>>> I haven't worked with Kafka, *so perhaps someone else here would be
>>> able to help you with it.*
>>> What I could suggest though, is to search for how to emit more than one
>>> sentence using the Kafka spout.
>>>
>>> If you still can emit only one sentence, then I'd recommend not using a
>>> separate SaveBolt. Instead, use FieldsGrouping where you group tuples based
>>> on the name of the CSV file, and emit sentences to TransformBolt. When
>>> TransformBolt completes receiving all tuples from a CSV, it can save to
>>> HDFS.
>>>
>>> If you still want to use a separate TransformBolt and SaveBolt, then use
>>> fields grouping as I mentioned above when emitting to both bolts. This way,
>>> you can have multiple spouts which read from multiple files, and whatever
>>> they emit will go only to specific bolts.
>>>
>>>
>>> On Mon, Jul 4, 2016 at 9:21 PM, praveen reddy <[email protected]>
>>> wrote:
>>>
>>>> Want to add a bit more:
>>>> I am posting the JSON data using the kafka-console-producer.sh script,
>>>> copying the JSON data and pasting it on the console.
>>>>
>>>> On Mon, Jul 4, 2016 at 11:44 AM, praveen reddy <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks Navin for the response, I was using mobile so couldn't see the
>>>>> typos. Here is my requirement. This is my first POC on Kafka/Storm, so
>>>>> please help me if I can design it in a better way.
>>>>>
>>>>> I need to read JSON data from Kafka, then convert the JSON data to a
>>>>> CSV file and save it on HDFS.
>>>>>
>>>>> This is my initial design, and I am having a lot of issues.
>>>>>
>>>>>     builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
>>>>>     builder.setBolt("TransformBolt", new TransformationBolt()).shuffleGrouping("kafka-spout");
>>>>>     builder.setBolt("Savebolt", new SaveBolt()).shuffleGrouping("TransformBolt");
>>>>>
>>>>> KafkaSpout reads the data from the Kafka topic, TransformationBolt
>>>>> converts the JSON to a CSV file, and SaveBolt saves the CSV file.
>>>>>
>>>>> KafkaSpout was able to read data from the Kafka topic. What I was
>>>>> expecting from the spout was the complete JSON data, but I am getting one
>>>>> line at a time from the JSON data I sent to the topic.
>>>>>
>>>>> Here is my transform bolt:
>>>>>     @Override
>>>>>     public void execute(Tuple input) {
>>>>>         String sentence = input.getString(0);
>>>>>         collector.emit(new Values(sentence));
>>>>>         System.out.println("emitted " + sentence);
>>>>>     }
>>>>>
>>>>> I was expecting getString(0) to return the complete JSON data, but I am
>>>>> getting only one line at a time.
>>>>>
>>>>> I am also not sure how to emit the CSV file so that SaveBolt can save it.
>>>>>
>>>>> Can you please let me know how to get the complete JSON data in a single
>>>>> request rather than line by line, and how to emit a CSV file from a bolt?
>>>>> And if you guys can help me design this better, it would be really
>>>>> helpful.
>>>>>
>>>>>
>>>>> On Mon, Jul 4, 2016 at 5:59 AM, Navin Ipe <[email protected]> wrote:
>>>>>
>>>>>> Dear Praveen,
>>>>>>
>>>>>> The questions aren't silly, but it is rather tough to understand what
>>>>>> you are trying to convey. When you say "omit", do you mean "emit"?
>>>>>> Bolts can emit data even without having to write to disk (I think
>>>>>> there's a 2MB limit to the size of that data that can be emitted, because
>>>>>> Thrift can't handle more than that).
>>>>>> If you want one bolt to write to disk and then want another bolt to
>>>>>> read from disk, then that's also possible.
>>>>>> The first bolt can just send to the second bolt whatever information
>>>>>> is necessary to read from the file.
>>>>>> As far as I know, basic datatypes will automatically get serialized.
>>>>>> If you have a more complex class, then serialize it with Serializable.
>>>>>>
>>>>>> If you could re-phrase your question and make it clearer, people here
>>>>>> would be able to help you better.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Jul 2, 2016 at 7:16 AM, praveen reddy <[email protected]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am new to Storm and Kafka and working on a POC.
>>>>>>>
>>>>>>> My requirement is to get a message from Kafka in JSON format, with a
>>>>>>> spout reading that message, the first bolt converting the JSON message
>>>>>>> to a different format like CSV, and the second bolt saving it to Hadoop.
>>>>>>>
>>>>>>> I came up with an initial design where I can use KafkaSpout to read
>>>>>>> Kafka topics, a bolt converting the data to a CSV file, and the next
>>>>>>> bolt saving it in Hadoop.
>>>>>>>
>>>>>>> I have the following questions:
>>>>>>> Can the first bolt, which converts the message to a CSV file, omit
>>>>>>> it? The file would be saved on disk. Can a file which is saved on disk
>>>>>>> be omitted?
>>>>>>> How does the second bolt read the file which is saved on disk by the
>>>>>>> first bolt?
>>>>>>> Do we need to serialize the message omitted by the spout and/or bolt?
>>>>>>>
>>>>>>> Sorry if the questions sound silly, this is my first topology, with
>>>>>>> minimum knowledge of Storm.
>>>>>>>
>>>>>>> If you guys can think of a proper design for implementing my
>>>>>>> requirement, can you please let me know?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> -Praveen
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Navin
>>>>>>
>>>
>>> --
>>> Regards,
>>> Navin
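
For what it's worth, the advice quoted above (a bolt can emit data without first writing it to disk) suggests one possible shape for the transform step: build each CSV row as a plain string and emit that string, instead of passing a file around. Below is a minimal sketch using only the Java standard library. The class name CsvRow, the helper names escape and toCsvLine, and the sample values are all hypothetical and not taken from the thread; the quoting rule is the common RFC 4180 style convention, which may or may not match what the existing utility class produces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not from the thread): turns one record's fields into one CSV line.
final class CsvRow {

    // Quote a field only when it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled, following the usual RFC 4180 style convention.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the escaped fields with commas into a single line (no trailing newline).
    static String toCsvLine(List<String> fields) {
        return fields.stream().map(CsvRow::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Made-up values standing in for the fields of the Java object parsed from JSON.
        List<String> fields = Arrays.asList("42", "Smith, Jane", "said \"hi\"");
        System.out.println(toCsvLine(fields));
    }
}
```

Inside a bolt like the TransformationBolt quoted above, the resulting line could then be emitted with collector.emit(new Values(line)), so the downstream bolt receives one CSV row per tuple and is the only component that touches HDFS. The storm-hdfs module also ships an HdfsBolt that may cover the writing side; its configuration is not shown here.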
