Thanks for the response. Can you please help me with how to emit CSV data
from a bolt? I was able to read JSON data from Kafka and convert it into a
Java object. I created a utility class to convert the Java object into a CSV
file. Now I want to write that CSV file (which I stored on disk) to HDFS
using a bolt. Any link to documentation on how to do this would be helpful. I
searched Google but couldn't find relevant info.
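
One possible way to avoid the intermediate file, sketched under assumptions:
build the CSV line as a String in memory and emit that, as suggested further
down the thread. The class name CsvLine, its helper methods, and the sample
values are hypothetical and are not part of Storm or of the utility class
mentioned above. Only the JDK is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvLine {
    // Quote a field only when it contains a comma, a double quote, or a line
    // break; embedded double quotes are doubled (RFC 4180 style).
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Join the fields of one record into a single CSV line (no trailing newline).
    static String toCsvLine(List<String> fields) {
        return fields.stream().map(CsvLine::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Made-up record, for illustration only.
        System.out.println(toCsvLine(Arrays.asList("42", "Smith, Jane", "said \"hi\"")));
    }
}
```

Inside a bolt, the resulting String could then be passed on with
collector.emit(new Values(csvLine)), the same call used in the execute()
method quoted below, and a downstream bolt could append it to a file on HDFS.
The storm-hdfs module ships an HdfsBolt that may be worth a look for that
last step.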

On Mon, Jul 4, 2016 at 5:22 PM, Harsha Chintalapani <[email protected]> wrote:

>
> "Bolts can emit data even without having to write to disk (I think
> there's a 2MB limit to the size of that data that can be emitted, because
> Thrift can't handle more than that)."
> There is no such limit. Between workers, Storm uses Netty channels, and
> communication between components inside a JVM goes through Disruptor queues.
> If you need to increase the size of the Netty buffers, take a look at the
> Netty configs in storm.yaml. We recommend going with the defaults.
> Thanks,
> Harsha
>
> On Mon, Jul 4, 2016 at 9:59 AM Nathan Leung <[email protected]> wrote:
>
>> Double check how you are pushing data into Kafka. You are probably
>> pushing one line at a time.
>> On Jul 4, 2016 12:30 PM, "Navin Ipe" <[email protected]>
>> wrote:
>>
>>> I haven't worked with Kafka, so perhaps someone else here would be
>>> able to help you with it.
>>> What I could suggest though, is to search for how to emit more than one
>>> sentence using the Kafka spout.
>>>
>>> If you still can emit only one sentence, then I'd recommend not using a
>>> separate SaveBolt. Instead, use FieldsGrouping where you group tuples based
>>> on the name of the CSV file, and emit sentences to TransformBolt. When
>>> TransformBolt completes receiving all tuples from a CSV, it can save to
>>> HDFS.
>>>
>>> If you still want to use a separate TransformBolt and SaveBolt, then use
>>> fields grouping as I mentioned above when emitting to both bolts. This way,
>>> you can have multiple spouts which read from multiple files, and whatever
>>> they emit will go only to specific bolts.
>>>
>>>
>>> On Mon, Jul 4, 2016 at 9:21 PM, praveen reddy <[email protected]>
>>> wrote:
>>>
>>>> I want to add a bit more:
>>>> I am posting the JSON data using the kafka-console-producer.sh script, by
>>>> copying the JSON data and pasting it into the console.
>>>>
>>>> On Mon, Jul 4, 2016 at 11:44 AM, praveen reddy <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks Navin for the response. I was on mobile, so I couldn't see the typos.
>>>>> Here is my requirement. This is my first POC on Kafka/Storm, so please
>>>>> help me if I can design it in a better way.
>>>>>
>>>>> I need to read JSON data from Kafka, then convert the JSON data to a
>>>>> CSV file and save it on HDFS.
>>>>>
>>>>> This is my initial design, and I am having a lot of issues with it.
>>>>>
>>>>>         builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
>>>>>         builder.setBolt("TransformBolt", new TransformationBolt())
>>>>>                 .shuffleGrouping("kafka-spout");
>>>>>         builder.setBolt("Savebolt", new SaveBolt())
>>>>>                 .shuffleGrouping("TransformBolt");
>>>>>
>>>>> KafkaSpout reads the data from the Kafka topic, TransformationBolt
>>>>> converts the JSON to a CSV file, and SaveBolt saves the CSV file.
>>>>>
>>>>> KafkaSpout was able to read data from the Kafka topic. I was
>>>>> expecting the spout to deliver the complete JSON data, but I am getting one
>>>>> line at a time from the JSON data I sent to the topic.
>>>>>
>>>>> Here is my transformation bolt:
>>>>>     @Override
>>>>>     public void execute(Tuple input) {
>>>>>         String sentence = input.getString(0);
>>>>>         collector.emit(new Values(sentence));
>>>>>         System.out.println("emitted " + sentence);
>>>>>     }
>>>>>
>>>>> I was expecting getString(0) to return the complete JSON data, but I am
>>>>> getting only one line at a time.
>>>>>
>>>>> I am also not sure how to emit the CSV file so that SaveBolt can save it.
>>>>>
>>>>> Can you please let me know how to get the complete JSON data in a single
>>>>> request rather than line by line, and how to emit a CSV file from a bolt? If
>>>>> you can help me design this better, it would be really helpful.
>>>>>
>>>>>
>>>>> On Mon, Jul 4, 2016 at 5:59 AM, Navin Ipe <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Dear Praveen,
>>>>>>
>>>>>> The questions aren't silly, but it is rather tough to understand what
>>>>>> you are trying to convey. When you say "omit", do you mean "emit"?
>>>>>> Bolts can emit data even without having to write to disk (I think
>>>>>> there's a 2MB limit to the size of that data that can be emitted, because
>>>>>> Thrift can't handle more than that).
>>>>>> If you want one bolt to write to disk and then want another bolt to
>>>>>> read from disk, then that's also possible.
>>>>>> The first bolt can just send to the second bolt, whatever information
>>>>>> is necessary to read from file.
>>>>>> As far as I know, basic datatypes get serialized automatically.
>>>>>> If you have a more complex class, make it implement Serializable.
>>>>>>
>>>>>> If you could re-phrase your question and make it clearer, people here
>>>>>> would be able to help you better.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Jul 2, 2016 at 7:16 AM, praveen reddy <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am new to Storm and Kafka and am working on a POC.
>>>>>>>
>>>>>>> My requirement is to get a message from Kafka in JSON format, with a spout
>>>>>>> reading that message, a first bolt converting the JSON message to a
>>>>>>> different format such as CSV, and a second bolt saving it to Hadoop.
>>>>>>>
>>>>>>> I came up with an initial design where I use KafkaSpout to read
>>>>>>> Kafka topics, one bolt converts the message to a CSV file, and the next bolt
>>>>>>> saves it in Hadoop.
>>>>>>>
>>>>>>> I have the following questions:
>>>>>>> Can the first bolt, which converts the message to a CSV file, omit
>>>>>>> it? The file would be saved on disk. Can a file that is saved on disk
>>>>>>> be omitted?
>>>>>>> How does the second bolt read the file that was saved on disk by the
>>>>>>> first bolt?
>>>>>>> Do we need to serialize messages omitted by a spout and/or bolt?
>>>>>>>
>>>>>>> Sorry if the questions sound silly; this is my first topology, and I have
>>>>>>> minimal knowledge of Storm.
>>>>>>>
>>>>>>> If you can think of a proper design for implementing my
>>>>>>> requirement, can you please let me know?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> -Praveen
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Navin
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Navin
>>>
>>
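
On the "one line at a time" behaviour discussed in the thread: the console
producer sends each line of input as a separate message, so a pretty-printed
JSON document pasted into it arrives as many small tuples. One workaround,
sketched here under assumptions, is to collapse the document to a single line
before pasting it. The class name JsonOneLine and the method compact are
hypothetical, and the sketch only strips whitespace outside string literals;
it does not validate the JSON.

```java
public class JsonOneLine {
    // Remove whitespace that sits outside string literals, so a pretty-printed
    // JSON document becomes a single line. Text inside strings is left untouched.
    static String compact(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (escaped) {
                    escaped = false;       // character after a backslash is taken literally
                } else if (c == '\\') {
                    escaped = true;        // next character is escaped
                } else if (c == '"') {
                    inString = false;      // closing quote of the string literal
                }
            } else if (c == '"') {
                inString = true;           // opening quote of a string literal
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                out.append(c);             // keep structural characters and literals
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up document, for illustration only.
        String pretty = "{\n  \"name\": \"a b\",\n  \"tags\": [1, 2]\n}";
        System.out.println(compact(pretty));
    }
}
```

With each JSON document on one line, getString(0) in the bolt should then see
the whole document in a single tuple, assuming the spout is configured to
deliver each Kafka message as a String.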
