Hi,
You can follow the link below for instructions on using the HDFS bolt.

http://storm.apache.org/releases/1.0.0/storm-hdfs.html
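
For CSV-style output, a minimal HdfsBolt setup (adapted from that page; the
fs URL and output path are placeholders you would replace, and the classes
live in org.apache.storm.hdfs.bolt and its format/rotation/sync subpackages)
looks roughly like this:

    // write comma-delimited records, sync every 1000 tuples, roll files at ~5 MB
    RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter(",");
    SyncPolicy syncPolicy = new CountSyncPolicy(1000);
    FileRotationPolicy rotationPolicy =
            new FileSizeRotationPolicy(5.0f, FileSizeRotationPolicy.Units.MB);
    FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/storm/csv/");

    HdfsBolt hdfsBolt = new HdfsBolt()
            .withFsUrl("hdfs://namenode:8020")
            .withFileNameFormat(fileNameFormat)
            .withRecordFormat(format)
            .withRotationPolicy(rotationPolicy)
            .withSyncPolicy(syncPolicy);

Note that the bolt writes one delimited record per incoming tuple, so the
transform bolt would emit each CSV row as a tuple rather than a whole file.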

Thanks,
Satish.


On Tue, Jul 5, 2016 at 3:02 AM, praveen reddy <[email protected]>
wrote:

> Thanks for the response. Can you please help me with how to emit CSV data
> from a bolt? I was able to read the JSON data from Kafka and convert it into
> a Java object, and I created a utility class to convert the Java object into
> a CSV file. Now I want to write that CSV file (which I stored on disk) onto
> HDFS using a bolt. Any link to documentation on how to do this would be
> helpful; I searched Google but couldn't find anything relevant.
>
> On Mon, Jul 4, 2016 at 5:22 PM, Harsha Chintalapani <[email protected]>
> wrote:
>
>>
>> “Bolts can emit data even without having to write to disk (I think
>> there’s a 2MB limit to the size of that data that can be emitted, because
>> Thrift can’t handle more than that)."
>> There is no such limit. Between workers, Storm uses Netty channels, and
>> communication between components inside the same JVM goes through the
>> Disruptor queue. If you need to increase the Netty buffer sizes, take a look
>> at the Netty configs in storm.yaml, but we recommend going with the defaults.
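>>
>> For reference, the relevant keys (shown here with their stock values purely
>> as an illustration, not as a recommendation to change them) look like this
>> in storm.yaml:
>>
>>     storm.messaging.netty.buffer_size: 5242880      # 5 MB send/receive buffer
>>     storm.messaging.netty.server_worker_threads: 1
>>     storm.messaging.netty.client_worker_threads: 1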
>> Thanks,
>> Harsha
>>
>> On Mon, Jul 4, 2016 at 9:59 AM Nathan Leung <[email protected]> wrote:
>>
>>> Double check how you are pushing data into Kafka. You are probably
>>> pushing one line at a time.
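>>> The console producer treats every newline as a separate message, so a
>>> multi-line JSON document pasted into it arrives as one tuple per line. A
>>> rough sketch of sending the whole document as a single record instead
>>> (topic name, broker address, and file name are placeholders; the classes
>>> come from org.apache.kafka.clients.producer and java.nio):
>>>
>>>     Properties props = new Properties();
>>>     props.put("bootstrap.servers", "localhost:9092");
>>>     props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
>>>     props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
>>>
>>>     // read the whole JSON file into one string and send it as a single record
>>>     String json = new String(Files.readAllBytes(Paths.get("input.json")), StandardCharsets.UTF_8);
>>>     try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
>>>         producer.send(new ProducerRecord<>("my-topic", json));
>>>     }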
>>> On Jul 4, 2016 12:30 PM, "Navin Ipe" <[email protected]>
>>> wrote:
>>>
>>>> I haven't worked with Kafka, so perhaps someone else here would be able
>>>> to help you with it. What I would suggest, though, is to search for how to
>>>> emit more than one sentence using the Kafka spout.
>>>>
>>>> If you can still emit only one sentence, then I'd recommend not using a
>>>> separate SaveBolt. Instead, use fields grouping, where you group tuples
>>>> based on the name of the CSV file, and emit sentences to TransformBolt.
>>>> When TransformBolt finishes receiving all tuples from a CSV, it can save
>>>> to HDFS.
>>>>
>>>> If you still want to use a separate TransformBolt and SaveBolt, then use
>>>> fields grouping as mentioned above when emitting to both bolts. This way,
>>>> you can have multiple spouts which read from multiple files, and whatever
>>>> they emit will go only to specific bolts.
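>>>>
>>>> A rough sketch of that wiring (the "filename" output field and the bolt
>>>> names are just illustrative; the spout and TransformBolt would have to
>>>> declare and emit that field themselves):
>>>>
>>>>     // route all tuples belonging to one CSV file to the same bolt task
>>>>     builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
>>>>     builder.setBolt("TransformBolt", new TransformationBolt())
>>>>            .fieldsGrouping("kafka-spout", new Fields("filename"));
>>>>     builder.setBolt("SaveBolt", new SaveBolt())
>>>>            .fieldsGrouping("TransformBolt", new Fields("filename"));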
>>>>
>>>>
>>>> On Mon, Jul 4, 2016 at 9:21 PM, praveen reddy <[email protected]>
>>>> wrote:
>>>>
>>>>> I want to add a bit more: I am posting the JSON data using
>>>>> kafka-console-producer.sh, by copying the JSON data and pasting it on the
>>>>> console.
>>>>>
>>>>> On Mon, Jul 4, 2016 at 11:44 AM, praveen reddy <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> Thanks, Navin, for the response. I was using my mobile, so I couldn't
>>>>>> see the typos. Here is my requirement. This is my first POC on
>>>>>> Kafka/Storm, so please help me if I can design it a better way.
>>>>>>
>>>>>> I need to read JSON data from Kafka, then convert the JSON data to a CSV
>>>>>> file and save it on HDFS.
>>>>>>
>>>>>> This is my initial design, and I am having a lot of issues with it.
>>>>>>
>>>>>>         builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
>>>>>>         builder.setBolt("TransformBolt", new TransformationBolt())
>>>>>>                .shuffleGrouping("kafka-spout");
>>>>>>         builder.setBolt("Savebolt", new SaveBolt())
>>>>>>                .shuffleGrouping("TransformBolt");
>>>>>>
>>>>>> KafkaSpout reads the data from the Kafka topic, TransformationBolt
>>>>>> converts the JSON to a CSV file, and SaveBolt saves the CSV file.
>>>>>>
>>>>>> KafkaSpout was able to read data from the Kafka topic. What I was
>>>>>> expecting from the spout was the complete JSON data, but I am getting one
>>>>>> line at a time from the JSON data I sent to the topic.
>>>>>>
>>>>>> Here is my transform bolt:
>>>>>>     @Override
>>>>>>     public void execute(Tuple input) {
>>>>>>         String sentence = input.getString(0);
>>>>>>         collector.emit(new Values(sentence));
>>>>>>         System.out.println("emitted " + sentence);
>>>>>>     }
>>>>>>
>>>>>> I was expecting getString(0) to return the complete JSON data, but I am
>>>>>> getting only one line at a time.
>>>>>>
>>>>>> I am also not sure how to emit the CSV file so that SaveBolt can save it.
>>>>>>
>>>>>> Can you please let me know how to get the complete JSON data in a single
>>>>>> tuple rather than line by line, and how to emit a CSV file from a bolt?
>>>>>> If you can help me design this better, it would be really helpful.
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 4, 2016 at 5:59 AM, Navin Ipe <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Dear Praveen,
>>>>>>>
>>>>>>> The questions aren't silly, but it is rather tough to understand what
>>>>>>> you are trying to convey. When you say "omit", do you mean "emit"?
>>>>>>> Bolts can emit data even without having to write to disk (I think
>>>>>>> there's a 2MB limit to the size of that data that can be emitted,
>>>>>>> because Thrift can't handle more than that).
>>>>>>> If you want one bolt to write to disk and then want another bolt to
>>>>>>> read from disk, that's also possible. The first bolt can just send the
>>>>>>> second bolt whatever information is necessary to read the file.
>>>>>>> As far as I know, basic datatypes will automatically get serialized. If
>>>>>>> you have a more complex class, then make it implement Serializable.
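>>>>>>>
>>>>>>> A small sketch of that pattern (the field name, the helper method, and
>>>>>>> the bolt wiring are made up here just for illustration; the emitting
>>>>>>> bolt would declare the "csvPath" field in declareOutputFields):
>>>>>>>
>>>>>>>     // first bolt: write the CSV locally, then pass only the path downstream
>>>>>>>     String path = writeCsvToLocalDisk(record);   // hypothetical helper
>>>>>>>     collector.emit(new Values(path));
>>>>>>>
>>>>>>>     // second bolt: read the path from the tuple and load the file
>>>>>>>     @Override
>>>>>>>     public void execute(Tuple input) {
>>>>>>>         String path = input.getStringByField("csvPath");
>>>>>>>         // ... open the file at 'path' and push its contents to HDFS ...
>>>>>>>     }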
>>>>>>>
>>>>>>> If you could re-phrase your question and make it clearer, people
>>>>>>> here would be able to help you better.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Jul 2, 2016 at 7:16 AM, praveen reddy <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I am new to Storm and Kafka and am working on a POC.
>>>>>>>>
>>>>>>>> My requirement is to get a message from Kafka in JSON format, have the
>>>>>>>> spout read that message, have the first bolt convert the JSON message
>>>>>>>> to a different format such as CSV, and have the second bolt save it to
>>>>>>>> Hadoop.
>>>>>>>>
>>>>>>>> I came up with an initial design where I use KafkaSpout to read the
>>>>>>>> Kafka topic, a bolt to convert the message to a CSV file, and the next
>>>>>>>> bolt to save it in Hadoop.
>>>>>>>>
>>>>>>>> I have the following questions:
>>>>>>>> Can the first bolt, which converts the message to a CSV file, omit it?
>>>>>>>> The file would be saved on disk. Can a file which is saved on disk be
>>>>>>>> omitted?
>>>>>>>> How does the second bolt read the file which was saved on disk by the
>>>>>>>> first bolt?
>>>>>>>> Do we need to serialize messages omitted by the spout and/or bolt?
>>>>>>>>
>>>>>>>> Sorry if the questions sound silly; this is my first topology, built
>>>>>>>> with minimal knowledge of Storm.
>>>>>>>>
>>>>>>>> If you can think of a proper design for how to implement my
>>>>>>>> requirement, could you please let me know?
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>> -Praveen
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Navin
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Navin
>>>>
>>>
>
