Thanks for the response, Navin. I was on my mobile, so I couldn't see the typos. Here
is my requirement. This is my first POC on Kafka/Storm, so please tell me
if I can design it in a better way.
I need to read JSON data from Kafka, then convert the JSON data to a CSV
file and save it on HDFS.
This is my initial design, and I am having a lot of issues with it:
builder.setSpout("kafka-spout", new KafkaSpout(kafkaSpoutConfig));
builder.setBolt("TransformBolt", new TransformationBolt()).shuffleGrouping("kafka-spout");
builder.setBolt("Savebolt", new SaveBolt()).shuffleGrouping("TransformBolt");
KafkaSpout reads the data from the Kafka topic, TransformationBolt converts
the JSON to a CSV file, and SaveBolt saves the CSV file.
KafkaSpout was able to read data from the Kafka topic. I expected the spout
to deliver the complete JSON document, but I am getting one line at a time
from the JSON data I sent to the topic.
Here is my transformation bolt:
@Override
public void execute(Tuple input) {
    String sentence = input.getString(0);
    collector.emit(new Values(sentence));
    System.out.println("emitted " + sentence);
}
I expected getString(0) to return the complete JSON data, but I am getting
only one line at a time.
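
One possible explanation, offered as an assumption rather than a diagnosis: the spout emits one tuple per Kafka message, and a line-oriented producer such as the console producer sends each input line as its own message, so a pretty-printed JSON document arrives as many tuples. If that is what is happening here, collapsing the JSON to a single line before producing it would make getString(0) return the whole document. A minimal sketch of that step follows; the class name JsonCompactor is hypothetical and not part of Kafka or Storm.

```java
public final class JsonCompactor {
    private JsonCompactor() {}

    /** Removes whitespace (including newlines) that lies outside JSON string literals. */
    public static String compact(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false; // currently inside a "..." literal
        boolean escaped = false;  // previous char was a backslash inside a literal
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c); // keep everything inside string literals untouched
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                out.append(c); // structural characters, numbers, true/false/null
            }
        }
        return out.toString();
    }
}
```

Valid JSON cannot contain raw newlines inside string literals, so the result of compact(...) is a single line that a line-oriented producer would send as one message.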
I am also not sure how to emit the CSV file so that SaveBolt can save it.
Can you please let me know how to get the complete JSON data in a single
request rather than line by line, and how to emit a CSV file from a bolt?
If you can help me design this better, it would be really helpful.
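
On emitting CSV: one option, sketched here under assumptions, is to emit one CSV row per tuple as a plain String instead of emitting a file, and let the downstream bolt append rows to HDFS (the storm-hdfs module ships an HdfsBolt that may fit, if it is available in your Storm version). The helper below is hypothetical: the class name CsvRow and the column list are illustrative, and it assumes the JSON has already been parsed into a flat Map by whatever JSON library you use.

```java
import java.util.List;
import java.util.Map;

public final class CsvRow {
    private CsvRow() {}

    /** Builds one CSV line from a flat record, in the given column order. */
    public static String toCsvLine(List<String> columns, Map<String, ?> record) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            Object value = record.get(columns.get(i));
            // Missing or null fields become empty cells.
            line.append(escape(value == null ? "" : value.toString()));
        }
        return line.toString();
    }

    /** Quotes a field if it contains a comma, quote, or line break (RFC 4180 style). */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```

Inside execute, the transformation bolt could then call something like collector.emit(new Values(CsvRow.toCsvLine(columns, record))), so that the save bolt receives one row per tuple via getString(0).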
On Mon, Jul 4, 2016 at 5:59 AM, Navin Ipe <[email protected]>
wrote:
> Dear Praveen,
>
> The questions aren't silly, but it is rather tough to understand what you
> are trying to convey. When you say "omit", do you mean "emit"?
> Bolts can emit data even without having to write to disk (I think there's
> a 2MB limit to the size of that data that can be emitted, because Thrift
> can't handle more than that).
> If you want one bolt to write to disk and then want another bolt to read
> from disk, then that's also possible.
> The first bolt can just send to the second bolt, whatever information is
> necessary to read from file.
> As far as I know, basic datatypes will automatically get serialized. If
> you have a more complex class, then serialize it with Serializable.
>
> If you could re-phrase your question and make it clearer, people here
> would be able to help you better.
>
>
>
> On Sat, Jul 2, 2016 at 7:16 AM, praveen reddy <
> [email protected]> wrote:
>
>> Hi All,
>>
>> i am new to Storm and Kafka and working on POC.
>>
>> my requirement is to get a message from Kafka in json format, with the spout reading
>> that message, the first bolt converting the json message to a different format
>> like csv, and the second bolt saving it to hadoop.
>>
>> now i came up with initial design where i can use kafkaspout to read
>> kafka topics and bolt converting it to csv file and next bolt saving in
>> hadoop.
>>
>> i have following questions
>> can the first bolt, which converts the message to a csv file, omit it? the
>> file would be saved on disk. can a file which is saved on disk be
>> omitted?
>> how does the second bolt read the file which is saved on disk by first
>> bolt?
>> do we need to serialize the message omitted by the spout and/or bolt?
>>
>> sorry if the questions sound silly, this is my first topology with
>> minimum knowledge of storm.
>>
>> if you guys can think of a proper design to implement my requirement,
>> can you please let me know
>>
>> thanks in advance
>>
>> -Praveen
>>
>
>
>
> --
> Regards,
> Navin
>