It's much more efficient to deserialize it once then pass around POJOs.
JSON serialization is slow compared to Kryo. Our topologies tend to take in
JSON, then emit JSON to external systems at later phases, but all
intermediate stages are POJOs.
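
A minimal sketch of that parse-once pattern, in plain Java with no Storm
dependency. The Metric class, the method names, and the regex-based parse()
are all hypothetical stand-ins; in a real topology a JSON library such as
Jackson would do the parsing, and the POJO would also need to be
serializable by Storm if it crosses worker boundaries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseOnceSketch {

    /** Hypothetical POJO for a metric with timestamp, name, and value fields. */
    static final class Metric {
        final long timestamp;
        final String name;
        final double value;

        Metric(long timestamp, String name, double value) {
            this.timestamp = timestamp;
            this.name = name;
            this.value = value;
        }
    }

    // Stand-in for a real JSON library: handles only flat objects with
    // these three fields and no escaped characters in the name.
    private static final Pattern TIMESTAMP =
        Pattern.compile("\"timestamp\"\\s*:\\s*(\\d+)");
    private static final Pattern NAME =
        Pattern.compile("\"name\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern VALUE =
        Pattern.compile("\"value\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?)");

    private static String find(Pattern pattern, String json) {
        Matcher m = pattern.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("missing field in: " + json);
        }
        return m.group(1);
    }

    /** First stage: the only place the incoming JSON is deserialized. */
    static Metric parse(String json) {
        return new Metric(
            Long.parseLong(find(TIMESTAMP, json)),
            find(NAME, json),
            Double.parseDouble(find(VALUE, json)));
    }

    /** Intermediate stage: works on the POJO, never touches JSON. */
    static Metric scale(Metric m, double factor) {
        return new Metric(m.timestamp, m.name, m.value * factor);
    }

    /** Last stage: serialize once, just before emitting to an external system. */
    static String toJson(Metric m) {
        return "{\"timestamp\":" + m.timestamp
            + ",\"name\":\"" + m.name + "\""
            + ",\"value\":" + m.value + "}";
    }

    public static void main(String[] args) {
        Metric parsed = parse(
            "{\"timestamp\":1396286280,\"name\":\"cpu.load\",\"value\":0.75}");
        Metric scaled = scale(parsed, 100.0);
        System.out.println(toJson(scaled));
    }
}
```

Only parse() and toJson() touch JSON, so adding more intermediate stages
does not add any parsing cost.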

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
[email protected]


On Mon, Mar 31, 2014 at 11:18 AM, Cody A. Ray <[email protected]> wrote:

> We have one spout which always emits JSON of the same form (this is a
> metric represented as a JSON object with timestamp, name, and value
> fields). Note this is the OpaqueTridentKafkaSpout from storm-kafka-0.8-plus
> which just emits a "bytes" array field, so we first convert it to a string.
>
>     Stream stream = topology.newStream("spout1", spout)
>         .each(new Fields("bytes"), new BinaryToString(), new Fields("string"))
>         .each(new Fields("string"), new MetricJsonParser(),
>             new Fields("timestamp", "metric", "value"))
>         ...
>
> Each of these two functions is pretty simple:
> https://gist.github.com/codyaray/9897217. I'm using Codehaus'
> Jettison <http://jettison.codehaus.org/> to parse the JSON.
>
> -Cody
>
>
> On Sat, Mar 29, 2014 at 5:30 PM, Tyson Norris <[email protected]> wrote:
>
>> We have a similar setup, and based on routing needs we plan to pass the
>> original JSON, plus some extra fields that will simplify routing via
>> fieldsGrouping.
>> e.g.
>> spout -> content (JSONObject)
>> bolt1 -> receives all spout tuples; execute() uses JSONObject ->
>> emit(JSONObject, String, String) (String values are parsed out of
>> JSONObject)
>> bolt2 -> receives bolt1 tuples based on fieldsGrouping; execute uses
>> JSONObject, String, String to perform some operation -> emit(JSONObject,
>> String) (String values are based on some logic)
>>
>> So while you could go to either extreme (continually pass the JSONObject
>> value as the tuple, or parse the JSONObject and pass only decomposed values
>> as the tuple), you can also do both, which would allow you to use
>> fieldsGrouping, in case that is important (it is important for our case).
>>
>> Tyson
>>
>>
>> On Mar 29, 2014, at 1:44 PM, Software Dev <[email protected]>
>> wrote:
>>
>> > We actually have a spout that emits just 1 JSON string per tuple.
>> > Wondering what should be done downstream after we have the JSON string?
>> >
>> > On Sat, Mar 29, 2014 at 12:20 PM, Andrew Neilson <[email protected]> wrote:
>> >> My team's project has successfully used Jackson
>> >> (https://github.com/FasterXML/jackson) to deserialize a spout of JSON
>> >> arrays into tuples, and I can recommend it. Though I'll warn you that it
>> >> takes a little bit of work beyond the most basic usage (i.e.
>> >> mapper.readValue(json, List.class)) to avoid dealing with type ambiguity.
>> >>
>> >>
>> >> On Sat, Mar 29, 2014 at 12:06 PM, Software Dev <[email protected]>
>> >> wrote:
>> >>>
>> >>> Say we are receiving tuples of JSON from a spout. Should we just keep
>> >>> passing around the JSON string and deserialize it in each bolt, or is
>> >>> it best to break apart the JSON object into a bunch of fields that can
>> >>> be passed around?
>> >>>
>> >>> I'm thinking in terms of performance the latter may be "better",
>> >>> although it will make the rest of the topology slightly more complex.
>> >>>
>> >>> Also, what is a good JSON library to work with?
>> >>>
>> >>> Thanks
>> >>
>> >>
>>
>>
>
>
> --
> Cody A. Ray, LEED AP
> [email protected]
> 215.501.7891
>
