We have one spout which always emits JSON of the same form (this is a
metric represented as a JSON object with timestamp, name, and value
fields). Note this is the OpaqueTridentKafkaSpout from storm-kafka-0.8-plus
which just emits a "bytes" array field, so we first convert it to a string.

    Stream stream = topology.newStream("spout1", spout)
        .each(new Fields("bytes"), new BinaryToString(), new Fields("string"))
        .each(new Fields("string"), new MetricJsonParser(),
            new Fields("timestamp", "metric", "value"))
        ...

Both of these functions are pretty simple:
https://gist.github.com/codyaray/9897217. I'm using Codehaus'
Jettison <http://jettison.codehaus.org/> to parse the JSON.
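
For anyone who would rather not open the gist, the logic amounts to
roughly the following. This is a plain-Java sketch, not the gist's code:
the class name MetricFunctions, the UTF-8 charset, and the field types
(long timestamp, double value) are assumptions, and a small regex stands
in for Jettison so the snippet runs without Storm or any JSON library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java stand-in for the two Trident functions (hypothetical names and types).
final class MetricFunctions {

    // Matches one "key": value pair of a flat JSON object whose values are
    // strings or plain decimal numbers. Stand-in for a real JSON parser.
    private static final Pattern PAIR = Pattern.compile(
            "\"(\\w+)\"\\s*:\\s*(?:\"([^\"]*)\"|(-?\\d+(?:\\.\\d+)?))");

    // BinaryToString step: decode the spout's "bytes" field (UTF-8 assumed).
    static String binaryToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // MetricJsonParser step: returns the values for the output fields
    // ("timestamp", "metric", "value"), in that order.
    static List<Object> parseMetric(String json) {
        Long timestamp = null;
        String metric = null;
        Double value = null;
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            String key = m.group(1);
            String raw = (m.group(2) != null) ? m.group(2) : m.group(3);
            if (key.equals("timestamp")) {
                timestamp = Long.parseLong(raw);
            } else if (key.equals("name")) {
                metric = raw; // the JSON "name" field is emitted as "metric"
            } else if (key.equals("value")) {
                value = Double.parseDouble(raw);
            }
        }
        if (timestamp == null || metric == null || value == null) {
            throw new IllegalArgumentException("not a complete metric: " + json);
        }
        return Arrays.<Object>asList(timestamp, metric, value);
    }
}
```

In the topology, each of these method bodies would sit inside a Trident
function's execute(), reading its input from the tuple and emitting the
result instead of returning it.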

-Cody


On Sat, Mar 29, 2014 at 5:30 PM, Tyson Norris <[email protected]> wrote:

> We have a similar setup, and based on routing needs we plan to pass the
> original JSON, plus some extra fields that will simplify routing via
> fieldsGrouping.
> e.g.
> spout -> content (JSONObject)
> bolt1 -> receives all spout tuples; execute() uses JSONObject ->
> emit(JSONObject, String, String) (String values are parsed out of
> JSONObject)
> bolt2 -> receives bolt1 tuples based on fieldsGrouping; execute uses
> JSONObject, String, String to perform some operation -> emit(JSONObject,
> String) (String values are based on some logic)
>
> So while you could go either extreme (continually pass JSONObject value as
> tuple, or parse the JSONObject and pass only decomposed values as tuple),
> you can also do both, which would allow you to use fieldsGrouping, in case
> that is important (it is important for our case).
>
> Tyson
>
>
> On Mar 29, 2014, at 1:44 PM, Software Dev <[email protected]>
> wrote:
>
> > We actually have a spout that emits just 1 JSON string per tuple.
> > Wondering what should be done downstream after we have the JSON string.
> >
> > On Sat, Mar 29, 2014 at 12:20 PM, Andrew Neilson <[email protected]> wrote:
> >> my team's project has successfully used Jackson
> >> (https://github.com/FasterXML/jackson) to deserialize a spout of JSON
> >> arrays into tuples, and I can recommend it. Though I'll warn you that it
> >> takes a little bit of work beyond the most basic usage (i.e.
> >> mapper.readValue(json, List.class)) to avoid dealing with type ambiguity.
> >>
> >>
> >> On Sat, Mar 29, 2014 at 12:06 PM, Software Dev <[email protected]> wrote:
> >>>
> >>> Say we are receiving tuples of JSON from a spout. Should we just keep
> >>> passing around the JSON string and deserialize it in each bolt, or is
> >>> it best to break apart the JSON object into a bunch of fields that can
> >>> be passed around?
> >>>
> >>> I'm thinking in terms of performance the latter may be "better",
> >>> although it will make the rest of the topology slightly more complex.
> >>>
> >>> Also, what is a good JSON library to work with?
> >>>
> >>> Thanks
> >>
> >>
>
>


-- 
Cody A. Ray, LEED AP
[email protected]
215.501.7891
