Many thanks!
From: [email protected]
Date: Tue, 8 Apr 2014 15:05:48 -0500
Subject: Re: Can Storm write an Aggregate Record to Postgres or SQL Server?
To: [email protected]
Can you elaborate on how you want to "aggregate" the data? If each log entry is
essentially a timestamp, a transaction type (since you mentioned this), and a
numerical value (which you want to sum over the 5-minute window), then you
don't need tick tuples.
The way we do aggregation is to map each timestamp into a bucket (e.g., your
5-minute window), group by that bucket and the transactionType, and use
Trident's persistentAggregate functionality to compute the sum.
Something like this:
TridentTopology topology = new TridentTopology();
topology.newStream("spout2", spout)
    .each(new Fields("bytes"), new BinaryToString(), new Fields("string"))
    .each(new Fields("string"), new LogParser(),
          new Fields("timestamp", "transactionType", "value"))
    // entry.getValue() is the bucket size (e.g., your 5-minute window); it's defined outside this snippet
    .each(new Fields("timestamp"), new Bucket(entry.getValue()),
          new Fields("bucketStart", "bucketEnd"))
    .groupBy(new Fields("bucketStart", "transactionType"))
    .persistentAggregate(stateFactory, new Fields("value"), new Sum(),
          new Fields("count"));
return topology.build();
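(A note on stateFactory, which the example above assumes you already have: as a minimal sketch for local testing you can use the in-memory map state that ships with Trident; to land the aggregates in Postgres or SQL Server you would plug in a StateFactory backed by your database instead.)
import storm.trident.state.StateFactory;
import storm.trident.testing.MemoryMapState;

// In-memory Trident state; fine for local testing, but swap in a
// database-backed StateFactory to persist the sums to Postgres/SQL Server.
StateFactory stateFactory = new MemoryMapState.Factory();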
Of course, you have to write the LogParser yourself since it depends on the
format of the input messages. This example assumes a Kafka spout which is why
it starts by parsing a "bytes" field. You can see the various helper functions
here: https://gist.github.com/codyaray/9897217.
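In case it helps, here's a hypothetical LogParser sketch. It assumes log lines of the form "<epoch-millis>,<transactionType>,<value>", which almost certainly isn't your exact format, so treat it as a template only:
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

public class LogParser extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        String line = tuple.getString(0); // the "string" field emitted by BinaryToString
        String[] parts = line.split(",");
        if (parts.length == 3) {
            long timestamp = Long.parseLong(parts[0].trim());
            String transactionType = parts[1].trim();
            long value = Long.parseLong(parts[2].trim());
            // Emit the ("timestamp", "transactionType", "value") fields used above.
            collector.emit(new Values(timestamp, transactionType, value));
        }
        // Malformed lines are simply dropped in this sketch.
    }
}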
-Cody
On Tue, Apr 8, 2014 at 2:03 PM, Huang, Roger <[email protected]> wrote:
Neil,
Take a look at using “tick tuples”
http://nathanmarz.github.io/storm/doc/backtype/storm/Config.html#TOPOLOGY_TICK_TUPLE_FREQ_SECS
and the Storm RDBMS bolt
https://github.com/nathanmarz/storm-contrib/tree/master/storm-rdbms
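If you go the plain-bolt route rather than Trident, here is a minimal sketch of how a bolt asks Storm for tick tuples and flushes its window when one arrives. It assumes Storm 0.9.x (backtype.storm packages); the accumulate/flush helpers are hypothetical placeholders for your own aggregation and database-write logic:
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.Constants;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class AggregatingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public Map<String, Object> getComponentConfiguration() {
        // Ask Storm to send this bolt a tick tuple every 300 seconds (5 minutes).
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
        return conf;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        if (isTickTuple(tuple)) {
            // Window boundary: write the accumulated aggregates to the database here,
            // e.g. by handing rows to the storm-rdbms bolt or plain JDBC.
            flushAggregatesToDatabase(); // hypothetical helper
        } else {
            // Normal tuple: fold it into the current window's per-type sums.
            accumulate(tuple); // hypothetical helper
        }
        collector.ack(tuple);
    }

    private boolean isTickTuple(Tuple tuple) {
        return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
            && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
    }

    private void flushAggregatesToDatabase() { /* write and reset the window */ }

    private void accumulate(Tuple tuple) { /* sum per transaction type */ }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No downstream output in this sketch.
    }
}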
-Roger
From: Neil Carroll [mailto:[email protected]]
Sent: Tuesday, April 08, 2014 1:42 PM
To: [email protected]
Subject: Can Storm write an Aggregate Record to Postgres or SQL Server?
I'm new to Storm and want to use it to aggregate log data over a 5-minute
period and write aggregate records (one per transaction type) into a DBMS (SQL
Server or Postgres).
I believe Storm can do this. Is there sample code available?
Thanks
Neil
--
Cody A. Ray, LEED AP
[email protected]
215.501.7891