Can you tell us more about this use case? I don't fully understand it, but
given what you've said so far, I might create a Trident topology something
like this:

TridentTopology topology = new TridentTopology();
GroupedStream grouped = topology.newStream("spout1", spout)
    .each(new Fields("request_id"), new CsvReader(),
          new Fields("csv_field1", "csv_field2", "csv_fieldN"))
    .groupBy(new Fields("csv_field1"));
// .... do something on the GroupedStream
// build() is a method on the TridentTopology, not on the stream
StormTopology stormTopology = topology.build();

// Storm 0.9.x package names
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

public class CsvReader extends BaseFunction {
    public CsvReader() {
    }

    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        long requestId = tuple.getLong(0);
        // do something with this requestId to figure out which CSV file to read ???
        /* PSEUDOCODE
        for (each line in the CSV) {
            // emit one tuple per line with all the fields
            collector.emit(new Values(line[0], line[1], line[N]));
        }
        */
    }
}

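For what it's worth, the pseudocode loop above could be filled in along these lines. This is only a sketch in plain Java with no Storm classes: CsvLines and parse are made-up names, the caller is assumed to already have a Reader for the right CSV (how to find it from the request_id is still the open question), and the comma split is naive and won't cope with quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Storm: turns CSV text into one String[] per line.
final class CsvLines {
    private CsvLines() {}

    // Naive comma split; does not handle quoted fields or embedded commas.
    static List<String[]> parse(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // limit -1 keeps trailing empty fields, so short rows keep their width
                rows.add(line.split(",", -1));
            }
        } catch (IOException e) {
            // rethrow unchecked so callers need no throws clause
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

Inside execute(), each returned row would then map onto one emit, as in the pseudocode: collector.emit(new Values(row[0], row[1], row[N])).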
(Trident makes working with batches a lot easier. :)
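To make the groupBy step a bit more concrete: for the tuples of a single batch, grouping on csv_field1 has roughly the effect sketched below. This is plain Java for illustration only, not Trident code. BatchGrouping and groupByField are hypothetical names, and the real thing also repartitions the tuples across workers, which this ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; Trident does the grouping for you, batch by batch.
final class BatchGrouping {
    private BatchGrouping() {}

    // Groups the rows of one batch by the value in column keyIndex,
    // keeping the groups in first-seen order.
    static Map<String, List<String[]>> groupByField(List<String[]> batch, int keyIndex) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] row : batch) {
            groups.computeIfAbsent(row[keyIndex], k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

Whatever you then do on the GroupedStream would see one such group at a time.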
In general, though, I'm not sure where you're getting the CSV files from. I
don't think reading CSV files directly off the worker nodes' disks is good
practice in Storm, since you can't count on a given task running on the node
that holds the file. It'd probably be better if your spouts emitted the data
themselves.
-Cody
On Tue, May 6, 2014 at 1:13 AM, Kiran Kumar <[email protected]> wrote:
> Hi Padma,
>
> Firstly, thanks for responding.
>
> Here is how I am defining my topology conceptually:
>
> - The spout waits for a request signal.
> - Once the spout gets a signal, it generates a request_id and broadcasts
> that request_id to 10 CSV reader bolts.
> - The 10 CSV reader bolts each read their CSV files line by line and emit
> the lines as tuples.
> - Now (this is where I need a technical/syntactical suggestion) I need to
> batch up the tuples from all 10 CSV reader bolts on specified fields.
> - Finally, the batched tuples will be processed by the final bolts.
>
> What I need is a technical approach.
> On Tuesday, 6 May 2014 11:10 AM, padma priya chitturi <
> [email protected]> wrote:
>
> Hi,
>
> You can define spouts and bolts in such a way that the input streams read
> by the spouts are grouped on specified fields and processed by specific
> bolts. This way, you can make batches of the input stream.
>
>
> On Tue, May 6, 2014 at 11:02 AM, Kiran Kumar <[email protected]> wrote:
>
> Hi,
>
> Can anyone suggest a topology that batches the input stream on specified
> fields, so that each batch is forwarded to a function that processes it?
>
> Regards,
> Kiran Kumar Dasari.
--
Cody A. Ray, LEED AP
[email protected]
215.501.7891