I've seen this:
http://storm.apache.org/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.html
but it doesn't explain how workers coordinate with each other, so I'm
asking for some clarity.

I'm considering a situation where I have 2 million rows in MySQL or MongoDB.

1. I want to use a Spout to read the first 1000 rows and send the processed
output to a Bolt. This happens in Worker 1.
2. I want a different instance of the same Spout class to read the next
1000 rows, in parallel with the Spout in step 1, and send the processed
output to an instance of the same Bolt used in step 1. This happens in
Worker 2.
3. Same as steps 1 and 2, but in Worker 3.
4. I might set up 10 workers like this.
5. When all the Bolts in the workers are finished, they send their outputs
to a single Bolt in Worker 11.
6. The Bolt in Worker 11 writes the processed value to a new MySQL table.
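Roughly, I imagine the topology wiring for the six steps above would look
like this (RowSpout, ProcessBolt and WriterBolt are placeholder class names
I made up; packages are backtype.storm in 0.10.x, org.apache.storm in 1.x):

```java
// Storm 0.10.x package names; in 1.x these become org.apache.storm.*
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class BatchReadTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // 10 instances (tasks) of the same Spout class, one per batch reader
        builder.setSpout("row-spout", new RowSpout(), 10);

        // 10 processing Bolts, each fed by the spout tasks
        builder.setBolt("process-bolt", new ProcessBolt(), 10)
               .shuffleGrouping("row-spout");

        // a single writer Bolt; globalGrouping routes every tuple to one task
        builder.setBolt("writer-bolt", new WriterBolt(), 1)
               .globalGrouping("process-bolt");

        Config conf = new Config();
        conf.setNumWorkers(11); // Storm spreads the executors across workers
        StormSubmitter.submitTopology("batch-read", conf,
                                      builder.createTopology());
    }
}
```

Is that the right way to get step 5's fan-in, i.e. does globalGrouping
guarantee all processed tuples reach the one writer-bolt task?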

*My confusion here is in how to make the database iterations happen batch
by batch, in parallel*. Obviously the database connection would have to be
made in some static class outside the workers, but if workers are started
with just "conf.setNumWorkers(2);", then how do I tell the workers to
iterate over different rows of the database? Assume the workers are
running on different machines.
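The only way I can see to split the rows is to derive a disjoint OFFSET per
spout instance from its task index, which (if I read the API right) is
available in the spout's open() via TopologyContext.getThisTaskIndex(), with
the task count from getComponentTasks(). A sketch of just the arithmetic,
independent of Storm (offsetForBatch is a helper name I made up):

```java
public class RowPartitioner {

    // Each spout task claims the batches whose number is congruent to its
    // task index modulo the number of tasks, so no two tasks overlap.
    // In a Storm spout's open(): taskIndex = context.getThisTaskIndex();
    // numTasks = context.getComponentTasks(context.getThisComponentId()).size()
    static long offsetForBatch(int taskIndex, int numTasks,
                               long batchNum, int batchSize) {
        return (batchNum * numTasks + taskIndex) * (long) batchSize;
    }

    public static void main(String[] args) {
        // With 10 tasks and 1000-row batches:
        // task 0 reads offsets 0, 10000, 20000, ...
        // task 3 reads offsets 3000, 13000, 23000, ...
        System.out.println(offsetForBatch(0, 10, 0, 1000)); // 0
        System.out.println(offsetForBatch(3, 10, 0, 1000)); // 3000
        System.out.println(offsetForBatch(3, 10, 1, 1000)); // 13000
    }
}
```

Each task would then issue something like
"SELECT ... ORDER BY id LIMIT 1000 OFFSET ?" with its own offsets, so the
10 spouts never read the same rows. Is that the idiomatic way to do it, or
is there a built-in mechanism I'm missing?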

-- 
Regards,
Navin
