Hi,
I would like to discuss possible Storm design patterns for the following
requirement:
Given a Storm topology that is used in production for (automatic) real-time
stream processing, the user interface requires a REST API to interactively
(manually) run a subset of the topology and display its interim results.
For simplicity, let's assume the following topology:
QueueSpout -> (multiple parallel) ProcessingBolt(s) -> Join -> ReduceBolt -> PersistenceBolt
The user interface requires each of the ProcessingBolts to be exposed as a
separate REST API.
Design 1:
Deploy a separate DRPCTopology for each ProcessingBolt.
The REST server acts as a reverse proxy that forwards each request to the
DRPC server.
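To make Design 1 concrete, here is a minimal Python sketch of the REST-to-DRPC routing only. It is not Storm code: the injected `drpc_execute(function, args)` callable stands in for a real DRPC client call (in Java, Storm's `DRPCClient.execute(function, args)`), and the route table and function names such as `processing-enrich` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Stand-in signature for a DRPC client call: (function name, args) -> result.
DrpcExecute = Callable[[str, str], str]


class DrpcReverseProxy:
    """Maps REST paths to DRPC function names (one per ProcessingBolt topology)."""

    def __init__(self, routes: Dict[str, str], drpc_execute: DrpcExecute) -> None:
        # Hypothetical convention: "/processing/<bolt>" -> "processing-<bolt>".
        self._routes = dict(routes)
        self._drpc_execute = drpc_execute

    def handle(self, path: str, body: str) -> Tuple[int, str]:
        """Return (HTTP status, response body) for one REST request."""
        function = self._routes.get(path)
        if function is None:
            return 404, f"no ProcessingBolt exposed at {path}"
        try:
            # Forward the request body unchanged as the DRPC argument string.
            return 200, self._drpc_execute(function, body)
        except Exception as exc:  # e.g. DRPC timeout or execution failure
            return 502, f"DRPC call to {function} failed: {exc}"


if __name__ == "__main__":
    def fake_drpc(function: str, args: str) -> str:
        # Hypothetical stub in place of a DRPC server round trip.
        return f"{function}:{args}"

    proxy = DrpcReverseProxy({"/processing/enrich": "processing-enrich"}, fake_drpc)
    print(proxy.handle("/processing/enrich", '{"id": 1}'))
```

Keeping the route table as plain data means exposing another ProcessingBolt costs one more DRPC topology plus one route entry, while the proxy itself stays unchanged.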
Design 2:
The REST server puts the message in a priority queue with low priority and
subscribes to the result in Redis.
Use OOP to make all ProcessingBolts aware of toggles in the tuple.
Effectively, the tuple contains toggles that disable all ProcessingBolts
but one.
A second toggle routes interim results to a (Redis) Publish Bolt instead
of the ReduceBolt.
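The toggle mechanism can be sketched as a base class that every ProcessingBolt extends. This is plain Python modelling the routing decision, not Storm's bolt API; the toggle keys `only_bolt` and `publish_interim`, the stream names, and the two example bolts are hypothetical. In a real bolt, the returned stream name would correspond to emitting on a named output stream wired either to Join or to the Publish Bolt.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# Hypothetical toggle keys; a tuple without them follows the production path.
ONLY_BOLT = "only_bolt"              # name of the single ProcessingBolt to run
PUBLISH_INTERIM = "publish_interim"  # send output to the Publish Bolt

DEFAULT_STREAM = "default"  # towards Join -> ReduceBolt
PUBLISH_STREAM = "publish"  # towards the (Redis) Publish Bolt


@dataclass
class Message:
    payload: Dict[str, Any]
    toggles: Dict[str, Any] = field(default_factory=dict)


class ToggleAwareProcessingBolt:
    """Base class: toggle handling lives here, subclasses only implement process()."""

    name = "base"

    def execute(self, msg: Message) -> Optional[Tuple[str, Message]]:
        only = msg.toggles.get(ONLY_BOLT)
        if only is not None and only != self.name:
            return None  # disabled for this request: emit nothing
        result = Message(self.process(msg.payload), dict(msg.toggles))
        stream = PUBLISH_STREAM if msg.toggles.get(PUBLISH_INTERIM) else DEFAULT_STREAM
        return stream, result

    def process(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class UppercaseBolt(ToggleAwareProcessingBolt):
    """Hypothetical example ProcessingBolt."""

    name = "uppercase"

    def process(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {k: str(v).upper() for k, v in payload.items()}


class LengthBolt(ToggleAwareProcessingBolt):
    """Hypothetical example ProcessingBolt."""

    name = "length"

    def process(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {k: len(str(v)) for k, v in payload.items()}
```

In this sketch an interactive request sets both toggles, so only one bolt emits and its output goes straight to the publish stream; the Join is never reached and so never waits on the disabled bolts.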
Design 1 Pros:
1. Follows the principle of an immutable stream-processing graph.
2. Follows the principle of preferring N simpler systems over 1 complex
system.
Design 2 Pros:
1. Makes life easier for operations: one system to monitor and upgrade.
2. Enables a fine-grained monitoring probe that continuously monitors the
real-time system (one subsystem at a time).
3. Enables customer-specific stream processing (instead of one topology per
tenant).
Thoughts?
Itai