Hi,
From my testing of Spark Streaming with Flume, it seems that only one
Spark worker node runs a Flume Avro RPC server to receive messages at any
given time, as opposed to every Spark worker running its own Avro RPC
server. Is this the case? Our use-case
You can configure your sinks to write to one or more Avro sources in a
load-balanced configuration.
https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
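As a rough illustration of the load-balanced setup described above, a Flume agent configuration fragment might look like the following. The agent name (a1), channel name (c1), host names, and port are hypothetical placeholders, and the fragment assumes the source and channel are defined elsewhere in the same file.

```properties
# Two Avro sinks, each pointing at a different Avro source (hypothetical hosts).
a1.sinks = k1 k2

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = worker1.example.com
a1.sinks.k1.port = 41414

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = worker2.example.com
a1.sinks.k2.port = 41414

# Group the sinks and balance events across them.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
# Temporarily skip a sink that has failed instead of retrying it immediately.
a1.sinkgroups.g1.processor.backoff = true
```

Each sink in the group needs an Avro receiver listening at its hostname and port, so this only spreads load if more than one receiver is actually running.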
mfe
On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp
christo...@christophe.cc wrote:
Hi,
From my testing of Spark Streaming
I don't see why not. If you were doing something similar with straight
Flume, you'd start an agent on each node where you want to receive
Avro/RPC events. Without clearer insight into your use case, I'm a little
puzzled as to why each Worker needs to be its own receiver,
but there's
Cool. I'll look at making the code change in FlumeUtils and generating a
pull request.
As for the use case, our message volume is currently about 30 MB per
second, which may grow beyond what a 1 Gbit network adapter can handle.
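To put those numbers side by side, here is a back-of-the-envelope sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits) and ignores protocol overhead, so real headroom would be somewhat smaller. The function names are made up for illustration.

```python
import math

# Assumption: a 1 Gbit/s adapter in decimal units, with no protocol overhead.
LINK_MBIT_PER_S = 1000


def link_utilization(mb_per_s, link_mbit_per_s=LINK_MBIT_PER_S):
    """Fraction of one link consumed by a stream of mb_per_s megabytes/second."""
    return mb_per_s * 8 / link_mbit_per_s


def receivers_needed(mb_per_s, link_mbit_per_s=LINK_MBIT_PER_S):
    """Minimum number of receiving nodes if each has one such link."""
    return max(1, math.ceil(mb_per_s * 8 / link_mbit_per_s))


print(link_utilization(30))   # share of a single 1 Gbit link used today
print(receivers_needed(30))   # receivers needed at the current rate
print(receivers_needed(150))  # receivers needed if volume grew fivefold
```

At 30 MB per second a single adapter is at roughly a quarter of its nominal capacity; the ceiling of about 125 MB per second is where a second receiver becomes necessary under these assumptions.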
- Christophe
On Apr 7, 2014 1:51 PM, Michael Ernest