Hi,

as long as you don't saturate the network you should more or less increase 
linearly the overall throughput by adding processing nodes.

There are quite a few parameters that may influence the throughput:
- input load skewed
- related to the above: small number of keys (S4 handles thousands or millions 
of keys. If you just have a few, load probably won't be balanced)
- in a shared infrastructure like AWS: network traffic from others

About load shedding, I don't see how you'd send more data with that than with a 
blocking sender: with that sender, anything that overflows the communication 
channel is just dropped. Are you sure you are measuring the actual data output 
and not just what you pass to the sender?

If you consider throughput is low, one possible interpretation is that the 
processing of the events is just costly. You may want to look at the 
"---pe-processing-time" metric that reports the duration of processing a given 
event. 

Note that in maybe simpler configurations (not sure what you are doing), it's 
possible to easily achieve 100 or 200k events/s/stream/node

Hope this helps,

Matthieu




On Dec 19, 2013, at 10:43 , Giacomo Fiorini <giacomo.fior...@studio.unibo.it> 
wrote:

> Hi,
> i was testing my s4 application on a cluster of aws instances and i 
> encountered a problem when trying different configurations with different 
> number of nodes.
> I use one node to produce events reading from an input file, this adapter 
> node sends events to a second cluster that processes them.
> I run a test setting Sender, RemoteSender and StreamExecutor in blocking mode 
> (with a custom module and -emc option when deploying both the adapter and the 
> application)
> Changing the number of processing nodes in the cluster and measuring the 
> events/s sent from the adapter produced these results:
> -1 node: 3500 Events/s
> -2 nodes: 8600 Events/s
> -3 nodes: 9300 Events/s
> The 3 node configuration doesn' t seem to perform well as it should; im sure 
> it is not an adapter problem because i tried to add another adapter node and 
> the total number of events/s sent from both is the same as with one adapter 
> only.
> Plus i've tried setting LoadShedding policy as default sending at a much 
> higher rate (20000 ev/s) and i measured that the Kb/s sent from the adapter 
> are roughly the double of the blocking mode.
> I also tried to increase s4.sender.parallelism and 
> s4.remoteSender.parallelism in the ./deploy command but it hasn't solved my 
> problem.
> Is there something else i should try?
> 
> Regards,
> Giacomo Fiorini

Reply via email to