Trident throughput very low on job with two repartitions

Robert Metzger Fri, 31 Jul 2015 03:22:19 -0700

Hi,

I'm currently trying out Storm with the Spout/Bolt API and Trident.


I have a 40-node cluster on Google Compute Engine. Each machine has 15 GB of
memory, 4 cores, and 1 TB of disk.

My test job looks like this:

Source --> bolt.fieldsGrouping --> bolt.fieldsGrouping --> Sink
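
Each fieldsGrouping step repartitions the stream by key, so tuples with the
same field value end up on the same downstream task. The sketch below is a
simplified illustration of that kind of hash routing, not Storm's actual
implementation; the class name, method name, and the task count of 120 are
assumptions made for the example.

```java
import java.util.Objects;

// Hypothetical illustration of key-based routing (not Storm code).
class FieldsGroupingSketch {

    // Map a grouping key to one of numTasks downstream tasks.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int targetTask(Object key, int numTasks) {
        return Math.floorMod(Objects.hashCode(key), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 120; // assumed: one task per unit of parallelism
        long id = 42L;      // the tuple's id field, used as the grouping key
        System.out.println("id " + id + " -> task " + targetTask(id, numTasks));
    }
}
```

With two such steps in a row, every tuple may cross the network twice before
it reaches the sink.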


The Source is generating very simple tuples. They contain an id (long), a
timestamp, the hostID and some byte[] payload.

I'm running the Storm job with a parallelism of 120 on 30 machines.

For Storm, I'm getting a throughput of 30,000 elements/second/core.
Storm with fault tolerance on is achieving 2,500 elements/second/core. I've
set setMaxSpoutPending to 1000, because the elements would time out
otherwise.

For Trident, I'm getting only 500 elements/second/core. I've set the batch
size to 5000.
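
For a sense of scale, the per-core figures above can be converted to
cluster-wide totals. This is a rough sketch that assumes the parallelism of
120 corresponds to 120 cores (30 machines with 4 cores each); the class and
method names are illustrative only.

```java
// Rough aggregate-throughput estimate from the per-core numbers above.
class ThroughputEstimate {

    // Assumption: parallelism 120 = 30 machines x 4 cores.
    static final int CORES = 30 * 4;

    // Cluster-wide elements/second from a per-core rate.
    static long aggregate(long perCorePerSecond, int cores) {
        return perCorePerSecond * cores;
    }

    public static void main(String[] args) {
        System.out.println("Storm:                 " + aggregate(30_000, CORES));
        System.out.println("Storm, fault-tolerant: " + aggregate(2_500, CORES));
        System.out.println("Trident:               " + aggregate(500, CORES));
    }
}
```

Under that assumption the totals are 3,600,000, 300,000, and 60,000
elements/second, so Trident is at one sixtieth of the plain Storm rate.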

Can you give me advice on improving these numbers? For Trident in
particular, I think something is wrong.

You can find the source of the Storm job here:
https://gist.github.com/rmetzger/5d0b1d9553dd8f73ef8c
And Trident: https://gist.github.com/rmetzger/0ad20251d60a73426ebe

Thanks,
Robert
