I have written a custom Trident spout that reads from HDFS and emits the
content as messages to the downstream group-by bolts. The spout implements the
IBatchSpout interface.
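Roughly, a spout of this kind has the shape sketched below. This is a simplified illustration, not the actual code. Each emitBatch call reads up to a fixed number of records and calls emit once per record, so every record pays the per-emit cost. To keep it self-contained, Collector is a hypothetical stand-in for Storm's TridentCollector (only emit(List<Object>) is modelled), the HDFS reader is replaced by an in-memory Iterator<String>, and BATCH_SIZE is an assumed value. The real IBatchSpout also declares open, ack, close, getComponentConfiguration and getOutputFields.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Storm's TridentCollector; only emit(List<Object>) is modelled.
interface Collector {
    void emit(List<Object> values);
}

// Simplified batch spout. The real IBatchSpout also has open, ack, close,
// getComponentConfiguration and getOutputFields, which are omitted here.
class LineBatchSpout {
    // Assumed batch size, chosen for illustration only.
    static final int BATCH_SIZE = 1000;

    // In-memory source standing in for the HDFS reader.
    private final Iterator<String> lines;

    LineBatchSpout(Iterator<String> lines) {
        this.lines = lines;
    }

    // Emits up to BATCH_SIZE lines, one single-field tuple per line.
    // batchId is unused in this sketch.
    void emitBatch(long batchId, Collector collector) {
        int emitted = 0;
        while (emitted < BATCH_SIZE && lines.hasNext()) {
            List<Object> tuple = new ArrayList<>();
            tuple.add(lines.next());
            // One emit call per record: this is the call whose latency matters.
            collector.emit(tuple);
            emitted++;
        }
    }
}
```

In this shape the number of emit calls equals the number of records, so any fixed per-call overhead scales directly with the message rate.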

While benchmarking my topology I observed a maximum throughput of 10K
messages/s from the spout. If I run the HDFS-reading code in isolation, outside
the Trident spout, I get 100K messages/s, so the bottleneck is clearly on the
Storm side.

I instrumented the code further and found that a single call of
> collector.emit(List<Object>)
from the spout takes about 0.1 ms. That alone caps a single spout thread at
roughly 10,000 emits per second (1 s / 0.1 ms), which matches the observed
ceiling. Increasing the buffer sizes (topology.receiver.buffer.size=16,
topology.transfer.buffer.size=64, topology.executor.receive.buffer.size=32768
and topology.executor.send.buffer.size=32768) did not help either.
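One way to do this kind of per-call measurement is to wrap the collector in a timing decorator, as sketched below. This is an illustration, not the instrumentation actually used. EmitTarget is a hypothetical stand-in for TridentCollector (only emit is modelled), and maxEmitsPerSecond is a hypothetical helper that turns a mean latency into the throughput ceiling for one thread that does nothing but emit.

```java
import java.util.List;

// Hypothetical stand-in for Storm's TridentCollector (only emit is modelled).
interface EmitTarget {
    void emit(List<Object> values);
}

// Decorator that times every emit call on the wrapped collector.
class TimedCollector implements EmitTarget {
    private final EmitTarget delegate;
    private long calls;
    private long totalNanos;

    TimedCollector(EmitTarget delegate) {
        this.delegate = delegate;
    }

    @Override
    public void emit(List<Object> values) {
        long start = System.nanoTime();
        delegate.emit(values);
        totalNanos += System.nanoTime() - start;
        calls++;
    }

    long calls() {
        return calls;
    }

    // Mean latency per emit in nanoseconds (0 if nothing was emitted yet).
    double meanNanos() {
        return calls == 0 ? 0.0 : (double) totalNanos / calls;
    }

    // Upper bound on emits per second for a thread that spends all its time in emit.
    static double maxEmitsPerSecond(double meanNanosPerEmit) {
        return 1_000_000_000.0 / meanNanosPerEmit;
    }
}
```

With the 0.1 ms (100,000 ns) figure above, maxEmitsPerSecond(100_000) evaluates to 10,000, the same ceiling seen in the benchmark.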

Is it expected for this call to take so much time?

Thanks
Rohit

