Hello,

I am currently running some experiments, and to send data to my spouts I do the following: I spawn external processes that read the data from files on disk and send it to the spouts through TCP sockets. I do this because (a) I want to control the input rate of the spouts, and (b) I want to use previously gathered data in my experiments. Unfortunately, when I try to sustain input rates above 16,000 tuples per second, this scheme is not fast enough and the input rate is capped.

Do you think there is a better way to send (replay) previously gathered data into my topology?

Thanks,
Nick
