Oh ok. Thanks Chi!
Do you have any ideas about why my batch size never seems to get any bigger
than 83K tuples?
Currently I'm just using a barebones topology that looks like this:
Stream spout = topology.newStream("...", ...)
.parallelismHint()
.groupBy(new Fields("time"))
.aggregate(ne
Raphael,
The number of partitions is defined in your Kafka configuration -
http://kafka.apache.org/documentation.html#brokerconfigs (num.partitions) -
or when you create the topic. The behavior differs between Kafka versions,
so check the documentation for the version you are running. Your topology needs to
m
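To illustrate the broker-side setting Chi points to, a minimal server.properties fragment might look like this (the value 5 is just an example, not a recommendation from this thread):

```
# server.properties: default number of partitions for newly created topics
num.partitions=5
```

Topics created explicitly can also be given their own partition count, which overrides this broker default.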
Thanks for the tips Chi,
I'm a little confused about the partitioning. I had thought that the number
of partitions was determined by the amount of parallelism in the topology.
For example, if I set .parallelismHint(4), then I would have 4 different
partitions. Is this not the case?
Is there a set
Raphael,
You can try tuning your parallelism (and num workers).
For Kafka 0.7, your spout parallelism maxes out at (# brokers) x (#
partitions) for the topic. If you have 4 Kafka brokers and your topic
has 5 partitions, then you could set the spout parallelism to 20 to
maximize the throughput.
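The rule of thumb above can be sketched as a small calculation (for Kafka 0.7, as described; the broker and partition counts are just the example figures from this thread):

```java
public class SpoutParallelism {
    // For Kafka 0.7, each spout task consumes one (broker, partition) pair,
    // so useful spout parallelism maxes out at brokers * partitionsPerTopic.
    static int maxSpoutParallelism(int brokers, int partitionsPerTopic) {
        return brokers * partitionsPerTopic;
    }

    public static void main(String[] args) {
        // The example from the thread: 4 brokers x 5 partitions = 20 spout tasks.
        System.out.println(maxSpoutParallelism(4, 5)); // prints 20
    }
}
```

Setting the spout's parallelismHint higher than this product gains nothing, since the extra tasks have no (broker, partition) pair left to consume.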
I am in the process of optimizing my stream. Currently I expect 5,000,000
tuples per minute to come out of my spout. I am trying to beef up my
topology in order to process this in real time without falling behind.
For some reason my batch size is capping out at 83 thousand tuples. I can't
seem to
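One arithmetic observation, offered as an assumption rather than something established in the thread: 5,000,000 tuples per minute divided by 60 is roughly 83,000, which matches the observed cap. If the spout emits about one batch per second, the cap would be explained by the batch emit interval rather than by any hard batch-size limit:

```java
public class BatchMath {
    public static void main(String[] args) {
        // Assumption (not confirmed in the thread): the spout emits one
        // batch per second, i.e. 60 batches per minute.
        int tuplesPerMinute = 5_000_000;
        int batchesPerMinute = 60;
        int expectedBatchSize = tuplesPerMinute / batchesPerMinute;
        System.out.println(expectedBatchSize); // prints 83333
    }
}
```

Under that assumption, the ~83K figure is simply the per-second tuple rate, and growing batches further would require changing the batch interval or the spout's fetch configuration, not the topology parallelism.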