I have a multilang spout written in PHP running against a Storm 0.9.3
cluster. The spout does its initialization just fine, and nextTuple() is
called on the standard 1 ms cadence. Everything had been working well until
a new use case came up.

Within nextTuple(), I have a loop which emits M tuples of three integer
values each to a well-defined stream in the topology. If M is small or
moderately large (say M ~ 5000), all is fine: nextTuple() completes and is
called again in turn.

However, if M is roughly 7000 or more, the spout emits ~6800 tuples; all of
those tuples are acked by the downstream bolts (single bolt, high
parallelism, shuffle grouping), *but* the spout never processes those acks.
It's as if everything is hung, and nextTuple() is never called again.
Furthermore, the remaining ~200 tuples in the loop are never emitted
downstream, and the logger call at the very end of nextTuple() is never
reached. This behavior appears to be independent of the max spout pending
setting (values from 10 to 10000 tried, for varying M; anything with
M > 7000 still fails).

Simplified version of the use case:

// override
protected function nextTuple() {
    $this->logger->info("nextTuple(): called");
    // $M, $s, $c and $d are defined elsewhere and are well-formed
    foreach ($M as $a => $b) {
        $messageId = (string)UUID::generate(); // type 4 (random) UUID
        $tuple = array((int)$c, (int)$d, (int)$b);
        $this->emit($tuple, $messageId, $s); // $s is the stream
    }
    $this->logger->info("nextTuple(): finished");
}

If |$M| == 5000, 'finished' is logged. If |$M| >= 7000, 'finished' is
never logged. Could this be a Tx/Rx buffer setting, or some other
configuration parameter related to emit()?
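
To put a rough number on the buffer question, here is a minimal sketch (not
my actual spout) that estimates how many bytes a single nextTuple() call
would write for a given M. It assumes the multilang framing of one JSON
object followed by a line containing only "end". The helpers frameEmit()
and estimateEmitBytes(), the stream name 'mystream', and the 32-character
hex string standing in for the UUID are all made up for illustration.

```php
<?php
// Hypothetical helper: frames one spout "emit" command as a JSON object
// followed by an "end" line (assumed multilang framing).
function frameEmit(array $tuple, string $messageId, string $stream): string
{
    $message = array(
        'command' => 'emit',
        'id'      => $messageId,
        'stream'  => $stream,
        'tuple'   => $tuple,
    );
    return json_encode($message) . "\nend\n";
}

// Hypothetical helper: total bytes produced by emitting $count tuples of
// three integers in one pass, as the loop in nextTuple() does.
function estimateEmitBytes(int $count, string $stream): int
{
    $total = 0;
    for ($i = 0; $i < $count; $i++) {
        // Stand-in for the type 4 UUID used as the message id.
        $messageId = bin2hex(random_bytes(16));
        $total += strlen(frameEmit(array(1, 2, $i), $messageId, $stream));
    }
    return $total;
}

// Compare the sizes around the point where the spout stops.
foreach (array(5000, 6800, 7000) as $m) {
    printf("M=%d -> %d bytes\n", $m, estimateEmitBytes($m, 'mystream'));
}
```

If the totals around M ~ 6800 line up with a known pipe or socket buffer
size on the worker hosts, that would point toward a blocked write rather
than a topology setting; if not, the cause is probably elsewhere.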

Any insight is appreciated.

Thanks, -Dan
