This could happen if you have corrupted input that always causes a batch to fail and be retried.
I have seen this behaviour before, and I didn't see corrupted input. It might be a bug in Trident, I'm not sure. If you figure it out, please update this thread and/or submit a patch.

On Mon, Mar 31, 2014 at 7:39 AM, Danijel Schiavuzzi <[email protected]> wrote:

> To (partially) answer my own question -- I still have no idea on the cause
> of the stuck topology, but re-submitting the topology helps -- after
> re-submitting, my topology is now running normally.
>
> On Wed, Mar 26, 2014 at 6:04 PM, Danijel Schiavuzzi <[email protected]> wrote:
>
>> Also, I did have multiple cases of my IBackingMap workers dying (because
>> of RuntimeExceptions) but successfully restarting afterwards (I throw
>> RuntimeExceptions in the BackingMap implementation as my strategy in rare
>> SQL database deadlock situations, to force a worker restart and to
>> fail+retry the batch).
>>
>> From the logs, one such IBackingMap worker death (and subsequent restart)
>> resulted in the Kafka spout re-emitting the pending tuple:
>>
>>     2014-03-22 16:26:43 s.k.t.TridentKafkaEmitter [INFO] re-emitting
>>     batch, attempt 29698959:736
>>
>> This is of course the normal behavior of a transactional topology, but
>> this is the first time I've encountered a case of a batch retrying
>> indefinitely. This is especially suspicious since the topology has been
>> running fine for 20 days straight, re-emitting batches and restarting
>> IBackingMap workers quite a number of times.
>>
>> I can see in my IBackingMap backing SQL database that the batch with the
>> exact txid value 29698959 has been committed -- but I suspect that could
>> come from another BackingMap, since there are two BackingMap instances
>> running (parallelismHint 2).
>>
>> However, I have no idea why the batch is being retried indefinitely now,
>> nor why it hasn't been successfully acked by Trident.
>>
>> Any suggestions on the area (topology component) to focus my research on?
>>
>> Thanks,
>>
>> On Wed, Mar 26, 2014 at 5:32 PM, Danijel Schiavuzzi <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I'm having problems with my transactional Trident topology. It has been
>>> running fine for about 20 days, and suddenly it is stuck processing a
>>> single batch, with no tuples being emitted nor persisted by the
>>> TridentState (IBackingMap).
>>>
>>> It's a simple topology which consumes messages off a Kafka queue. The
>>> spout is an instance of storm-kafka-0.8-plus TransactionalTridentKafkaSpout,
>>> and I use the trident-mssql transactional TridentState implementation to
>>> persistentAggregate() data into a SQL database.
>>>
>>> In Zookeeper I can see Storm is re-trying a batch, i.e.
>>>
>>>     "/transactional/<myTopologyName>/coordinator/currattempts" is
>>>     "{"29698959":6487}"
>>>
>>> ... and the attempt count keeps increasing. It seems the batch with txid
>>> 29698959 is stuck: the attempt count in Zookeeper keeps increasing, so
>>> the batch isn't being acked by Trident, and I have no idea why,
>>> especially since the topology has been running successfully for the last
>>> 20 days.
>>>
>>> I did rebalance the topology on one occasion, after which it continued
>>> running normally. Other than that, no other modifications were made.
>>> Storm is at version 0.9.0.1.
>>>
>>> Any hints on how to debug the stuck topology? Any other useful info I
>>> might provide?
>>>
>>> Thanks,
>>>
>>> --
>>> Danijel Schiavuzzi
>>>
>>> E: [email protected]
>>> W: www.schiavuzzi.com
>>> T: +385989035562
>>> Skype: danijel.schiavuzzi
>
> --
> Danijel Schiavuzzi
>
> E: [email protected]
> W: www.schiavuzzi.com
> T: +385989035562
> Skype: danijels7
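For readers landing on this thread: the fail-and-retry strategy Danijel describes can be sketched in plain Java. This is a minimal, hypothetical illustration, not his actual code -- the `BackingMap` interface below is a simplified stand-in for Trident's real `storm.trident.state.map.IBackingMap`, and the deadlock is simulated with a flag instead of a caught `SQLException`:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for Trident's IBackingMap<T>, reduced to the two
// calls relevant here (the real interface lives in storm.trident.state.map).
interface BackingMap<T> {
    List<T> multiGet(List<List<Object>> keys);
    void multiPut(List<List<Object>> keys, List<T> vals);
}

// Hypothetical SQL-backed map illustrating the strategy from the thread:
// when a deadlock occurs during multiPut, rethrow as an unchecked exception
// so the worker dies, Storm restarts it, and the transactional spout
// re-emits the same batch with an incremented attempt count -- producing
// the "re-emitting batch, attempt <txid>:<attempt>" log line quoted above.
class DeadlockRetryMap implements BackingMap<Long> {
    private final Map<List<Object>, Long> store = new HashMap<>();
    boolean simulateDeadlock = false; // stands in for a real SQL deadlock error

    @Override
    public List<Long> multiGet(List<List<Object>> keys) {
        List<Long> out = new ArrayList<>();
        for (List<Object> k : keys) {
            out.add(store.get(k)); // null for unseen keys
        }
        return out;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<Long> vals) {
        if (simulateDeadlock) {
            // In a real implementation this would wrap a caught SQLException
            // whose error code indicates a deadlock. Rethrowing unchecked
            // fails the batch for a clean retry instead of losing data.
            throw new RuntimeException("SQL deadlock -- failing batch so Trident retries it");
        }
        for (int i = 0; i < keys.size(); i++) {
            store.put(keys.get(i), vals.get(i));
        }
    }
}
```

The caveat raised at the top of the thread applies: if the failure is deterministic (e.g. corrupted input that deadlocks or throws on every attempt), the same batch will be retried forever, which is consistent with the ever-growing attempt count under `/transactional/<myTopologyName>/coordinator/currattempts`.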
