Hi Jason,

Could you be more specific -- what do you mean by "corrupted input"? Do you mean that there's a bug in Trident itself that causes the tuples in a batch to somehow become corrupted?
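For context, the failure handling in my IBackingMap implementation (described in the quoted thread below) boils down to the pattern sketched here. This is a simplified, self-contained example rather than my actual code -- the class name, table, SQL and value type are placeholders, and my real map stores Trident's transactional values rather than plain longs:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;
import storm.trident.state.map.IBackingMap;

// Sketch only: table name, columns and SQL are placeholders, not my real schema.
public class SqlBackingMap implements IBackingMap<Long> {

    private final DataSource dataSource;

    public SqlBackingMap(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public List<Long> multiGet(List<List<Object>> keys) {
        List<Long> values = new ArrayList<Long>();
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement("SELECT cnt FROM counts WHERE k = ?")) {
            for (List<Object> key : keys) {
                ps.setString(1, key.get(0).toString());
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        values.add(rs.getLong(1));
                    } else {
                        values.add(null);
                    }
                }
            }
        } catch (SQLException e) {
            throw new RuntimeException("SQL read failed, forcing batch retry", e);
        }
        return values;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<Long> vals) {
        String upsert = "MERGE INTO counts AS t USING (VALUES (?, ?)) AS s (k, cnt) ON t.k = s.k"
                + " WHEN MATCHED THEN UPDATE SET t.cnt = s.cnt"
                + " WHEN NOT MATCHED THEN INSERT (k, cnt) VALUES (s.k, s.cnt);";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(upsert)) {
            for (int i = 0; i < keys.size(); i++) {
                ps.setString(1, keys.get(i).get(0).toString());
                ps.setLong(2, vals.get(i));
                ps.executeUpdate();
            }
        } catch (SQLException e) {
            // A database deadlock (or any other SQL error) ends up here: rethrow as a
            // RuntimeException so the worker dies, gets restarted by Storm, and Trident
            // fails and replays the batch.
            throw new RuntimeException("SQL write failed, forcing worker restart and batch retry", e);
        }
    }
}

The only part that matters for this discussion is the catch block: any SQLException bubbles up as a RuntimeException, the worker dies and is restarted, and Trident fails and replays the whole batch.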
Thanks a lot!

Danijel

On Monday, April 7, 2014, Jason Jackson <[email protected]> wrote:

> This could happen if you have corrupted input that always causes a batch to fail and be retried.
>
> I have seen this behaviour before and I didn't see corrupted input. It might be a bug in Trident, I'm not sure. If you figure it out please update this thread and/or submit a patch.
>
> On Mon, Mar 31, 2014 at 7:39 AM, Danijel Schiavuzzi <[email protected]> wrote:
>
> To (partially) answer my own question -- I still have no idea on the cause of the stuck topology, but re-submitting the topology helps: after re-submitting, my topology is now running normally.
>
> On Wed, Mar 26, 2014 at 6:04 PM, Danijel Schiavuzzi <[email protected]> wrote:
>
> Also, I did have multiple cases of my IBackingMap workers dying (because of RuntimeExceptions) but successfully restarting afterwards. I throw RuntimeExceptions in the BackingMap implementation as my strategy for rare SQL database deadlock situations, to force a worker restart and a fail+retry of the batch.
>
> From the logs, one such IBackingMap worker death (and subsequent restart) resulted in the Kafka spout re-emitting the pending tuple:
>
>     2014-03-22 16:26:43 s.k.t.TridentKafkaEmitter [INFO] re-emitting batch, attempt 29698959:736
>
> This is of course the normal behavior of a transactional topology, but this is the first time I've encountered a batch retrying indefinitely. It is especially suspicious since the topology had been running fine for 20 days straight, re-emitting batches and restarting IBackingMap workers quite a number of times.
>
> I can see in my IBackingMap backing SQL database that a batch with the exact txid value 29698959 has been committed -- but I suspect that could come from the other BackingMap, since there are two BackingMap instances running (parallelismHint 2).
>
> However, I have no idea why the batch is being retried indefinitely now, nor why it hasn't been successfully acked by Trident.
>
> Any suggestions on the area (topology component) to focus my research on?
>
> Thanks,
>
> On Wed, Mar 26, 2014 at 5:32 PM, Danijel Schiavuzzi <[email protected]> wrote:
>
> Hello,
>
> I'm having problems with my transactional Trident topology. It has been running fine for about 20 days, and suddenly it is stuck processing a single batch, with no tuples being emitted and none being persisted by the TridentState (IBackingMap).
>
> It's a simple topology which consumes messages off a Kafka queue. The spout is an instance of storm-kafka-0.8-plus's TransactionalTridentKafkaSpout, and I use the trident-mssql transactional TridentState implementation to persistentAggregate() data into a SQL database.
>
> In Zookeeper I can see Storm is retrying a batch: "/transactional/<myTopologyName>/coordinator/currattempts" is "{"29698959":6487}", and the attempt count keeps increasing. The batch with txid 29698959 seems stuck -- it isn't being acked by Trident and I have no idea why, especially since the topology had been running successfully for the last 20 days.
>
> I did rebalance the topology on one occasion, after which it continued running normally. Other than that, no other modifications were done. Storm is at version 0.9.0.1.
>
> Any hints on how to debug the stuck topology? Any other useful info I might provide?
>
> Thanks,
>
> --
> Danijel Schiavuzzi
>
> E: [email protected]
> W: www.schiavuzzi.com
> T: +385989035562
> Skype: danijel.schiavuzzi

--
Danijel Schiavuzzi

E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
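P.S. For anyone else debugging a similar stuck batch: the attempt counter mentioned above can be read directly from Zookeeper. Below is a minimal sketch using the plain ZooKeeper Java client -- the connect string and topology name are placeholders for your own values, and the path assumes the default transactional.zookeeper.root of /transactional:

import org.apache.zookeeper.ZooKeeper;

public class CheckCurrAttempts {
    public static void main(String[] args) throws Exception {
        // Placeholders: point these at your own Zookeeper ensemble and topology name.
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 10000, null);
        try {
            String path = "/transactional/myTopologyName/coordinator/currattempts";
            byte[] data = zk.getData(path, false, null);
            // Prints the txid -> attempt-count map, e.g. {"29698959":6487}.
            // A single txid whose count only ever grows means that batch is never acked.
            System.out.println(new String(data, "UTF-8"));
        } finally {
            zk.close();
        }
    }
}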
