To (partially) answer my own question: I still have no idea what caused the
stuck topology, but re-submitting it helps -- after killing and
re-submitting the topology, it is now running normally.
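For reference, the kill-and-resubmit cycle looks roughly like this (topology name, jar path, and main class here are placeholders, not the actual ones from this thread):

```shell
# Kill the stuck topology, waiting 30 seconds for it to shut down cleanly
storm kill myTopologyName -w 30

# Re-submit it once the old instance is gone
storm jar target/my-topology.jar com.example.MyTopologyMain myTopologyName
```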


On Wed, Mar 26, 2014 at 6:04 PM, Danijel Schiavuzzi
<[email protected]>wrote:

> Also, I did have multiple cases of my IBackingMap workers dying (because
> of RuntimeExceptions) but successfully restarting afterwards. I throw
> RuntimeExceptions from my IBackingMap implementation as a strategy for
> handling rare SQL database deadlocks, forcing a worker restart and a
> fail-and-retry of the batch.
>
> From the logs, one such IBackingMap worker death (and subsequent restart)
> resulted in the Kafka spout re-emitting the pending tuple:
>
>     2014-03-22 16:26:43 s.k.t.TridentKafkaEmitter [INFO] re-emitting
> batch, attempt 29698959:736
>
> This is of course the normal behavior of a transactional topology, but
> this is the first time I've encountered a case of a batch retrying
> indefinitely. This is especially suspicious since the topology has been
> running fine for 20 days straight, re-emitting batches and restarting
> IBackingMap workers quite a number of times.
>
> I can see in my IBackingMap's backing SQL database that a batch with the
> exact txid value 29698959 has been committed -- but I suspect that could
> come from the other BackingMap, since there are two BackingMap instances
> running (parallelismHint 2).
>
> However, I have no idea why the batch is being retried indefinitely now
> nor why it hasn't been successfully acked by Trident.
>
> Any suggestions on the area (topology component) to focus my research on?
>
> Thanks,
>
> On Wed, Mar 26, 2014 at 5:32 PM, Danijel Schiavuzzi <
> [email protected]> wrote:
>
>> Hello,
>>
>> I'm having problems with my transactional Trident topology. It had been
>> running fine for about 20 days, but is suddenly stuck processing a single
>> batch, with no tuples being emitted nor persisted by the TridentState
>> (IBackingMap).
>>
>> It's a simple topology which consumes messages off a Kafka queue. The
>> spout is an instance of storm-kafka-0.8-plus TransactionalTridentKafkaSpout
>> and I use the trident-mssql transactional TridentState implementation to
>> persistentAggregate() data into a SQL database.
>>
>> In Zookeeper I can see Storm is re-trying a batch, i.e.
>>
>>      "/transactional/<myTopologyName>/coordinator/currattempts" is
>> "{"29698959":6487}"
>>
>> ... and the attempt count keeps increasing. The batch with txid 29698959
>> appears to be stuck -- it is never acked by Trident, and I have no idea
>> why, especially since the topology had been running successfully for the
>> previous 20 days.
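The Zookeeper state quoted above can be inspected with the stock zkCli.sh client. The server address and topology name below are placeholders; `/transactional` is the default value of Storm's `transactional.zookeeper.root` setting.

```shell
# Connect to the Zookeeper ensemble Storm uses for transactional state
bin/zkCli.sh -server localhost:2181

# Inside the zkCli shell: dump the per-txid attempt counts
get /transactional/myTopologyName/coordinator/currattempts
```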
>>
>> I did rebalance the topology on one occasion, after which it continued
>> running normally. Other than that, no other modifications were done. Storm
>> is at version 0.9.0.1.
>>
>> Any hints on how to debug the stuck topology? Any other useful info I
>> might provide?
>>
>> Thanks,
>>
>> --
>> Danijel Schiavuzzi
>>
>> E: [email protected]
>> W: www.schiavuzzi.com
>> T: +385989035562
>> Skype: danijel.schiavuzzi
>>
>
>
>
>



-- 
Danijel Schiavuzzi

E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
