Hi Jason,

Could you be more specific -- what do you mean by "corrupted input"? Do you mean that there's a bug in Trident itself that causes the tuples in a batch to somehow become corrupted?
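For context, the failure handling in my IBackingMap implementation (described in the quoted thread below) boils down to the pattern sketched here. This is a simplified, self-contained example rather than my actual code -- the class name, table, SQL and value type are placeholders, and my real map stores Trident's transactional values rather than plain longs:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;
import storm.trident.state.map.IBackingMap;

// Sketch only: table name, columns and SQL are placeholders, not my real schema.
public class SqlBackingMap implements IBackingMap<Long> {

    private final DataSource dataSource;

    public SqlBackingMap(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public List<Long> multiGet(List<List<Object>> keys) {
        List<Long> values = new ArrayList<Long>();
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement("SELECT cnt FROM counts WHERE k = ?")) {
            for (List<Object> key : keys) {
                ps.setString(1, key.get(0).toString());
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        values.add(rs.getLong(1));
                    } else {
                        values.add(null);
                    }
                }
            }
        } catch (SQLException e) {
            throw new RuntimeException("SQL read failed, forcing batch retry", e);
        }
        return values;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<Long> vals) {
        String upsert = "MERGE INTO counts AS t USING (VALUES (?, ?)) AS s (k, cnt) ON t.k = s.k"
                + " WHEN MATCHED THEN UPDATE SET t.cnt = s.cnt"
                + " WHEN NOT MATCHED THEN INSERT (k, cnt) VALUES (s.k, s.cnt);";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(upsert)) {
            for (int i = 0; i < keys.size(); i++) {
                ps.setString(1, keys.get(i).get(0).toString());
                ps.setLong(2, vals.get(i));
                ps.executeUpdate();
            }
        } catch (SQLException e) {
            // A database deadlock (or any other SQL error) ends up here: rethrow as a
            // RuntimeException so the worker dies, gets restarted by Storm, and Trident
            // fails and replays the batch.
            throw new RuntimeException("SQL write failed, forcing worker restart and batch retry", e);
        }
    }
}

The only part that matters for this discussion is the catch block: any SQLException bubbles up as a RuntimeException, the worker dies and is restarted, and Trident fails and replays the whole batch.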
Thanks a lot!

Danijel

On Monday, April 7, 2014, Jason Jackson <[email protected]> wrote:

> This could happen if you have corrupted input that always causes a batch to fail and be retried.
>
> I have seen this behaviour before and I didn't see corrupted input. It might be a bug in Trident, I'm not sure. If you figure it out please update this thread and/or submit a patch.
>
> On Mon, Mar 31, 2014 at 7:39 AM, Danijel Schiavuzzi <[email protected]> wrote:
>
> To (partially) answer my own question -- I still have no idea on the cause of the stuck topology, but re-submitting the topology helps: after re-submitting, my topology is now running normally.
>
> On Wed, Mar 26, 2014 at 6:04 PM, Danijel Schiavuzzi <[email protected]> wrote:
>
> Also, I did have multiple cases of my IBackingMap workers dying (because of RuntimeExceptions) but successfully restarting afterwards. I throw RuntimeExceptions in the BackingMap implementation as my strategy for rare SQL database deadlock situations, to force a worker restart and a fail+retry of the batch.
>
> From the logs, one such IBackingMap worker death (and subsequent restart) resulted in the Kafka spout re-emitting the pending tuple:
>
>     2014-03-22 16:26:43 s.k.t.TridentKafkaEmitter [INFO] re-emitting batch, attempt 29698959:736
>
> This is of course the normal behavior of a transactional topology, but this is the first time I've encountered a batch retrying indefinitely. It is especially suspicious since the topology had been running fine for 20 days straight, re-emitting batches and restarting IBackingMap workers quite a number of times.
>
> I can see in my IBackingMap backing SQL database that a batch with the exact txid value 29698959 has been committed -- but I suspect that could come from the other BackingMap, since there are two BackingMap instances running (parallelismHint 2).
>
> However, I have no idea why the batch is being retried indefinitely now, nor why it hasn't been successfully acked by Trident.
>
> Any suggestions on the area (topology component) to focus my research on?
>
> Thanks,
>
> On Wed, Mar 26, 2014 at 5:32 PM, Danijel Schiavuzzi <[email protected]> wrote:
>
> Hello,
>
> I'm having problems with my transactional Trident topology. It has been running fine for about 20 days, and suddenly it is stuck processing a single batch, with no tuples being emitted and none being persisted by the TridentState (IBackingMap).
>
> It's a simple topology which consumes messages off a Kafka queue. The spout is an instance of storm-kafka-0.8-plus's TransactionalTridentKafkaSpout, and I use the trident-mssql transactional TridentState implementation to persistentAggregate() data into a SQL database.
>
> In Zookeeper I can see Storm is retrying a batch: "/transactional/<myTopologyName>/coordinator/currattempts" is "{"29698959":6487}", and the attempt count keeps increasing. The batch with txid 29698959 seems stuck -- it isn't being acked by Trident and I have no idea why, especially since the topology had been running successfully for the last 20 days.
>
> I did rebalance the topology on one occasion, after which it continued running normally. Other than that, no other modifications were done. Storm is at version 0.9.0.1.
>
> Any hints on how to debug the stuck topology? Any other useful info I might provide?
>
> Thanks,
>
> --
> Danijel Schiavuzzi
>
> E: [email protected]
> W: www.schiavuzzi.com
> T: +385989035562
> Skype: danijel.schiavuzzi

--
Danijel Schiavuzzi

E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
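P.S. For anyone else debugging a similar stuck batch: the attempt counter mentioned above can be read directly from Zookeeper. Below is a minimal sketch using the plain ZooKeeper Java client -- the connect string and topology name are placeholders for your own values, and the path assumes the default transactional.zookeeper.root of /transactional:

import org.apache.zookeeper.ZooKeeper;

public class CheckCurrAttempts {
    public static void main(String[] args) throws Exception {
        // Placeholders: point these at your own Zookeeper ensemble and topology name.
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 10000, null);
        try {
            String path = "/transactional/myTopologyName/coordinator/currattempts";
            byte[] data = zk.getData(path, false, null);
            // Prints the txid -> attempt-count map, e.g. {"29698959":6487}.
            // A single txid whose count only ever grows means that batch is never acked.
            System.out.println(new String(data, "UTF-8"));
        } finally {
            zk.close();
        }
    }
}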
