Are your tuples from the spout timing out and being replayed? That could cause the topology to spin, reprocessing the same table(s) over and over. On May 13, 2015 8:12 PM, "Eran Chinthaka Withana" <[email protected]> wrote:
> Hi,
>
> Storm version: 0.9.2
>
> I'm running a topology where, based on an event, I try to sync one
> database to the other. After a Kafka spout (with the db info in the message),
> - the first bolt sends a tuple for each table in the db,
> - the second bolt reads from the given table and sends batches of rows
>   for that table,
> - the third bolt writes the data to the database,
> - the fourth one (fields grouping with the 3rd bolt) sends success/failure
>   for a table,
> - the last one (fields grouping with the 4th bolt) collects all table-level
>   info and sends out the final message.
>
> This topology runs without any issue for small databases. But when the db
> gets slightly larger, the topology seems to get stuck after processing
> some tuples and does not proceed beyond that.
>
> I saw a discussion similar to this here [1], but there it seems to happen
> due to too many pending spout messages. In my case, it's related to the
> large number of tuples coming out of the bolts. As you can imagine, the
> fan-out from the second bolt can be extremely high. For example, in one
> case I was sending as many as 1000 tuples from the second to the third
> bolt, and from there to the 4th.
>
> I'm just wondering why this is getting stuck. Are there any buffer sizes
> in play here? How can I fix this, ideally without changing the topology
> design?
>
> Really appreciate your input here.
>
> Thanks,
> Eran Withana
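[Editorial note: for reference, the tuple timeout the reply asks about and the buffer sizes the question asks about are both topology-level configuration in Storm 0.9.x. A sketch of the relevant keys, shown with the stock 0.9.2 defaults; the values are illustrative, not a tuning recommendation:]

```yaml
# Tuple-timeout and internal-queue settings (Storm 0.9.2 defaults shown).

# If a tuple tree is not fully acked within this window, the spout
# fails the tuple and it is replayed -- a likely cause of "spinning"
# on the same tables when downstream bolts are slow:
topology.message.timeout.secs: 30

# Caps the number of un-acked tuples in flight per spout task;
# unset (null) means unlimited pending tuples:
topology.max.spout.pending: null

# Per-executor disruptor queue sizes (must be powers of 2). A slow
# bolt behind a high fan-out can fill these and stall upstream stages:
topology.executor.receive.buffer.size: 1024
topology.executor.send.buffer.size: 1024
topology.transfer.buffer.size: 1024
```

These can also be set per topology in Java via `Config.setMessageTimeoutSecs(...)` and `Config.setMaxSpoutPending(...)` before submitting. Note that with anchored tuples, a 1000-way fan-out makes the whole tuple tree much more likely to exceed the 30-second default timeout.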
