Re: [DISCUSS] Provision for dead-letter topic in storm

2016-09-27 Thread Ravi Sharma
Yes, good idea. It would be great to have this functionality of passing some
user-defined data from the bolt to the spout on failures.

Ravi



Re: [DISCUSS] Provision for dead-letter topic in storm

2016-09-27 Thread Kyle Nusbaum
It seems to me that this can be solved by allowing a user to attach some
arbitrary data to a call to fail(), which is passed to the spout.
So there would be an overload of fail() in IOutputCollector that takes both
the Tuple input and an object to pass along to the spout. The spout's fail
method would then accept that object as a second argument.

The spout can then decide what to do about the failure based on the content
of the object.

This makes it generic, and possibly useful for other things like reporting.
I only looked at the relevant code briefly, but it looks like it would also
be relatively simple to implement. -- Kyle
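A minimal Java sketch of this idea, assuming a fail() that carries an arbitrary payload from the bolt back to the spout. The types below (FailAwareSpout, DeadLetterSpout, Tuple) are simplified hypothetical stand-ins for illustration, not the real Storm IOutputCollector/ISpout signatures:

```java
import java.util.HashMap;
import java.util.Map;

public class FailPayloadSketch {
    // Simplified stand-in for a Storm tuple.
    record Tuple(Object msgId, String value) {}

    // Spout side: fail() now receives an optional context object from the bolt.
    interface FailAwareSpout {
        void fail(Object msgId, Object failInfo);
    }

    // A spout that routes non-retriable failures to a dead-letter sink
    // instead of replaying them.
    static class DeadLetterSpout implements FailAwareSpout {
        final Map<Object, String> deadLetters = new HashMap<>();
        int retries = 0;

        @Override
        public void fail(Object msgId, Object failInfo) {
            if (failInfo instanceof Exception e) {
                // The bolt told us why it failed: dead-letter, do not retry.
                deadLetters.put(msgId, e.getMessage());
            } else {
                retries++; // no context: fall back to the normal retry path
            }
        }
    }

    public static void main(String[] args) {
        DeadLetterSpout spout = new DeadLetterSpout();
        // Bolt side: deserialization failed permanently, attach the cause.
        spout.fail("m1", new IllegalArgumentException("bad JSON"));
        // Bolt side: transient network error, fail without context.
        spout.fail("m2", null);
        System.out.println(spout.deadLetters.size() + " dead, " + spout.retries + " retried");
    }
}
```

The payload being a plain Object is what keeps this generic: a spout can interpret it as an exception, a retry hint, or reporting metadata without the collector API changing again.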


Re: [DISCUSS] Provision for dead-letter topic in storm

2016-09-27 Thread Tech Id
Any more thoughts on this?
Seems like a useful feature for all the spouts/bolts.



Re: [DISCUSS] Provision for dead-letter topic in storm

2016-09-21 Thread S G
Thank you Aaron.

We use Kafka and JMS spouts and several bolts (Elasticsearch, Solr,
Cassandra, Couchbase and HDFS) in different scenarios, and we need the
dead-letter functionality in almost all of them.
Locally, we have this functionality almost ready for writing dead letters to
Solr or Kafka.
I will try to contribute it to Storm as a PR, and we can then look into
adding the failing tuple as well. I agree that adding the failing tuple
would be somewhat more complicated.




Re: [DISCUSS] Provision for dead-letter topic in storm

2016-09-20 Thread Aaron Niskodé-Dossett
I like the idea, especially if it can be implemented as generically as
possible. Ideally we could "dead letter" both the original tuple and the
tuple that itself failed, since intervening transformations could have
changed the original tuple. I realize that adds a lot of complexity to your
idea and may not be feasible.


[DISCUSS] Provision for dead-letter topic in storm

2016-09-19 Thread S G
Hi,

I want to gather some thoughts on a suggestion to provide a dead-letter
functionality common to all spouts/bolts.

Currently, if any spout/bolt reports a failure, the tuple is retried by the
spout.
For a single bolt failure in a large DAG, this retry logic can cause several
perfectly successful components to replay, and yet the Tuple could fail at
exactly the same bolt on retry.

This is usually fine (if the failure was temporary, say due to a network
glitch), but sometimes the message is bad enough that it should not be
retried, yet important enough that its failure should not be ignored.

Example: an Elasticsearch bolt receiving bytes from a Kafka spout.

Most of the time it is able to deserialize the bytes correctly, but
sometimes a badly formatted message fails to deserialize. In such cases, the
Kafka spout should not retry and the ES bolt should not report a success. It
should, however, be reported to the user somehow that a badly serialized
message entered the stream.

For cases like a temporary network glitch, the Tuple should be retried.

So the proposal is to implement a dead-letter topic as follows:

1) Add a new method *failWithoutRetry(Tuple, Exception)* to the collector.
Bolts will begin using it once it is available.

2) Provide the ability to *configure a dead-letter data-store in the spout*
for failed messages reported by #1 above.


The configurable data-store should support Kafka, Solr and Redis to begin
with (plus the option to implement one's own by dropping a jar file on the
classpath).
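To make the two pieces of the proposal concrete, here is a hedged Java sketch of the proposed failWithoutRetry(Tuple, Exception) call routing into a pluggable dead-letter store. The names DeadLetterStore, InMemoryStore and SketchCollector are hypothetical illustrations, not existing Storm APIs; real store implementations would write to Kafka, Solr or Redis:

```java
import java.util.ArrayList;
import java.util.List;

public class DeadLetterSketch {
    // Simplified stand-in for a Storm tuple.
    record Tuple(String msgId, byte[] payload) {}

    // Proposal #2: a pluggable dead-letter sink, configured on the spout.
    interface DeadLetterStore {
        void store(Tuple tuple, Exception cause);
    }

    // In-memory implementation, useful for tests; real ones would target
    // Kafka, Solr or Redis.
    static class InMemoryStore implements DeadLetterStore {
        final List<String> entries = new ArrayList<>();
        public void store(Tuple t, Exception cause) {
            entries.add(t.msgId() + ": " + cause.getMessage());
        }
    }

    // Collector stand-in carrying proposal #1's new method.
    static class SketchCollector {
        final DeadLetterStore store;
        int retried = 0;
        SketchCollector(DeadLetterStore store) { this.store = store; }

        // Existing path: the spout replays the tuple.
        void fail(Tuple t) { retried++; }

        // Proposed path: record the tuple and cause, never replay.
        void failWithoutRetry(Tuple t, Exception e) {
            store.store(t, e);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        SketchCollector collector = new SketchCollector(store);
        Tuple bad = new Tuple("k-42", new byte[] {(byte) 0xFF});
        collector.failWithoutRetry(bad, new RuntimeException("cannot deserialize"));
        System.out.println(store.entries);
    }
}
```

Keeping the store behind a small interface is what would let users swap in their own implementation from a jar on the classpath.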

Such a feature should benefit all the spouts:

1) Topologies will not block on replaying the same doomed-to-fail tuples.
2) Users can set alerts on dead letters and easily find actual problems in
their topologies, rather than analyzing all failed tuples only to find that
they failed because of a temporary network glitch.
3) Since the entire Tuple is put into the dead letter, all the data is
available for retrying after fixing the topology code.

Please share your thoughts if you think this can benefit Storm in a generic
way.

Thx,
SG