Re: How will storm replay the tuple tree?

2016-09-15 Thread Cheney Chen
>>> ...shouldn't be
>>> looking into whether BoltB is really costly or BoltA is really costly.
>>>
>>> 3. Also, failure scenarios are supposed to be really rare, and if
>>> your database is down (meaning 100% of tuples will fail), then performance won't
>>> be your only concern; your concern will be to make sure the database comes up
>>> and all failed tuples get reprocessed.
>>>
>>> 4. Also, you would have to take care of retry logic in every Bolt.
>>> Currently it is only in one place.
>>>
>>>
>>>
>>> *One thing I am looking forward to from Storm is for it to inform the Spout
>>> about what kind of failure it was*, i.e. whether it was a ConnectionTimeout,
>>> ReadTimeout, etc., in which case a retry may pass. But say it was a null
>>> pointer exception (Java world): I know the data being expected is not there
>>> and my code is not handling that scenario, so either I will have to change
>>> the code or ask the data provider to send that field, but a retry won't
>>> help me.
>>>
>>> Currently the only way to do this is to use an outside datastore like Redis:
>>> in whichever Bolt you fail, add a key with the messageId and the exception/error
>>> detail to Redis before calling fail, and then let the Spout read that data from
>>> Redis with the messageId received in the fail call, so the Spout can decide
>>> whether to retry or not. I would usually create two wrappers, a retryable
>>> exception and a *non*-retryable exception, so each bolt can indicate whether a
>>> retry can help or not. It's up to you where you put this decision-making logic.
>>>
>>>
>>>
>>> Thanks
>>> Ravi.
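
A minimal Java sketch of the pattern Ravi describes (the class and field names
are hypothetical, and the Jedis client plus a "messageId" field in the tuple are
assumptions, not anything from the thread):

import org.apache.storm.task.OutputCollector;
import org.apache.storm.tuple.Tuple;
import redis.clients.jedis.Jedis;

// Records WHY a tuple failed, keyed by its message id, before failing it,
// so the spout can later decide whether a retry makes sense.
class FailureRecorder {
    private final OutputCollector collector;
    private final Jedis redis = new Jedis("localhost");          // assumed local Redis

    FailureRecorder(OutputCollector collector) { this.collector = collector; }

    void failWithReason(Tuple input, Exception cause, boolean retryable) {
        String msgId = input.getStringByField("messageId");       // assumed tuple field
        redis.hset("failure:" + msgId, "error", cause.getClass().getName());
        redis.hset("failure:" + msgId, "retryable", Boolean.toString(retryable));
        collector.fail(input);                                    // normal Storm fail
    }
}

In the spout's fail(Object msgId), the same "failure:" key is read back and the
message is re-emitted only when "retryable" is true; otherwise it is logged or
parked and dropped. The retryable flag would come from the two exception
wrappers Ravi mentions (a retryable and a non-retryable variant).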
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Sep 14, 2016 at 6:43 AM, Tech Id <tech.login@gmail.com>
>>> wrote:
>>>
>>>> Thanks Ambud,
>>>>
>>>> I did read some very good things about the acking mechanism in Storm, but I
>>>> am not sure it explains why point-to-point tracking is expensive.
>>>>
>>>> Consider the example: Spout--> BoltA--->BoltB.
>>>>
>>>> If BoltB fails, it will report failure to the acker.
>>>> If the acker can ask the Spout to replay, then why can't the acker ask
>>>> the parent of BoltB to replay at this point?
>>>> I don't think keeping track of a bolt's parent would be expensive.
>>>>
>>>>
>>>> On a related note, I am a little confused about a statement "When a new
>>>> tupletree is born, the spout sends the XORed edge-ids of each tuple
>>>> recipient, which the acker records in its pending ledger" in
>>>> Acking-framework-implementation.html
>>>> <http://storm.apache.org/releases/current/Acking-framework-implementation.html>
>>>> .
>>>> How does the spout know beforehand which bolts would receive the
>>>> tuple? Bolts forward tuples to other bolts based on groupings and
>>>> dynamically generated fields. How does the spout know what fields will be
>>>> generated and which bolts will receive the tuples? If it does not know
>>>> that, how does it send the XOR of each tuple recipient in a tuple's
>>>> path, given that each tuple's path will be different (I think, not sure
>>>> though)?
>>>>
>>>>
>>>> Thx,
>>>> T.I.
>>>>
>>>>
>>>> On Tue, Sep 13, 2016 at 6:37 PM, Ambud Sharma <asharma52...@gmail.com>
>>>> wrote:
>>>>
>>>>> Here is a post on it https://bryantsai.com/fault-tolerant-message-processing-in-storm/.
>>>>>
>>>>> Point to point tracking is expensive unless you are using
>>>>> transactions. Flume does point to point transfers using transactions.
>>>>>
>>>>> On Sep 13, 2016 3:27 PM, "Tech Id" <tech.login@gmail.com> wrote:
>>>>>
>>>>>> I agree with this statement about code/architecture but in case of
>>>>>> some system outages, like one of the end-points (Solr, Couchbase,
>>>>>> Elastic-Search etc.) being down temporarily, a very large number of other
>>>>>> fully-functional and healthy systems will receive a large number of
>>>>>> duplicate replays (especially in heavy throughput topologies).
>>>>>>
>>>>>> If you can elaborate a little more on the performance cost of
>>>>>> tracking tuples or point to a document reflecting the same, that will be of great help.

Re: Re: How will storm replay the tuple tree?

2016-09-14 Thread S G
Very nice discussion !

I have also been wanting to see a feature similar to the one in Ravi's
comment above:

"*There is one thing i am looking forward from Storm is to inform Spout
about what kind of failure it was*. i.e. if it was ConnectionTimeout or
ReadTimeout etc, that means if i retry it may pass. But say it was null
pointer exception(java world) , i know the data which is being expected is
not there and my code is not handling that scenario, so either i will have
to change code or ask data provider to send that field, but retry wont help
me."


I think we need:

1) A new method *failWithoutRetry(Tuple, Exception)* in the collector.
2) The ability to *configure a dead-letter data-store in the spout*
for failed messages reported by #1 above.

The configurable data-store should support Kafka, Solr and Redis to
begin with (plus the option to implement one's own by dropping a jar file
on the classpath).

Such a feature would benefit all the spouts.
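
A rough sketch of what the proposed hook could look like (this API does not
exist in Storm; every name below is hypothetical and only illustrates the idea):

import org.apache.storm.tuple.Tuple;

// Pluggable dead-letter backend: Kafka, Solr, Redis, or a user-supplied jar.
interface DeadLetterStore {
    void store(Tuple failedTuple, Exception cause);
}

// Wrapper around the collector implementing the proposed failWithoutRetry().
class DeadLetterCollector {
    private final DeadLetterStore store;

    DeadLetterCollector(DeadLetterStore store) { this.store = store; }

    // Persist the whole tuple plus the cause, then drop it instead of
    // signalling a failure back to the spout, so it is never replayed.
    void failWithoutRetry(Tuple input, Exception cause) {
        store.store(input, cause);
        // ...ack or silently discard `input` here so no replay is triggered.
    }
}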

Benefits:
1) Topologies will not block on replaying the same doomed-to-fail tuples.
2) Users can set alerts on dead-letters and easily spot actual problems
in their topologies, rather than analyzing all failed tuples only to find that
they failed because of a temporary network glitch.
3) Since the entire Tuple is put into the dead-letter store, all the data is
available for retrying after fixing the topology code.

Thx,
SG



On Wed, Sep 14, 2016 at 7:25 AM, Hart, James W. <jwh...@seic.com> wrote:

> In my testing, when a tuple was replayed by a spout, every Kafka message
> from the replayed one to the end was replayed.  That’s why all bolts need
> to be idempotent, so that replays do not cause work to be done twice.  I
> think it has to do with Kafka tracking the offset of the last acked message
> in a topic, rather than the actual ack of every message individually.  This is a
> simplistic view, as it’s a lot more complicated than this.
>
>
>
> If anybody can confirm this, please respond, as it was a surprise to me and
> cost me a couple of days of testing when I encountered it.
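
A toy model of the offset bookkeeping James describes (illustration only, not
storm-kafka's actual code): the spout can only commit up to just below the
earliest offset that is still un-acked, so one stuck message holds back the
commit point and a restart replays everything after it, which is why the bolts
have to be idempotent.

import java.util.TreeSet;

class OffsetTracker {
    private final TreeSet<Long> pending = new TreeSet<>(); // emitted but not yet acked
    private long maxEmitted = -1;
    private long committed = -1;                            // highest offset safe to commit

    void emitted(long offset) {
        pending.add(offset);
        maxEmitted = Math.max(maxEmitted, offset);
    }

    void acked(long offset) {
        pending.remove(offset);
        // If offset 5101 never acks, committed stays at 5100 even after
        // 5102..6000 are done; a restart then re-reads from 5101 onward.
        committed = pending.isEmpty() ? maxEmitted : pending.first() - 1;
    }

    long commitOffset() { return committed; }
}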
>
>
>
>
>
> *From:* fanxi...@travelsky.com [mailto:fanxi...@travelsky.com]
> *Sent:* Tuesday, September 13, 2016 9:22 PM
>
> *To:* user
> *Subject:* Re: Re: How will storm replay the tuple tree?
>
>
>
> Yes, only the failed tuples are replayed, but the whole batch will be held.
>
>
>
> So, if a tuple fails forever, will the batch be held forever?
>
>
>
> I am just not clear whether the tuple itself or the batch which owns the tuple
> will be held in the spout.
>
>
>
>
> --
>
> Josh
>
>
>
>
>
>
>
> *From:* Ambud Sharma <asharma52...@gmail.com>
>
> *Date:* 2016-09-14 09:10
>
> *To:* user <user@storm.apache.org>
>
> *Subject:* Re: Re: How will storm replay the tuple tree?
>
> No as per the code only individual messages are replayed.
>
>
>
> On Sep 13, 2016 6:09 PM, "fanxi...@travelsky.com" <fanxi...@travelsky.com>
> wrote:
>
> Hi:
>
>
>
> I'd like to clarify something about Kafka-spout with regard to acking.
>
>
>
> For example, kafka-spout fetches offsets 5000-6000 from the Kafka server, but
> one tuple whose offset is 5101 is failed by a bolt; the whole batch of
> 5000-6000 will remain in kafka-spout until the 5101 tuple is acked.
> If the 5101 tuple cannot be acked for a long time, the batch 5000-6000
> will remain for a long time, and the kafka-spout will stop fetching data
> from Kafka during this time.
>
>
>
> Am I right?
>
>
>
>
> --
>
> Josh
>
>
>
>
>
>
>
> *From:* Tech Id <tech.login@gmail.com>
>
> *Date:* 2016-09-14 06:26
>
> *To:* user <user@storm.apache.org>
>
> *Subject:* Re: How will storm replay the tuple tree?
>
> I agree with this statement about code/architecture but in case of some
> system outages, like one of the end-points (Solr, Couchbase, Elastic-Search
> etc.) being down temporarily, a very large number of other fully-functional
> and healthy systems will receive a large number of duplicate replays
> (especially in heavy throughput topologies).
>
>
>
> If you can elaborate a little more on the performance cost of tracking
> tuples or point to a document reflecting the same, that will be of great
> help.
>
>
>
> Best,
>
> T.I.
>
>
>
> On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <jwh...@seic.com> wrote:
>
> Failures should be very infrequent, if they are not then rethink the code
> and architecture.  The performance cost of tracking tuples in the way that
> would be required to replay at the failure is large, basically that method
>>>>> would slow everything way down for very infrequent failures.

Re: How will storm replay the tuple tree?

2016-09-14 Thread Cheney Chen
>>> Here is a post on it https://bryantsai.com/fault-tolerant-message-processing-in-storm/.
>>>
>>> Point to point tracking is expensive unless you are using transactions.
>>> Flume does point to point transfers using transactions.
>>>
>>> On Sep 13, 2016 3:27 PM, "Tech Id" <tech.login@gmail.com> wrote:
>>>
>>>> I agree with this statement about code/architecture but in case of some
>>>> system outages, like one of the end-points (Solr, Couchbase, Elastic-Search
>>>> etc.) being down temporarily, a very large number of other fully-functional
>>>> and healthy systems will receive a large number of duplicate replays
>>>> (especially in heavy throughput topologies).
>>>>
>>>> If you can elaborate a little more on the performance cost of tracking
>>>> tuples or point to a document reflecting the same, that will be of great
>>>> help.
>>>>
>>>> Best,
>>>> T.I.
>>>>
>>>> On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <jwh...@seic.com>
>>>> wrote:
>>>>
>>>>> Failures should be very infrequent, if they are not then rethink the
>>>>> code and architecture.  The performance cost of tracking tuples in the way
>>>>> that would be required to replay at the failure is large, basically that
>>>>> method would slow everything way down for very infrequent failures.
>>>>>
>>>>>
>>>>>
>>>>> *From:* S G [mailto:sg.online.em...@gmail.com]
>>>>> *Sent:* Tuesday, September 13, 2016 3:17 PM
>>>>> *To:* user@storm.apache.org
>>>>> *Subject:* Re: How will storm replay the tuple tree?
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> I am a little curious to know why we begin at the spout level for case
>>>>> 1.
>>>>>
>>>>> If we replay at the failing bolt's parent level (BoltA in this case),
>>>>> then it should be more performant due to a decrease in duplicate 
>>>>> processing
>>>>> (as compared to whole tuple tree replays).
>>>>>
>>>>>
>>>>>
>>>>> If BoltA crashes due to some reason while replaying, only then the
>>>>> Spout should receive this as a failure and whole tuple tree should be
>>>>> replayed.
>>>>>
>>>>>
>>>>>
>>>>> This saving in duplicate processing will be more visible with several
>>>>> layers of bolts.
>>>>>
>>>>>
>>>>>
>>>>> I am sure there is a good reason to replay the whole tuple-tree, and
>>>>> want to know the same.
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> SG
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <ptgo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Cheney,
>>>>>
>>>>>
>>>>>
>>>>> Replays happen at the spout level. So if there is a failure at any
>>>>> point in the tuple tree (the tuple tree being the anchored emits,
>>>>> unanchored emits don’t count), the original spout tuple will be replayed.
>>>>> So the replayed tuple will traverse the topology again, including
>>>>> unanchored points.
>>>>>
>>>>>
>>>>>
>>>>> If an unanchored tuple fails downstream, it will not trigger a replay.
>>>>>
>>>>>
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>>
>>>>>
>>>>> -Taylor
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sep 13, 2016, at 4:42 AM, Cheney Chen <tbcql1...@gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Hi there,
>>>>>
>>>>>
>>>>>
>>>>> We're using storm 1.0.1, and I'm checking through
>>>>> http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
>>>>>
>>>>>
>>>>>
>>>>> Got questions for below two scenarios.
>>>>>
>>>>> Assume topology: S (spout) --> BoltA --> BoltB
>>>>>
>>>>> 1. S: anchored emit, BoltA: anchored emit
>>>>>
>>>>> Suppose BoltB processing failed w/ ack, what will the replay be, will
>>>>> it execute both BoltA and BoltB or only failed BoltB processing?
>>>>>
>>>>>
>>>>>
>>>>> 2. S: anchored emit, BoltA: unanchored emit
>>>>>
>>>>> Suppose BoltB processing failed w/ ack, replay will not happen,
>>>>> correct?
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>> Qili Chen (Cheney)
>>>>>
>>>>> E-mail: tbcql1...@gmail.com
>>>>> MP: (+1) 4086217503
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>


-- 
Regards,
Qili Chen (Cheney)

E-mail: tbcql1...@gmail.com
MP: (+1) 4086217503


Re: How will storm replay the tuple tree?

2016-09-14 Thread Ravi Sharma
>>> ...will receive a large number of duplicate replays
>>> (especially in heavy throughput topologies).
>>>
>>> If you can elaborate a little more on the performance cost of tracking
>>> tuples or point to a document reflecting the same, that will be of great
>>> help.
>>>
>>> Best,
>>> T.I.
>>>
>>> On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <jwh...@seic.com>
>>> wrote:
>>>
>>>> Failures should be very infrequent, if they are not then rethink the
>>>> code and architecture.  The performance cost of tracking tuples in the way
>>>> that would be required to replay at the failure is large, basically that
>>>> method would slow everything way down for very infrequent failures.
>>>>
>>>>
>>>>
>>>> *From:* S G [mailto:sg.online.em...@gmail.com]
>>>> *Sent:* Tuesday, September 13, 2016 3:17 PM
>>>> *To:* user@storm.apache.org
>>>> *Subject:* Re: How will storm replay the tuple tree?
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> I am a little curious to know why we begin at the spout level for case
>>>> 1.
>>>>
>>>> If we replay at the failing bolt's parent level (BoltA in this case),
>>>> then it should be more performant due to a decrease in duplicate processing
>>>> (as compared to whole tuple tree replays).
>>>>
>>>>
>>>>
>>>> If BoltA crashes due to some reason while replaying, only then the
>>>> Spout should receive this as a failure and whole tuple tree should be
>>>> replayed.
>>>>
>>>>
>>>>
>>>> This saving in duplicate processing will be more visible with several
>>>> layers of bolts.
>>>>
>>>>
>>>>
>>>> I am sure there is a good reason to replay the whole tuple-tree, and
>>>> want to know the same.
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> SG
>>>>
>>>>
>>>>
>>>> On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <ptgo...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Cheney,
>>>>
>>>>
>>>>
>>>> Replays happen at the spout level. So if there is a failure at any
>>>> point in the tuple tree (the tuple tree being the anchored emits,
>>>> unanchored emits don’t count), the original spout tuple will be replayed.
>>>> So the replayed tuple will traverse the topology again, including
>>>> unanchored points.
>>>>
>>>>
>>>>
>>>> If an unanchored tuple fails downstream, it will not trigger a replay.
>>>>
>>>>
>>>>
>>>> Hope this helps.
>>>>
>>>>
>>>>
>>>> -Taylor
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sep 13, 2016, at 4:42 AM, Cheney Chen <tbcql1...@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> Hi there,
>>>>
>>>>
>>>>
>>>> We're using storm 1.0.1, and I'm checking through
>>>> http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
>>>>
>>>>
>>>>
>>>> Got questions for below two scenarios.
>>>>
>>>> Assume topology: S (spout) --> BoltA --> BoltB
>>>>
>>>> 1. S: anchored emit, BoltA: anchored emit
>>>>
>>>> Suppose BoltB processing failed w/ ack, what will the replay be, will
>>>> it execute both BoltA and BoltB or only failed BoltB processing?
>>>>
>>>>
>>>>
>>>> 2. S: anchored emit, BoltA: unanchored emit
>>>>
>>>> Suppose BoltB processing failed w/ ack, replay will not happen, correct?
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Qili Chen (Cheney)
>>>>
>>>> E-mail: tbcql1...@gmail.com
>>>> MP: (+1) 4086217503
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>


Re: How will storm replay the tuple tree?

2016-09-13 Thread Tech Id
Thanks Ambud,

I did read some very good things about the acking mechanism in Storm, but I am
not sure it explains why point-to-point tracking is expensive.

Consider the example: Spout--> BoltA--->BoltB.

If BoltB fails, it will report failure to the acker.
If the acker can ask the Spout to replay, then why can't the acker ask the
parent of BoltB to replay at this point?
I don't think keeping track of a bolt's parent would be expensive.


On a related note, I am a little confused about a statement "When a new
tupletree is born, the spout sends the XORed edge-ids of each tuple
recipient, which the acker records in its pending ledger" in
Acking-framework-implementation.html
<http://storm.apache.org/releases/current/Acking-framework-implementation.html>
.
How does the spout know beforehand which bolts would receive the tuple?
Bolts forward tuples to other bolts based on groupings and dynamically
generated fields. How does the spout know what fields will be generated and
which bolts will receive the tuples? If it does not know that, how does it
send the XOR of each tuple recipient in a tuple's path, given that each
tuple's path will be different (I think, not sure though)?


Thx,
T.I.
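
A simplified model of the XOR bookkeeping may help (illustration only, not
Storm's actual acker code). The spout does not need to know the tuple's future
path: it only registers the root with the id of the tuple it emitted, and every
bolt that acks a tuple reports that tuple's id XORed with the ids of any new
tuples it anchored. Every id therefore enters the running value exactly twice
(once when the tuple is created, once when it is acked downstream), so the value
returns to zero precisely when the whole tree, whatever shape it took, has been
processed.

import java.util.HashMap;
import java.util.Map;

class AckerLedger {
    private final Map<Long, Long> pending = new HashMap<>(); // root id -> running XOR value

    // The spout registers a new tree with the id of the tuple it just emitted.
    void init(long rootId, long spoutTupleId) {
        pending.put(rootId, spoutTupleId);
    }

    // A bolt acked `ackedId` after anchoring new tuples with ids `anchoredIds`.
    // Returns true when the running value hits zero, i.e. the tree is complete.
    boolean ack(long rootId, long ackedId, long... anchoredIds) {
        long val = pending.get(rootId) ^ ackedId;
        for (long id : anchoredIds) {
            val ^= id;
        }
        if (val == 0) {
            pending.remove(rootId);   // the acker would now tell the spout to ack
            return true;
        }
        pending.put(rootId, val);
        return false;
    }
}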


On Tue, Sep 13, 2016 at 6:37 PM, Ambud Sharma <asharma52...@gmail.com>
wrote:

> Here is a post on it https://bryantsai.com/fault-tolerant-message-processing-in-storm/.
>
> Point to point tracking is expensive unless you are using transactions.
> Flume does point to point transfers using transactions.
>
> On Sep 13, 2016 3:27 PM, "Tech Id" <tech.login@gmail.com> wrote:
>
>> I agree with this statement about code/architecture but in case of some
>> system outages, like one of the end-points (Solr, Couchbase, Elastic-Search
>> etc.) being down temporarily, a very large number of other fully-functional
>> and healthy systems will receive a large number of duplicate replays
>> (especially in heavy throughput topologies).
>>
>> If you can elaborate a little more on the performance cost of tracking
>> tuples or point to a document reflecting the same, that will be of great
>> help.
>>
>> Best,
>> T.I.
>>
>> On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <jwh...@seic.com> wrote:
>>
>>> Failures should be very infrequent, if they are not then rethink the
>>> code and architecture.  The performance cost of tracking tuples in the way
>>> that would be required to replay at the failure is large, basically that
>>> method would slow everything way down for very infrequent failures.
>>>
>>>
>>>
>>> *From:* S G [mailto:sg.online.em...@gmail.com]
>>> *Sent:* Tuesday, September 13, 2016 3:17 PM
>>> *To:* user@storm.apache.org
>>> *Subject:* Re: How will storm replay the tuple tree?
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I am a little curious to know why we begin at the spout level for case 1.
>>>
>>> If we replay at the failing bolt's parent level (BoltA in this case),
>>> then it should be more performant due to a decrease in duplicate processing
>>> (as compared to whole tuple tree replays).
>>>
>>>
>>>
>>> If BoltA crashes due to some reason while replaying, only then the Spout
>>> should receive this as a failure and whole tuple tree should be replayed.
>>>
>>>
>>>
>>> This saving in duplicate processing will be more visible with several
>>> layers of bolts.
>>>
>>>
>>>
>>> I am sure there is a good reason to replay the whole tuple-tree, and
>>> want to know the same.
>>>
>>>
>>>
>>> Thanks
>>>
>>> SG
>>>
>>>
>>>
>>> On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <ptgo...@gmail.com>
>>> wrote:
>>>
>>> Hi Cheney,
>>>
>>>
>>>
>>> Replays happen at the spout level. So if there is a failure at any point
>>> in the tuple tree (the tuple tree being the anchored emits, unanchored
>>> emits don’t count), the original spout tuple will be replayed. So the
>>> replayed tuple will traverse the topology again, including unanchored
>>> points.
>>>
>>>
>>>
>>> If an unanchored tuple fails downstream, it will not trigger a replay.
>>>
>>>
>>>
>>> Hope this helps.
>>>
>>>
>>>
>>> -Taylor
>>>
>>>
>>>
>>>
>>>
>>> On Sep 13, 2016, at 4:42 AM, Cheney Chen <tbcql1...@gmail.com> wrote:
>>>
>>>
>>>
>>> Hi there,
>>>
>>>
>>>
>>> We're using storm 1.0.1, and I'm checking through
>>> http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
>>>
>>>
>>>
>>> Got questions for below two scenarios.
>>>
>>> Assume topology: S (spout) --> BoltA --> BoltB
>>>
>>> 1. S: anchored emit, BoltA: anchored emit
>>>
>>> Suppose BoltB processing failed w/ ack, what will the replay be, will it
>>> execute both BoltA and BoltB or only failed BoltB processing?
>>>
>>>
>>>
>>> 2. S: anchored emit, BoltA: unanchored emit
>>>
>>> Suppose BoltB processing failed w/ ack, replay will not happen, correct?
>>>
>>>
>>>
>>> --
>>>
>>> Regards,
>>> Qili Chen (Cheney)
>>>
>>> E-mail: tbcql1...@gmail.com
>>> MP: (+1) 4086217503
>>>
>>>
>>>
>>>
>>>
>>
>>


Re: How will storm replay the tuple tree?

2016-09-13 Thread Ambud Sharma
Here is a post on it
https://bryantsai.com/fault-tolerant-message-processing-in-storm/.

Point to point tracking is expensive unless you are using transactions.
Flume does point to point transfers using transactions.

On Sep 13, 2016 3:27 PM, "Tech Id" <tech.login@gmail.com> wrote:

> I agree with this statement about code/architecture but in case of some
> system outages, like one of the end-points (Solr, Couchbase, Elastic-Search
> etc.) being down temporarily, a very large number of other fully-functional
> and healthy systems will receive a large number of duplicate replays
> (especially in heavy throughput topologies).
>
> If you can elaborate a little more on the performance cost of tracking
> tuples or point to a document reflecting the same, that will be of great
> help.
>
> Best,
> T.I.
>
> On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <jwh...@seic.com> wrote:
>
>> Failures should be very infrequent, if they are not then rethink the code
>> and architecture.  The performance cost of tracking tuples in the way that
>> would be required to replay at the failure is large, basically that method
>> would slow everything way down for very infrequent failures.
>>
>>
>>
>> *From:* S G [mailto:sg.online.em...@gmail.com]
>> *Sent:* Tuesday, September 13, 2016 3:17 PM
>> *To:* user@storm.apache.org
>> *Subject:* Re: How will storm replay the tuple tree?
>>
>>
>>
>> Hi,
>>
>>
>>
>> I am a little curious to know why we begin at the spout level for case 1.
>>
>> If we replay at the failing bolt's parent level (BoltA in this case),
>> then it should be more performant due to a decrease in duplicate processing
>> (as compared to whole tuple tree replays).
>>
>>
>>
>> If BoltA crashes due to some reason while replaying, only then the Spout
>> should receive this as a failure and whole tuple tree should be replayed.
>>
>>
>>
>> This saving in duplicate processing will be more visible with several
>> layers of bolts.
>>
>>
>>
>> I am sure there is a good reason to replay the whole tuple-tree, and want
>> to know the same.
>>
>>
>>
>> Thanks
>>
>> SG
>>
>>
>>
>> On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <ptgo...@gmail.com>
>> wrote:
>>
>> Hi Cheney,
>>
>>
>>
>> Replays happen at the spout level. So if there is a failure at any point
>> in the tuple tree (the tuple tree being the anchored emits, unanchored
>> emits don’t count), the original spout tuple will be replayed. So the
>> replayed tuple will traverse the topology again, including unanchored
>> points.
>>
>>
>>
>> If an unanchored tuple fails downstream, it will not trigger a replay.
>>
>>
>>
>> Hope this helps.
>>
>>
>>
>> -Taylor
>>
>>
>>
>>
>>
>> On Sep 13, 2016, at 4:42 AM, Cheney Chen <tbcql1...@gmail.com> wrote:
>>
>>
>>
>> Hi there,
>>
>>
>>
>> We're using storm 1.0.1, and I'm checking through
>> http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
>>
>>
>>
>> Got questions for below two scenarios.
>>
>> Assume topology: S (spout) --> BoltA --> BoltB
>>
>> 1. S: anchored emit, BoltA: anchored emit
>>
>> Suppose BoltB processing failed w/ ack, what will the replay be, will it
>> execute both BoltA and BoltB or only failed BoltB processing?
>>
>>
>>
>> 2. S: anchored emit, BoltA: unanchored emit
>>
>> Suppose BoltB processing failed w/ ack, replay will not happen, correct?
>>
>>
>>
>> --
>>
>> Regards,
>> Qili Chen (Cheney)
>>
>> E-mail: tbcql1...@gmail.com
>> MP: (+1) 4086217503
>>
>>
>>
>>
>>
>
>


Re: Re: How will storm replay the tuple tree?

2016-09-13 Thread fanxi...@travelsky.com
Yes, only the failed tuples are replayed, but the whole batch will be held.

So, if a tuple fails forever, will the batch be held forever?

I am just not clear whether the tuple itself or the batch which owns the tuple
will be held in the spout.




Josh


 
From: Ambud Sharma
Date: 2016-09-14 09:10
To: user
Subject: Re: Re: How will storm replay the tuple tree?
No as per the code only individual messages are replayed.

On Sep 13, 2016 6:09 PM, "fanxi...@travelsky.com" <fanxi...@travelsky.com> 
wrote:
Hi:

I'd like to clarify something about Kafka-spout with regard to acking.

For example, kafka-spout fetches offsets 5000-6000 from the Kafka server, but one
tuple whose offset is 5101 is failed by a bolt; the whole batch of 5000-6000
will remain in kafka-spout until the 5101 tuple is acked. If the 5101 tuple
cannot be acked for a long time, the batch 5000-6000 will remain for a long
time, and the kafka-spout will stop fetching data from Kafka during this time.

Am I right?




Josh


 
From: Tech Id
Date: 2016-09-14 06:26
To: user
Subject: Re: How will storm replay the tuple tree?
I agree with this statement about code/architecture but in case of some system 
outages, like one of the end-points (Solr, Couchbase, Elastic-Search etc.) 
being down temporarily, a very large number of other fully-functional and 
healthy systems will receive a large number of duplicate replays (especially in 
heavy throughput topologies).

If you can elaborate a little more on the performance cost of tracking tuples 
or point to a document reflecting the same, that will be of great help.

Best,
T.I.

On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <jwh...@seic.com> wrote:
Failures should be very infrequent, if they are not then rethink the code and 
architecture.  The performance cost of tracking tuples in the way that would be 
required to replay at the failure is large, basically that method would slow 
everything way down for very infrequent failures.
 
From: S G [mailto:sg.online.em...@gmail.com] 
Sent: Tuesday, September 13, 2016 3:17 PM
To: user@storm.apache.org
Subject: Re: How will storm replay the tuple tree?
 
Hi,
 
I am a little curious to know why we begin at the spout level for case 1.
If we replay at the failing bolt's parent level (BoltA in this case), then it 
should be more performant due to a decrease in duplicate processing (as 
compared to whole tuple tree replays).
 
If BoltA crashes due to some reason while replaying, only then the Spout should 
receive this as a failure and whole tuple tree should be replayed.
 
This saving in duplicate processing will be more visible with several layers of 
bolts.
 
I am sure there is a good reason to replay the whole tuple-tree, and want to 
know the same.
 
Thanks
SG
 
On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
Hi Cheney,
 
Replays happen at the spout level. So if there is a failure at any point in the 
tuple tree (the tuple tree being the anchored emits, unanchored emits don’t 
count), the original spout tuple will be replayed. So the replayed tuple will 
traverse the topology again, including unanchored points.
 
If an unanchored tuple fails downstream, it will not trigger a replay.
 
Hope this helps.
 
-Taylor
 
 
On Sep 13, 2016, at 4:42 AM, Cheney Chen <tbcql1...@gmail.com> wrote:
 
Hi there, 
 
We're using storm 1.0.1, and I'm checking through 
http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
 
Got questions for below two scenarios.
Assume topology: S (spout) --> BoltA --> BoltB
1. S: anchored emit, BoltA: anchored emit
Suppose BoltB processing failed w/ ack, what will the replay be, will it 
execute both BoltA and BoltB or only failed BoltB processing?
 
2. S: anchored emit, BoltA: unanchored emit
Suppose BoltB processing failed w/ ack, replay will not happen, correct?
 
-- 
Regards,
Qili Chen (Cheney)

E-mail: tbcql1...@gmail.com 
MP: (+1) 4086217503
 
 



Re: Re: How will storm replay the tuple tree?

2016-09-13 Thread Ambud Sharma
No as per the code only individual messages are replayed.

On Sep 13, 2016 6:09 PM, "fanxi...@travelsky.com" <fanxi...@travelsky.com>
wrote:

> Hi:
>
> I'd like to clarify something about Kafka-spout with regard to acking.
>
> For example, kafka-spout fetches offsets 5000-6000 from the Kafka server, but
> one tuple whose offset is 5101 is failed by a bolt; the whole batch of
> 5000-6000 will remain in kafka-spout until the 5101 tuple is acked.
> If the 5101 tuple cannot be acked for a long time, the batch 5000-6000
> will remain for a long time, and the kafka-spout will stop fetching data
> from Kafka during this time.
>
> Am I right?
>
>
> --
> Josh
>
>
>
> *From:* Tech Id <tech.login@gmail.com>
> *Date:* 2016-09-14 06:26
> *To:* user <user@storm.apache.org>
> *Subject:* Re: How will storm replay the tuple tree?
> I agree with this statement about code/architecture but in case of some
> system outages, like one of the end-points (Solr, Couchbase, Elastic-Search
> etc.) being down temporarily, a very large number of other fully-functional
> and healthy systems will receive a large number of duplicate replays
> (especially in heavy throughput topologies).
>
> If you can elaborate a little more on the performance cost of tracking
> tuples or point to a document reflecting the same, that will be of great
> help.
>
> Best,
> T.I.
>
> On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <jwh...@seic.com> wrote:
>
>> Failures should be very infrequent, if they are not then rethink the code
>> and architecture.  The performance cost of tracking tuples in the way that
>> would be required to replay at the failure is large, basically that method
>> would slow everything way down for very infrequent failures.
>>
>>
>>
>> *From:* S G [mailto:sg.online.em...@gmail.com]
>> *Sent:* Tuesday, September 13, 2016 3:17 PM
>> *To:* user@storm.apache.org
>> *Subject:* Re: How will storm replay the tuple tree?
>>
>>
>>
>> Hi,
>>
>>
>>
>> I am a little curious to know why we begin at the spout level for case 1.
>>
>> If we replay at the failing bolt's parent level (BoltA in this case),
>> then it should be more performant due to a decrease in duplicate processing
>> (as compared to whole tuple tree replays).
>>
>>
>>
>> If BoltA crashes due to some reason while replaying, only then the Spout
>> should receive this as a failure and whole tuple tree should be replayed.
>>
>>
>>
>> This saving in duplicate processing will be more visible with several
>> layers of bolts.
>>
>>
>>
>> I am sure there is a good reason to replay the whole tuple-tree, and want
>> to know the same.
>>
>>
>>
>> Thanks
>>
>> SG
>>
>>
>>
>> On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <ptgo...@gmail.com>
>> wrote:
>>
>> Hi Cheney,
>>
>>
>>
>> Replays happen at the spout level. So if there is a failure at any point
>> in the tuple tree (the tuple tree being the anchored emits, unanchored
>> emits don’t count), the original spout tuple will be replayed. So the
>> replayed tuple will traverse the topology again, including unanchored
>> points.
>>
>>
>>
>> If an unanchored tuple fails downstream, it will not trigger a replay.
>>
>>
>>
>> Hope this helps.
>>
>>
>>
>> -Taylor
>>
>>
>>
>>
>>
>> On Sep 13, 2016, at 4:42 AM, Cheney Chen <tbcql1...@gmail.com> wrote:
>>
>>
>>
>> Hi there,
>>
>>
>>
>> We're using storm 1.0.1, and I'm checking through
>> http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
>>
>>
>>
>> Got questions for below two scenarios.
>>
>> Assume topology: S (spout) --> BoltA --> BoltB
>>
>> 1. S: anchored emit, BoltA: anchored emit
>>
>> Suppose BoltB processing failed w/ ack, what will the replay be, will it
>> execute both BoltA and BoltB or only failed BoltB processing?
>>
>>
>>
>> 2. S: anchored emit, BoltA: unanchored emit
>>
>> Suppose BoltB processing failed w/ ack, replay will not happen, correct?
>>
>>
>>
>> --
>>
>> Regards,
>> Qili Chen (Cheney)
>>
>> E-mail: tbcql1...@gmail.com
>> MP: (+1) 4086217503
>>
>>
>>
>>
>>
>
>


Re: Re: How will storm replay the tuple tree?

2016-09-13 Thread fanxi...@travelsky.com
Hi:

I'd like to clarify something about Kafka-spout with regard to acking.

For example, kafka-spout fetches offsets 5000-6000 from the Kafka server, but one
tuple whose offset is 5101 is failed by a bolt; the whole batch of 5000-6000
will remain in kafka-spout until the 5101 tuple is acked. If the 5101 tuple
cannot be acked for a long time, the batch 5000-6000 will remain for a long
time, and the kafka-spout will stop fetching data from Kafka during this time.

Am I right?




Josh


 
From: Tech Id
Date: 2016-09-14 06:26
To: user
Subject: Re: How will storm replay the tuple tree?
I agree with this statement about code/architecture but in case of some system 
outages, like one of the end-points (Solr, Couchbase, Elastic-Search etc.) 
being down temporarily, a very large number of other fully-functional and 
healthy systems will receive a large number of duplicate replays (especially in 
heavy throughput topologies).

If you can elaborate a little more on the performance cost of tracking tuples 
or point to a document reflecting the same, that will be of great help.

Best,
T.I.

On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <jwh...@seic.com> wrote:
Failures should be very infrequent, if they are not then rethink the code and 
architecture.  The performance cost of tracking tuples in the way that would be 
required to replay at the failure is large, basically that method would slow 
everything way down for very infrequent failures.
 
From: S G [mailto:sg.online.em...@gmail.com] 
Sent: Tuesday, September 13, 2016 3:17 PM
To: user@storm.apache.org
Subject: Re: How will storm replay the tuple tree?
 
Hi,
 
I am a little curious to know why we begin at the spout level for case 1.
If we replay at the failing bolt's parent level (BoltA in this case), then it 
should be more performant due to a decrease in duplicate processing (as 
compared to whole tuple tree replays).
 
If BoltA crashes due to some reason while replaying, only then the Spout should 
receive this as a failure and whole tuple tree should be replayed.
 
This saving in duplicate processing will be more visible with several layers of 
bolts.
 
I am sure there is a good reason to replay the whole tuple-tree, and want to 
know the same.
 
Thanks
SG
 
On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
Hi Cheney,
 
Replays happen at the spout level. So if there is a failure at any point in the 
tuple tree (the tuple tree being the anchored emits, unanchored emits don’t 
count), the original spout tuple will be replayed. So the replayed tuple will 
traverse the topology again, including unanchored points.
 
If an unanchored tuple fails downstream, it will not trigger a replay.
 
Hope this helps.
 
-Taylor
 
 
On Sep 13, 2016, at 4:42 AM, Cheney Chen <tbcql1...@gmail.com> wrote:
 
Hi there, 
 
We're using storm 1.0.1, and I'm checking through 
http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
 
Got questions for below two scenarios.
Assume topology: S (spout) --> BoltA --> BoltB
1. S: anchored emit, BoltA: anchored emit
Suppose BoltB processing failed w/ ack, what will the replay be, will it 
execute both BoltA and BoltB or only failed BoltB processing?
 
2. S: anchored emit, BoltA: unanchored emit
Suppose BoltB processing failed w/ ack, replay will not happen, correct?
 
-- 
Regards,
Qili Chen (Cheney)

E-mail: tbcql1...@gmail.com 
MP: (+1) 4086217503
 
 



Re: How will storm replay the tuple tree?

2016-09-13 Thread Tech Id
I agree with this statement about code/architecture but in case of some
system outages, like one of the end-points (Solr, Couchbase, Elastic-Search
etc.) being down temporarily, a very large number of other fully-functional
and healthy systems will receive a large number of duplicate replays
(especially in heavy throughput topologies).

If you can elaborate a little more on the performance cost of tracking
tuples or point to a document reflecting the same, that will be of great
help.

Best,
T.I.

On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <jwh...@seic.com> wrote:

> Failures should be very infrequent, if they are not then rethink the code
> and architecture.  The performance cost of tracking tuples in the way that
> would be required to replay at the failure is large, basically that method
> would slow everything way down for very infrequent failures.
>
>
>
> *From:* S G [mailto:sg.online.em...@gmail.com]
> *Sent:* Tuesday, September 13, 2016 3:17 PM
> *To:* user@storm.apache.org
> *Subject:* Re: How will storm replay the tuple tree?
>
>
>
> Hi,
>
>
>
> I am a little curious to know why we begin at the spout level for case 1.
>
> If we replay at the failing bolt's parent level (BoltA in this case), then
> it should be more performant due to a decrease in duplicate processing (as
> compared to whole tuple tree replays).
>
>
>
> If BoltA crashes due to some reason while replaying, only then the Spout
> should receive this as a failure and whole tuple tree should be replayed.
>
>
>
> This saving in duplicate processing will be more visible with several
> layers of bolts.
>
>
>
> I am sure there is a good reason to replay the whole tuple-tree, and want
> to know the same.
>
>
>
> Thanks
>
> SG
>
>
>
> On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <ptgo...@gmail.com>
> wrote:
>
> Hi Cheney,
>
>
>
> Replays happen at the spout level. So if there is a failure at any point
> in the tuple tree (the tuple tree being the anchored emits, unanchored
> emits don’t count), the original spout tuple will be replayed. So the
> replayed tuple will traverse the topology again, including unanchored
> points.
>
>
>
> If an unanchored tuple fails downstream, it will not trigger a replay.
>
>
>
> Hope this helps.
>
>
>
> -Taylor
>
>
>
>
>
> On Sep 13, 2016, at 4:42 AM, Cheney Chen <tbcql1...@gmail.com> wrote:
>
>
>
> Hi there,
>
>
>
> We're using storm 1.0.1, and I'm checking through
> http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
>
>
>
> Got questions for below two scenarios.
>
> Assume topology: S (spout) --> BoltA --> BoltB
>
> 1. S: anchored emit, BoltA: anchored emit
>
> Suppose BoltB processing failed w/ ack, what will the replay be, will it
> execute both BoltA and BoltB or only failed BoltB processing?
>
>
>
> 2. S: anchored emit, BoltA: unanchored emit
>
> Suppose BoltB processing failed w/ ack, replay will not happen, correct?
>
>
>
> --
>
> Regards,
> Qili Chen (Cheney)
>
> E-mail: tbcql1...@gmail.com
> MP: (+1) 4086217503
>
>
>
>
>


RE: How will storm replay the tuple tree?

2016-09-13 Thread Hart, James W.
Failures should be very infrequent, if they are not then rethink the code and 
architecture.  The performance cost of tracking tuples in the way that would be 
required to replay at the failure is large, basically that method would slow 
everything way down for very infrequent failures.

From: S G [mailto:sg.online.em...@gmail.com]
Sent: Tuesday, September 13, 2016 3:17 PM
To: user@storm.apache.org
Subject: Re: How will storm replay the tuple tree?

Hi,

I am a little curious to know why we begin at the spout level for case 1.
If we replay at the failing bolt's parent level (BoltA in this case), then it 
should be more performant due to a decrease in duplicate processing (as 
compared to whole tuple tree replays).

If BoltA crashes due to some reason while replaying, only then the Spout should 
receive this as a failure and whole tuple tree should be replayed.

This saving in duplicate processing will be more visible with several layers of 
bolts.

I am sure there is a good reason to replay the whole tuple-tree, and want to 
know the same.

Thanks
SG

On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
Hi Cheney,

Replays happen at the spout level. So if there is a failure at any point in the 
tuple tree (the tuple tree being the anchored emits, unanchored emits don’t 
count), the original spout tuple will be replayed. So the replayed tuple will 
traverse the topology again, including unanchored points.

If an unanchored tuple fails downstream, it will not trigger a replay.

Hope this helps.

-Taylor


On Sep 13, 2016, at 4:42 AM, Cheney Chen <tbcql1...@gmail.com> wrote:

Hi there,

We're using storm 1.0.1, and I'm checking through 
http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html

Got questions for below two scenarios.
Assume topology: S (spout) --> BoltA --> BoltB
1. S: anchored emit, BoltA: anchored emit
Suppose BoltB processing failed w/ ack, what will the replay be, will it 
execute both BoltA and BoltB or only failed BoltB processing?

2. S: anchored emit, BoltA: unanchored emit
Suppose BoltB processing failed w/ ack, replay will not happen, correct?

--
Regards,
Qili Chen (Cheney)

E-mail: tbcql1...@gmail.com
MP: (+1) 4086217503




Re: How will storm replay the tuple tree?

2016-09-13 Thread S G
Hi,

I am a little curious to know why we begin at the spout level for case 1.
If we replay at the failing bolt's parent level (BoltA in this case), then
it should be more performant due to a decrease in duplicate processing (as
compared to whole tuple tree replays).

If BoltA crashes due to some reason while replaying, only then the Spout
should receive this as a failure and whole tuple tree should be replayed.

This saving in duplicate processing will be more visible with several
layers of bolts.

I am sure there is a good reason to replay the whole tuple-tree, and want
to know the same.

Thanks
SG

On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz  wrote:

> Hi Cheney,
>
> Replays happen at the spout level. So if there is a failure at any point
> in the tuple tree (the tuple tree being the anchored emits, unanchored
> emits don’t count), the original spout tuple will be replayed. So the
> replayed tuple will traverse the topology again, including unanchored
> points.
>
> If an unanchored tuple fails downstream, it will not trigger a replay.
>
> Hope this helps.
>
> -Taylor
>
>
> On Sep 13, 2016, at 4:42 AM, Cheney Chen  wrote:
>
> Hi there,
>
> We're using storm 1.0.1, and I'm checking through
> http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
>
> Got questions for below two scenarios.
> Assume topology: S (spout) --> BoltA --> BoltB
> 1. S: anchored emit, BoltA: anchored emit
> Suppose BoltB processing failed w/ ack, what will the replay be, will it
> execute both BoltA and BoltB or only failed BoltB processing?
>
> 2. S: anchored emit, BoltA: unanchored emit
> Suppose BoltB processing failed w/ ack, replay will not happen, correct?
>
> --
> Regards,
> Qili Chen (Cheney)
>
> E-mail: tbcql1...@gmail.com
> MP: (+1) 4086217503
>
>
>


Re: How will storm replay the tuple tree?

2016-09-13 Thread P. Taylor Goetz
Hi Cheney,

Replays happen at the spout level. So if there is a failure at any point in the 
tuple tree (the tuple tree being the anchored emits, unanchored emits don’t 
count), the original spout tuple will be replayed. So the replayed tuple will 
traverse the topology again, including unanchored points.

If an unanchored tuple fails downstream, it will not trigger a replay.

Hope this helps.

-Taylor
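
For scenario 1 vs. scenario 2 from the original mail, the difference is only in
how BoltA emits. A minimal sketch against the Storm 1.0.x bolt API (field names
are illustrative):

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class BoltA extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Scenario 1 - anchored emit: the new tuple is tied to `input`, so a
        // fail() in BoltB reaches the acker and the spout replays the original
        // spout tuple (and the replay traverses BoltA again).
        collector.emit(input, new Values(input.getValue(0)));

        // Scenario 2 - unanchored emit: no link back to `input`, so a fail()
        // in BoltB on this tuple never reaches the acker and nothing is replayed.
        // collector.emit(new Values(input.getValue(0)));

        collector.ack(input); // BoltA's own processing of `input` succeeded
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("value"));
    }
}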


> On Sep 13, 2016, at 4:42 AM, Cheney Chen  wrote:
> 
> Hi there,
> 
> We're using storm 1.0.1, and I'm checking through 
> http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html 
> 
> 
> Got questions for below two scenarios.
> Assume topology: S (spout) --> BoltA --> BoltB
> 1. S: anchored emit, BoltA: anchored emit
> Suppose BoltB processing failed w/ ack, what will the replay be, will it 
> execute both BoltA and BoltB or only failed BoltB processing?
> 
> 2. S: anchored emit, BoltA: unanchored emit
> Suppose BoltB processing failed w/ ack, replay will not happen, correct?
> 
> --
> Regards,
> Qili Chen (Cheney)
> 
> E-mail: tbcql1...@gmail.com 
> MP: (+1) 4086217503





How will storm replay the tuple tree?

2016-09-13 Thread Cheney Chen
Hi there,

We're using storm 1.0.1, and I'm checking through
http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html

Got questions for below two scenarios.
Assume topology: S (spout) --> BoltA --> BoltB
1. S: anchored emit, BoltA: anchored emit
Suppose BoltB processing failed w/ ack, what will the replay be, will it
execute both BoltA and BoltB or only failed BoltB processing?

2. S: anchored emit, BoltA: unanchored emit
Suppose BoltB processing failed w/ ack, replay will not happen, correct?

-- 
Regards,
Qili Chen (Cheney)

E-mail: tbcql1...@gmail.com
MP: (+1) 4086217503
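
For reference, a minimal wiring of the S --> BoltA --> BoltB topology from the
question, assuming the spout emits with a message id so that acking is active
(S, BoltA and BoltB stand in for the poster's own spout/bolt classes and are
placeholders here):

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

public class ReplayExampleTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new S());                              // must emit with a message id
        builder.setBolt("boltA", new BoltA()).shuffleGrouping("spout");  // anchored (case 1) or not (case 2)
        builder.setBolt("boltB", new BoltB()).shuffleGrouping("boltA");

        Config conf = new Config();
        conf.setNumAckers(1);            // at least one acker, otherwise nothing is tracked or replayed
        conf.setMessageTimeoutSecs(30);  // un-acked trees are failed and replayed after this timeout

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("replay-example", conf, builder.createTopology());
        Thread.sleep(60_000);
        cluster.shutdown();
    }
}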