Podling Report Reminder - February 2018

2018-02-05 Thread johndament
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 21 February 2018, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting (Wed, February 07), to allow sufficient time for
review and submission.

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members, to review and digest it. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report
---

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
    the project or necessarily of its field
*   A list of the three most important issues to address in the move
    towards graduation
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
    aware of
*   How has the community developed since the last report?
*   How has the project developed since the last report?
*   How does the podling rate their own maturity?

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/February2018

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


Re: About stream manager's quitting logic on connection failures

2018-02-05 Thread Ning Wang
PR is here: https://github.com/twitter/heron/pull/2711

It should be quite simple; most changes are in config files.




Re: About stream manager's quitting logic on connection failures

2018-02-05 Thread Ning Wang
Cool. Thanks!



Re: About stream manager's quitting logic on connection failures

2018-02-05 Thread Karthik Ramasamy
Ning - let us get this rolled out soon.

Cheers
/karthik




Re: About stream manager's quitting logic on connection failures

2018-02-05 Thread Sanjeev Kulkarni
This sounds good to me!



Re: About stream manager's quitting logic on connection failures

2018-02-05 Thread Ning Wang
Yeah. That is an option too. In fact it was my first try:
https://github.com/twitter/heron/pull/2693 (just an initial attempt, not
completed; a count map should be used instead of a single total count).

In most cases, I think both solutions should have the same result. A few
reasons I changed to a tmaster check:
- With tmaster, there is only one source of truth, and tmaster is more
critical anyway. If the tmaster link is not healthy, stmgrs won't work
correctly: the topology may have created replacement nodes, but the
disconnected nodes could keep going by themselves.
- It is more straightforward. The logic is the same as the current one. On
the other hand, if we use an array for all remote stmgrs, we could have
smarter logic (which is good), but it could make stmgrs more complicated and
less straightforward (bad). I left the stmgr counters there, so if in the
future we decide to add this feature, it should be easy to add. There is a
gap between "errors from all" and "errors from a few", and this is not a
simple/quick question.




On Sun, Feb 4, 2018 at 6:48 PM, Sanjeev Kulkarni wrote:

> I couldn't add comments to the document, so I am posting my comments to the
> mailing list.
> One more approach could be to do the current measurement as it is, but
> instead of leaving the quitting decision to the stmgrclient, have
> stmgrclientmgr make the decision. Every time a stmgr client detects
> connection issues, it informs stmgrclientmgr, which keeps a map of
> peerstmgrid to error count. It is thus able to decide things like "am I
> seeing connection errors from all stmgrs, or are only a few of them
> having issues?" Then it can make better decisions.
>
> On Sat, Feb 3, 2018 at 8:11 PM, Ning Wang  wrote:
>
> > Hi, heron devs~
> >
> > I think the current stream manager's quitting logic on connection
> > failures is problematic. We saw a few internal cases at Twitter where
> > this logic could cause extra issues.
> >
> > Here is a doc with more details:
> >
> > https://docs.google.com/document/d/1WHNc2NEp2gVL9ge2QVKp9t4Hpd4U9sAbzBqCu4-iDUM/edit#
> >
> > Comments and feedback are welcome!
> >
> > Thanks.
> > --ning
> >
>
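
For readers following the thread, the two policies discussed above can be
contrasted with a small sketch. This is only an illustration under stated
assumptions, not code from the Heron code base or from either pull request:
the names PeerErrorTracker, Verdict, should_quit_on_tmaster and the
error_threshold parameter are hypothetical, and Python is used only for
brevity. The tracker models the proposed map of peerstmgrid to error count,
which can tell "errors from all" apart from "errors from a few"; the last
function models the simpler tmaster-only check.

```python
from collections import defaultdict
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    FEW_PEERS_FAILING = "few_peers_failing"
    ALL_PEERS_FAILING = "all_peers_failing"


class PeerErrorTracker:
    """Hypothetical stand-in for the per-peer bookkeeping proposed for
    stmgrclientmgr: a map of peer stmgr id to connection error count."""

    def __init__(self, peer_ids, error_threshold):
        self._peer_ids = set(peer_ids)
        self._error_threshold = error_threshold  # assumed tunable, not a real Heron config key
        self._error_counts = defaultdict(int)

    def record_error(self, peer_id):
        """Called whenever a client reports a connection problem with a peer."""
        if peer_id not in self._peer_ids:
            raise KeyError(f"unknown peer: {peer_id}")
        self._error_counts[peer_id] += 1

    def record_success(self, peer_id):
        """A successful connection clears the peer's error count."""
        self._error_counts.pop(peer_id, None)

    def failing_peers(self):
        """Peers whose error count has reached the threshold."""
        return {
            peer
            for peer, count in self._error_counts.items()
            if count >= self._error_threshold
        }

    def verdict(self):
        """Distinguish 'errors from all' from 'errors from a few'."""
        failing = self.failing_peers()
        if not failing:
            return Verdict.HEALTHY
        if failing == self._peer_ids:
            return Verdict.ALL_PEERS_FAILING
        return Verdict.FEW_PEERS_FAILING


def should_quit_on_tmaster(tmaster_errors, error_threshold):
    """Simpler policy: only the tmaster link decides whether to quit."""
    return tmaster_errors >= error_threshold
```

What to do in the FEW_PEERS_FAILING case is exactly the open question
mentioned in the thread; the sketch only reports the situation and leaves
the decision to the caller.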