Podling Report Reminder - February 2018
Dear podling,

This email was sent by an automated system on behalf of the Apache Incubator PMC. It is an initial reminder to give you plenty of time to prepare your quarterly board report.

The board meeting is scheduled for Wed, 21 February 2018, 10:30 am PDT. The report for your podling will form a part of the Incubator PMC report. The Incubator PMC requires your report to be submitted 2 weeks before the board meeting, to allow sufficient time for review and submission (Wed, February 07).

Please submit your report with sufficient time to allow the Incubator PMC, and subsequently board members, to review and digest it. Again, the very latest you should submit your report is 2 weeks prior to the board meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report
----------------------

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of the project or necessarily of its field
* A list of the three most important issues to address in the move towards graduation
* Any issues that the Incubator PMC or ASF Board might wish/need to be aware of
* How the community has developed since the last report
* How the project has developed since the last report
* How the podling rates its own maturity

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/February2018

Note: This is manually populated. You may need to wait a little before this page is created from a template.

Mentors
-------

Mentors should review reports for their project(s) and sign them off on the Incubator wiki page. Signing off reports shows that you are following the project; projects that are not signed may raise alarms for the Incubator PMC.

Incubator PMC
Re: About stream manager's quitting logic on connection failures
PR is here: https://github.com/twitter/heron/pull/2711

It should be quite simple; most changes are in config files.

On Mon, Feb 5, 2018 at 1:40 PM, Ning Wang wrote:
> Cool. Thanks!
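The thread does not show the code in PR 2711; it presumably implements the tmaster-check approach Ning describes in his message of Mon, Feb 5, 2018 at 1:08 AM. The following is a minimal Python sketch of that policy under stated assumptions, not Heron's actual stream manager code: the class and field names, the `max_tmaster_failures` threshold, and the reset-on-success behavior are all hypothetical. Per-peer stmgr counters are still recorded but do not affect the quit decision, mirroring Ning's note that he left them in place.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TMasterCheckPolicy:
    """Hypothetical sketch: quit only when the tmaster link keeps failing."""

    # Assumed config knob; the real key names in PR 2711 are not shown in the thread.
    max_tmaster_failures: int = 3
    tmaster_failures: int = 0
    # Per-peer stmgr error counters are kept for diagnostics and future use,
    # but they are deliberately ignored by should_quit().
    stmgr_failures: Dict[str, int] = field(default_factory=dict)

    def on_tmaster_failure(self) -> None:
        self.tmaster_failures += 1

    def on_tmaster_success(self) -> None:
        # Assumption: a successful (re)connect resets the failure count.
        self.tmaster_failures = 0

    def on_stmgr_failure(self, peer_id: str) -> None:
        self.stmgr_failures[peer_id] = self.stmgr_failures.get(peer_id, 0) + 1

    def should_quit(self) -> bool:
        # Single source of truth: only the tmaster link decides.
        return self.tmaster_failures >= self.max_tmaster_failures


if __name__ == "__main__":
    policy = TMasterCheckPolicy(max_tmaster_failures=3)
    for _ in range(10):
        policy.on_stmgr_failure("stmgr-2")  # peer errors alone never trigger a quit
    print(policy.should_quit())  # False
    for _ in range(3):
        policy.on_tmaster_failure()
    print(policy.should_quit())  # True
```

In this sketch, a stmgr whose tmaster link is healthy never quits because of peer errors alone; that is the simplicity Ning argues for, at the cost of ignoring the per-peer information.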
Re: About stream manager's quitting logic on connection failures
Cool. Thanks!

On Mon, Feb 5, 2018 at 11:01 AM, Karthik Ramasamy wrote:
> Ning, let us get this rolled out soon.
>
> Cheers
> /karthik
Re: About stream manager's quitting logic on connection failures
Ning, let us get this rolled out soon.

Cheers
/karthik

> On Feb 5, 2018, at 10:57 AM, Sanjeev Kulkarni wrote:
>
> This sounds good to me!
Re: About stream manager's quitting logic on connection failures
This sounds good to me!

On Mon, Feb 5, 2018 at 1:08 AM, Ning Wang wrote:
> Yeah. That is an option too. In fact it was my first try:
> https://github.com/twitter/heron/pull/2693
> [...]
Re: About stream manager's quitting logic on connection failures
Yeah. That is an option too. In fact, it was my first try: https://github.com/twitter/heron/pull/2693 (just an initial attempt and not completed; a count map should be used instead of a single total count).

In most cases, I think both solutions should produce the same result. A few reasons I changed to a tmaster check:

- With tmaster, there is only one source of truth, and tmaster is more critical anyway. If the tmaster link is not healthy, stmgrs won't work correctly: the topology may have created replacement nodes, but the disconnected nodes could keep going by themselves.
- It is more straightforward. The logic is the same as the current one. On the other side, if we use an array for all remote stmgrs, we could have smarter logic (which is good), but it could make stmgrs more complicated and less straightforward (bad). I left the stmgr counters there, so if in the future we decide to add this feature, it should be easy to add. There is a gap between "errors from all" and "errors from a few", and this is not a simple/quick question.

On Sun, Feb 4, 2018 at 6:48 PM, Sanjeev Kulkarni wrote:
> I couldn't add comments to the document, so I am posting my comments to the
> mailing list.
> One more approach could be to do the current measurement as it is, but
> instead of leaving the quitting decision to the stmgrclient, have
> stmgrclientmgr make the decision. Every time a stmgr client detects
> connection issues, it informs stmgrclientmgr, which keeps a map of
> peerstmgrid to error count. That way it is able to decide things like: am I
> seeing connection errors from all stmgrs, or are only a few of them
> having issues? Then it can make better decisions.
>
> On Sat, Feb 3, 2018 at 8:11 PM, Ning Wang wrote:
>
> > Hi, heron devs~
> >
> > I think the current stream manager's quitting logic on connection failures
> > is problematic. We saw a few internal cases at Twitter where this logic
> > could cause extra issues.
> >
> > Here is a doc with more details:
> >
> > https://docs.google.com/document/d/1WHNc2NEp2gVL9ge2QVKp9t4Hpd4U9sAbzBqCu4-iDUM/edit#
> >
> > Comments and feedback are welcome!
> >
> > Thanks.
> > --ning
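Sanjeev's alternative moves the quit decision into stmgrclientmgr, which keeps a per-peer error count so it can tell "errors from all stmgrs" apart from "errors from a few". A minimal Python sketch of that bookkeeping follows. The class name, the `error_threshold` parameter, the reset on a healthy connection, and the rule "quit only when every peer is failing" are assumptions for illustration. The "some" case is deliberately left undecided because, as Ning points out, how to handle it is not a simple question.

```python
from collections import Counter
from typing import Iterable, Set


class StMgrClientMgrSketch:
    """Hypothetical sketch of a centralized quit decision keyed by peer stmgr id."""

    def __init__(self, peer_ids: Iterable[str], error_threshold: int) -> None:
        self.peer_ids: Set[str] = set(peer_ids)
        self.error_threshold = error_threshold
        self.errors: Counter = Counter()  # peer stmgr id -> error count

    def report_error(self, peer_id: str) -> None:
        # Each stmgr client reports here instead of deciding to quit by itself.
        self.errors[peer_id] += 1

    def report_ok(self, peer_id: str) -> None:
        # Assumption: a healthy connection clears that peer's count.
        self.errors.pop(peer_id, None)

    def failing_peers(self) -> Set[str]:
        # Counter returns 0 for peers that have never reported an error.
        return {p for p in self.peer_ids if self.errors[p] >= self.error_threshold}

    def classify(self) -> str:
        failing = self.failing_peers()
        if not failing:
            return "none"
        if failing == self.peer_ids:
            return "all"   # likely a problem on this stmgr's own side
        return "some"      # the open question from the thread

    def should_quit(self) -> bool:
        # Assumed rule: quit only when every peer is failing.
        return self.classify() == "all"


if __name__ == "__main__":
    mgr = StMgrClientMgrSketch(["stmgr-1", "stmgr-2", "stmgr-3"], error_threshold=2)
    mgr.report_error("stmgr-2")
    mgr.report_error("stmgr-2")
    print(mgr.classify(), mgr.should_quit())  # some False
    for peer in ("stmgr-1", "stmgr-3"):
        mgr.report_error(peer)
        mgr.report_error(peer)
    print(mgr.classify(), mgr.should_quit())  # all True
```

Compared with a tmaster check, this gives the stmgr more information to act on, which is exactly the extra complexity Ning weighs against it.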