Any way for users to help "stuck" JIRAs with pull requests for Spark 2.3 / future releases?
Hi all,

With Spark 2.3 approaching, I was wondering whether there is any way we "regular" users can help advance JIRAs that could have made it into Spark 2.3 but are now likely to miss it because their pull requests are awaiting detailed review. For example:

https://issues.apache.org/jira/browse/SPARK-4502 - Spark SQL reads unnecessary nested fields from Parquet. This has a pull request open since January 2017 with significant performance benefits for Parquet reads.

https://issues.apache.org/jira/browse/SPARK-21657 - Spark has exponential time complexity to explode(array of structs). This probably affects fewer users, but will be a real help for those it does affect.

Both of these example tickets probably need more testing, but until they are merged into the master branch and included in a release (behind a config setting that is disabled by default), that testing will be pretty limited.

Is there anything we users can do to help out with these kinds of tickets, or do they need to wait for additional core-developer time to free up (which I know is in huge demand everywhere in the project)?

Thanks,
Ewan
Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)
GitHub already links to CONTRIBUTING.md; of course, a lot of people ignore that. One thing we can do is add an explicit link to the wiki contributing page in the template (but note that even that introduces some overhead for every pull request).

Aside from that, I am not sure the other suggestions in the JIRA ticket are necessary. For example, creating a pull request from one branch to another is a problem, but it happens perhaps less than once a week and is trivially closeable. Adding an explicit warning there will fix some cases, but it won't entirely eliminate the problem (I'm sure a lot of people still don't read the template), and it will introduce another overhead for everybody who submits the proper way.
Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)
Actually, let's move the discussion to the JIRA ticket, given there is a ticket.
Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)
Should we just link to

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)
Thanks for confirming this, Sean. I filed it as https://issues.apache.org/jira/browse/SPARK-17840

I would appreciate it if anyone with better writing skills than mine tried to fix this; I don't want reviewers to have to spend effort correcting the grammar.

On 10 Oct 2016 1:34 a.m., "Sean Owen" <so...@cloudera.com> wrote:

> Yes, it's really CONTRIBUTING.md that's more relevant, because GitHub
> displays a link to it when opening pull requests:
> https://github.com/apache/spark/blob/master/CONTRIBUTING.md
> There is also the pull request template:
> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>
> I wouldn't want to duplicate info too much, but more pointers to a single
> source of information seem OK. Although I don't know if it will help much,
> sure, pointers from README.md are OK.
Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)
Hi all,

I just noticed that the README.md (https://github.com/apache/spark) does not directly describe the steps, or link to the documentation, for creating a PR or JIRA. It is probably sensible to search for the contribution guides before opening a PR/JIRA, but that does not seem to be enough, given the inappropriate PRs/JIRAs I see from time to time.

I guess the flood of JIRAs and PRs is problematic (judging from the emails on the dev mailing list), and I think we should explicitly mention and describe this in the README.md and the pull request template[1].

(I know we have CONTRIBUTING.md[2] and the wiki[3], but it seems pretty clear that we still get PRs and JIRAs that don't follow the documentation.)

So, my suggestions are as below:

- Create a section, maybe "Contributing to Apache Spark", in the README.md describing the wiki and CONTRIBUTING.md[2].

- Add an explicit warning to the pull request template[1], for example: "Please double-check whether your pull request is opened from one branch to another. In most cases, such a change is not appropriate. Please ask on the mailing list (http://spark.apache.org/community.html) if you are not sure."

[1] https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
[2] https://github.com/apache/spark/blob/master/CONTRIBUTING.md
[3] https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage

Thank you all.
Re: auto closing pull requests that have been inactive > 30 days?
Thanks a lot for commenting. We are getting great feedback on this thread. The take-aways are:

1. In general, people prefer having explicit reasons why pull requests are closed. We should push committers to leave messages that are more explicit about why a certain PR should or should not be closed. I can't agree more, but this is not mutually exclusive with auto-closing.

2. It is difficult to deal with the scale we are talking about; no single measure will "fix" everything. Spark is, as far as I know, one of the most active open source projects in terms of contributions, in part because we have made it very easy to accept contributions. Very few open source projects have had to deal with this scale. Looking at all the historic PRs, we have closed 12k and have ~450 open. That is less than 4% of PRs outstanding -- not a bad number, and the actual ratio is likely even lower because many of the 450 open PRs will be merged in the future. I also took a look at some of the most popular projects on GitHub (e.g. jquery, angular, react): they either have far fewer merged pull requests or a higher open-to-closed ratio. So we are actually doing pretty well, though of course there is always room for improvement.
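The "less than 4%" figure quoted above is easy to sanity-check; the 12k closed and ~450 open counts are taken from the email, not re-measured:

```python
# Figures quoted in the email: ~12,000 closed PRs, ~450 still open.
closed = 12_000
open_prs = 450

# Fraction of all historic PRs still outstanding.
outstanding_ratio = open_prs / (closed + open_prs)

print(f"{outstanding_ratio:.1%}")  # prints 3.6%, i.e. "less than 4%"
```

As noted in the email, this overstates the backlog slightly, since some of the open PRs will eventually be merged rather than abandoned.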
Re: auto closing pull requests that have been inactive > 30 days?

> 2016-04-19 9:56 GMT+09:00 Saisai Shao <sai.sai.s...@gmail.com>:
>
> It would be better to have a specific technical reason why this PR should
> be closed: either the implementation is not good, or the problem is not
> valid, or something else. That will actually help the contributor to shape
> their code and reopen the PR again. Otherwise, reasons like "feel free to
> reopen for so-and-so reason" are actually discouraging and no different
> than directly closing the PR.
>
> Just my two cents.
>
> Thanks
> Jerry
>
> On Tue, Apr 19, 2016 at 4:52 AM, Sean Busbey <bus...@cloudera.com> wrote:
>
>> Having a PR closed, especially if due to committers not having the
>> bandwidth to check on things, will be very discouraging to new folks.
>> Doubly so for those inexperienced with open source. Even if the message
>> says "feel free to reopen for so-and-so reason", new folks who lack
>> confidence are going to see reopening as "pestering", and busy folks are
>> going to see it as a clear indication that their work is not even
>> valuable enough for a human to give a reason for closing. In either case,
>> the cost of reopening is substantially higher than that button press.
>>
>> How about we start by keeping a report of "at-risk" PRs that have been
>> stale for 30 days, to make it easier for committers to look at the PRs
>> that have been long inactive?
>>
>> On Mon, Apr 18, 2016 at 2:52 PM, Reynold Xin <r...@databricks.com> wrote:
>> > The cost of "reopen" is close to zero, because it is just clicking a
>> > button. I think you were referring to the cost of closing the pull
>> > request, and you are assuming people look at the pull requests that
>> > have been inactive for a long time. That seems equally likely (or
>> > unlikely) as committers looking at the recently closed pull requests.
>> >
>> > In either case, most pull requests are scanned through by us when they
>> > are first opened, and if they are important enough, usually they get
>> > merged quickly or a target version is set in JIRA. We can definitely
>> > improve that by making it more explicit.
>> >
>> > On Mon, Apr 18, 2016 at 12:46 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >> From committers' perspective, would they look at closed PRs?
>> >>
>> >> If not, the cost is not close to zero. Meaning, some potentially
>> >> useful PRs would never see the light of day.
>> >>
>> >> My two cents.
>> >>
>> >> On Mon, Apr 18, 2016 at 12:43 PM, Reynold Xin <r...@databricks.com> wrote:
>> >>> Part of it is how difficult it is to automate this. We could build a
>> >>> perfect engine with a lot of rules that understand everything, but
>> >>> the more complicated the rules, the less likely any of this is to
>> >>> happen. So I'd rather do this and write a nice enough message telling
>> >>> contributors that sometimes mistakes happen, but the cost to reopen
>> >>> is approximately zero (i.e. click a button on the pull request).
>> >>>
>> >>> On Mon, Apr 18, 2016 at 12:41 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >>>> bq. close the ones where they don't respond for a week
>> >>>>
>> >>>> Does this imply that the script understands responses from humans?
>> >>>> Meaning, would the script use some regex which signifies that the
>> >>>> contributor is willing to close the PR?
Re: auto closing pull requests that have been inactive > 30 days?
Relevant: https://github.com/databricks/spark-pr-dashboard/issues/1 A lot of this was discussed a while back when the PR Dashboard was first introduced, and several times before and after that as well. (e.g. August 2014 <http://apache-spark-developers-list.1001551.n3.nabble.com/Handling-stale-PRs-td8015.html> ) If there is not enough momentum to build the tooling that people are discussing here, then perhaps Reynold's suggestion is the most practical one that is likely to see the light of day. I think asking committers to be more active in commenting on PRs is theoretically the correct thing to do, but impractical. I'm not a committer, but I would guess that most of them are already way overcommitted (ha!) and asking them to do more just won't yield results. We've had several instances in the past where we all tried to rally <https://mail-archives.apache.org/mod_mbox/spark-dev/201412.mbox/%3ccaohmdzer4cg_wxgktoxsg8s34krqezygjfzdoymgu9vhyjb...@mail.gmail.com%3E> and be more proactive about giving feedback, closing PRs, and nudging contributors who have gone silent. My observation is that the level of energy required to "properly" curate PR activity in that way is simply not sustainable. People can do it for a few weeks and then things revert to the way they are now. Perhaps the missing link that would make this sustainable is better tooling. If you think so and can sling some Javascript, you might want to contribute to the PR Dashboard <https://spark-prs.appspot.com/>. Perhaps the missing link is something else: A different PR review process; more committers; a higher barrier to contributing; a combination thereof; etc... Also relevant: http://danluu.com/discourage-oss/ By the way, some people noted that closing PRs may discourage contributors. I think our open PR count alone is very discouraging. 
Under what circumstances would you feel encouraged to open a PR against a project that has hundreds of open PRs, some from many, many months ago <https://github.com/apache/spark/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-asc>?

Nick
Re: auto closing pull requests that have been inactive > 30 days?
During the months of November / December, the 30-day period should be relaxed. Some people (at least in the US) may take extended vacation during that time. For Chinese developers, the Spring Festival presents a similar circumstance.
Re: auto closing pull requests that have been inactive > 30 days?
I also think a PR should not be closed only because it is inactive.

How about closing issues 30 days after a committer's comment is the last one, with no response from the author?

IMHO, if the committers are not sure whether a patch would be useful, they should leave some comments explaining why they are not sure, rather than just ignoring it. Or they could simply ask the author to show that the patch is useful or safe with some references and tests.

I think that would be nicer than expecting users to keep pinging. Personally, I sometimes worry that pinging multiple times can be a bit annoying.
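An editorial aside: Hyukjin's rule above (close only when a committer's comment is the last one and the author has gone silent for 30 days) is mechanical enough to sketch as pure logic. This is only an illustration of the proposed policy, not anything the project ran; the `Comment` structure and `is_committer` flag are hypothetical stand-ins for data real tooling would derive from the GitHub issue-comments API and a committer list.

```python
# Sketch of Hyukjin's variant: close only when the *last* comment on a PR
# is from a committer (other than the author) and the author has been
# silent past a grace period. Comment is a hypothetical structure; a real
# bot would build it from GitHub's issue-comments API.
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class Comment(NamedTuple):
    author: str
    is_committer: bool
    created_at: datetime  # timezone-aware


def should_close(pr_author: str, comments: List[Comment], now: datetime,
                 grace: timedelta = timedelta(days=30)) -> bool:
    if not comments:
        return False  # nothing has been asked of the author yet
    last = comments[-1]
    # Close only if a committer had the last word and the author never replied.
    return (last.is_committer
            and last.author != pr_author
            and now - last.created_at > grace)
```

Note that this rule never closes a PR that is merely awaiting review: if the author spoke last, or nobody commented at all, the PR stays open, which addresses Ted's and Cody's objection below.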
Re: auto closing pull requests that have been inactive > 30 days?
It would be better to have a specific technical reason why a PR should be closed, whether the implementation is not good, the problem is not valid, or something else. That will actually help the contributor shape their code and reopen the PR. Otherwise, reasons like "feel free to reopen for so-and-so reason" are actually discouraging and no different from directly closing the PR.

Just my two cents.

Thanks
Jerry
Re: auto closing pull requests that have been inactive > 30 days?
Having a PR closed, especially if due to committers not having the bandwidth to check on things, will be very discouraging to new folks. Doubly so for those inexperienced with open source. Even if the message says "feel free to reopen for so-and-so reason", new folks who lack confidence are going to see reopening as "pestering", and busy folks are going to see it as a clear indication that their work is not even valuable enough for a human to give a reason for closing. In either case, the cost of reopening is substantially higher than that button press.

How about we start by keeping a report of "at-risk" PRs that have been stale for 30 days, to make it easier for committers to look at the PRs that have been long inactive?
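Sean's "at-risk" report is also straightforward to prototype. Below is a minimal sketch against the public GitHub REST API; the `state`, `sort=updated`, and `direction=asc` query parameters are real v3 parameters, while the repo name and 30-day threshold come from this thread. Pagination, rate limiting, and authentication are omitted.

```python
# Sketch of the "at-risk PR" report: open PRs whose last update is more
# than 30 days old, least recently updated first.
import json
import urllib.request
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)


def is_stale(updated_at: str, now: datetime,
             threshold: timedelta = STALE_AFTER) -> bool:
    """GitHub timestamps look like '2016-04-18T19:02:00Z'."""
    updated = datetime.strptime(
        updated_at, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return now - updated > threshold


def fetch_open_prs(repo: str = "apache/spark"):
    # GitHub REST v3: open PRs, sorted by least recently updated.
    url = (f"https://api.github.com/repos/{repo}/pulls"
           "?state=open&sort=updated&direction=asc&per_page=100")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def report(prs, now: datetime):
    """Return '#number (last update): title' lines for stale PRs."""
    return [f"#{pr['number']} ({pr['updated_at']}): {pr['title']}"
            for pr in prs if is_stale(pr["updated_at"], now)]
```

A cron job could run `report(fetch_open_prs(), datetime.now(timezone.utc))` and mail the result to this list, which would surface stale PRs without closing anything.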
Re: auto closing pull requests that have been inactive > 30 days?
The cost of "reopen" is close to zero, because it is just clicking a button. I think you were referring to the cost of closing the pull request, and you are assuming people look at the pull requests that have been inactive for a long time. That seems equally likely (or unlikely) as committers looking at the recently closed pull requests.

In either case, most pull requests are scanned through by us when they are first open, and if they are important enough, usually they get merged quickly or a target version is set in JIRA. We can definitely improve that by making it more explicit.
Re: auto closing pull requests that have been inactive > 30 days?
From committers' perspective, would they look at closed PRs?

If not, the cost is not close to zero. Meaning, some potentially useful PRs would never see the light of day.

My two cents.
Re: auto closing pull requests that have been inactive > 30 days?
Part of it is how difficult it is to automate this. We could build a perfect engine with a lot of rules that understand everything, but the more complicated the rules, the less likely any of this is to happen. So I'd rather do this and craft a nice enough message telling contributors that sometimes mistakes happen, and that the cost to reopen is approximately zero (i.e. click a button on the pull request).
Re: auto closing pull requests that have been inactive > 30 days?
bq. close the ones where they don't respond for a week Does this imply that the script understands a response from a human? Meaning, would the script use some regex which signifies that the contributor is willing to close the PR? If the contributor is willing to close, why wouldn't he / she do it him/herself? On Mon, Apr 18, 2016 at 12:33 PM, Holden Karau <hol...@pigscanfly.ca> wrote: > [...]
Re: auto closing pull requests that have been inactive > 30 days?
+1 and at the same time maybe surface a report to this list of PRs which need committer action and have only had submitters responding to pings in the last 30 days? On Mon, Apr 18, 2016 at 3:33 PM, Holden Karau <hol...@pigscanfly.ca> wrote: > [...]
Re: auto closing pull requests that have been inactive > 30 days?
Personally I'd rather err on the side of keeping PRs open, but I understand wanting to keep the open PRs limited to ones which have a reasonable chance of being merged. What about if we filtered for non-mergeable PRs or instead left a comment asking the author to respond if they are still available to move the PR forward - and close the ones where they don't respond for a week? Just a suggestion. On Monday, April 18, 2016, Ted Yu <yuzhih...@gmail.com> wrote: > [...] -- Cell : 425-233-8271 Twitter: https://twitter.com/holdenkarau
Re: auto closing pull requests that have been inactive > 30 days?
Cody, Thanks for commenting. "inactive" here means no code push nor comments, so any "ping" would actually keep the PR in the open queue. Getting auto-closed also by no means indicates the pull request can't be reopened. On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org> wrote: > [...]
Re: auto closing pull requests that have been inactive > 30 days?
I had one PR which got merged after 3 months. If the inactivity was due to the contributor, I think it can be closed after 30 days. But if the inactivity was due to lack of review, the PR should be kept open. On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org> wrote: > [...]
Re: auto closing pull requests that have been inactive > 30 days?
For what it's worth, I have definitely had PRs that sat inactive for more than 30 days due to committers not having time to look at them, but did eventually end up successfully being merged. I guess if this just ends up being a committer ping and reopening the PR, it's fine, but I don't know if it really addresses the underlying issue. On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin <r...@databricks.com> wrote: > [...] - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
auto closing pull requests that have been inactive > 30 days?
We have hit a new high in open pull requests: 469 today. While we can certainly get more review bandwidth, many of these are old and still open for other reasons. Some are stale because the original authors have become busy and inactive, and some others are stale because the committers are not sure whether the patch would be useful, but have not rejected the patch explicitly. We can improve the signal-to-noise ratio by closing pull requests that have been inactive for greater than 30 days, with a nice message. I just checked and this would close ~ half of the pull requests. For example: "Thank you for creating this pull request. Since this pull request has been inactive for 30 days, we are automatically closing it. Closing the pull request does not remove it from history and will retain all the diff and review comments. If you have the bandwidth and would like to continue pushing this forward, please reopen it. Thanks again!"
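The 30-day rule proposed above is easy to check mechanically. A minimal sketch of just the staleness test, under the clarification elsewhere in the thread that "inactive" means no code push and no comment; the function name and constant below are illustrative, not the project's actual tooling:

```python
from datetime import datetime, timedelta

# Illustrative cutoff matching the proposal; not Spark's actual script.
STALE_AFTER = timedelta(days=30)

def is_stale(last_activity: datetime, now: datetime) -> bool:
    """A PR counts as inactive when neither a code push nor a
    comment has touched it within the cutoff window."""
    return now - last_activity > STALE_AFTER

now = datetime(2016, 4, 18)
print(is_stale(datetime(2016, 3, 1), now))   # → True (quiet for 48 days)
print(is_stale(datetime(2016, 4, 10), now))  # → False (active 8 days ago)
```

A real job would fetch each open PR's last push and comment timestamps from the GitHub API and apply this predicate before posting the closing message.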
[ANNOUNCE] New testing capabilities for pull requests
Hi All, For pull requests that modify the build, you can now test different build permutations as part of the pull request builder. To trigger these, you add a special phrase to the title of the pull request. Current options are: [test-maven] - run tests using maven and not sbt [test-hadoop1.0] - test using older hadoop versions (can use 1.0, 2.0, 2.2, and 2.3). The relevant source code is here: https://github.com/apache/spark/blob/master/dev/run-tests-jenkins#L193 This is useful because it allows up-front testing of build changes to avoid breaks once a patch has already been merged. I've documented this on the wiki: https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools - Patrick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
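The authoritative phrase check lives in dev/run-tests-jenkins linked above. As a hedged illustration of the idea only, a title scan for bracketed phrases might look like this (the function name and phrase set are assumptions for the sketch, not the script's actual code):

```python
import re

# Hypothetical phrase table for the sketch; the real list is whatever
# dev/run-tests-jenkins recognizes.
KNOWN_PHRASES = {"test-maven", "test-hadoop1.0", "test-hadoop2.0",
                 "test-hadoop2.2", "test-hadoop2.3"}

def build_overrides(pr_title: str) -> set:
    """Collect recognized [test-...] phrases from a pull request title."""
    return set(re.findall(r"\[(test-[^\]]+)\]", pr_title)) & KNOWN_PHRASES

print(build_overrides("[SPARK-1234] [test-maven] Fix the pom"))  # → {'test-maven'}
```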
Fwd: pull requests no longer closing by commit messages with closes #xxxx
FYI. -- Forwarded message -- From: John Greet (GitHub Staff) supp...@github.com Date: Mon, Jun 8, 2015 at 5:50 PM Subject: Re: pull requests no longer closing by commit messages with closes # To: Reynold Xin r...@databricks.com Hi Reynold, The problem here is that the commits closing those pull requests were fetched by our mirroring process, which doesn't have permission to close issues, instead of pushed by a user in the apache GitHub organization. Usually the repository receives regular, I assume automated, pushes to its master branch, but there was a gap in those pushes between 2pm PDT on June 5th and 1:16 PM PDT June 7th. This happened at least once before back in November. Now that those pushes have resumed pull requests are closing normally once again. Let us know if you have any other questions. Cheers, John I'm a committer on Apache Spark (the most active open source project in the data space). We use GitHub as the primary way to accept contributions. We use a custom merge script to merge pull requests rather than GitHub's merge button in order to preserve a linear commit history. Part of the merge script relies on the closes # feature to close the corresponding pull requests. I noticed recently that pull requests are no longer automatically closed, even if the commits are merged with the message closes #. Here are two recent examples: https://github.com/apache/spark/pull/6670 https://github.com/apache/spark/pull/6689 Can you take a look at what's going on? Thanks.
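For context on the mechanism being discussed: GitHub auto-closes a pull request when a commit pushed by a user with the right permissions contains a closing keyword such as "closes #N" in its message. A hedged sketch of how a merge script might compose such a message (the function and message layout are illustrative assumptions, not Spark's actual merge script; the JIRA id and author are made up):

```python
def merge_commit_message(title: str, pr_number: int, author: str) -> str:
    """Hypothetical merge-script commit message. The 'Closes #N' line
    is the keyword GitHub scans pushed commits for when deciding
    whether to auto-close the corresponding pull request."""
    return f"{title}\n\nCloses #{pr_number} from {author}."

print(merge_commit_message("[SPARK-8123] Fix flaky test", 6670, "someuser"))
```

As the GitHub reply above explains, the keyword only works if the commit is pushed by a user in the organization; commits arriving via the mirroring process don't have permission to close issues.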
Re: Pull Requests on github
Cool, thanks! Let me know if there are any more core numerical libraries that you'd like to see support Spark with optimised natives using a similar packaging model to netlib-java. I'm interested in fast random number generation next, and I keep wondering if anybody would be interested in paying for FPGA or GPU / APU backends for netlib-java. It would be a *lot* of work but I'd be very interested to talk to an organisation with such a requirement and I'd be able to do it in less time than they would internally. On 10 Feb 2015 04:12, Andrew Ash [via Apache Spark Developers List] wrote: Sam, I see your PR was merged -- many thanks for sending it in and getting it merged! In general for future reference, the most effective way to contribute is outlined on this wiki page: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Mon, Feb 9, 2015 at 1:04 AM, Akhil Das [hidden email] wrote: > [...]
Pull Requests on github
Hi all, I'm the author of netlib-java and I noticed that the documentation in MLlib was out of date and misleading, so I submitted a pull request on github which will hopefully make things easier for everybody to understand the benefits of system optimised natives and how to use them :-) https://github.com/apache/spark/pull/4448 However, it looks like there are a *lot* of outstanding PRs and that this is just a mirror repository. Will somebody please look at my PR and merge into the canonical source (and let me know)? Best regards, Sam -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Pull-Requests-on-github-tp10502.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Pull Requests on github
You can open a JIRA issue pointing to this PR to get it processed faster. :) Thanks Best Regards On Sat, Feb 7, 2015 at 7:07 AM, fommil sam.halli...@gmail.com wrote: > [...]
Pull Requests
Once a PR has been tested and verified, when does it get pulled back into the trunk?
Re: Pull Requests
Can someone review patch #2309 (JIRA task SPARK-3178)? Thanks On Mon, Oct 6, 2014 at 10:41 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Bill, Automated testing is just one small part of the process that performs basic sanity checks on code. All patches need to be championed and merged by a committer to make it into Spark. For large patches we also ask users to propose a design before sending a patch. This is discussed in our contributing page: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark If there is a patch that you are waiting for feedback on, feel free to just ping this list with the patch number. Is there one you are waiting for feedback on? - Patrick On Mon, Oct 6, 2014 at 7:32 PM, Bill Bejeck bbej...@gmail.com wrote: Once a PR has been tested and verified, when does it get pulled back into the trunk?
Re: Pull requests will be automatically linked to JIRA when submitted
FYI: Looks like the Mesos folk also have a bot to do automatic linking, but it appears to have been provided to them somehow by ASF. See this comment as an example: https://issues.apache.org/jira/browse/MESOS-1688?focusedCommentId=14109078page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14109078 Might be a small win to push this work to a bot ASF manages if we can get access to it (and if we have no concerns about depending on another external service). Nick On Mon, Aug 11, 2014 at 4:10 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for looking into this. I think little tools like this are super helpful. Would it hurt to open a request with INFRA to install/configure the JIRA-GitHub plugin while we continue to use the Python script we have? I wouldn't mind opening that JIRA issue with them. Nick On Mon, Aug 11, 2014 at 12:52 PM, Patrick Wendell pwend...@gmail.com wrote: I spent some time on this and I'm not sure either of these is an option, unfortunately. We typically can't use custom JIRA plug-ins because this JIRA is controlled by the ASF and we don't have rights to modify most things about how it works (it's a large shared JIRA instance used by more than 50 projects). It's worth looking into whether they can do something. In general we've tended to avoid going through ASF infra whenever possible, since they are generally overloaded and things move very slowly, even if there are outages. Here is the script we use to do the sync: https://github.com/apache/spark/blob/master/dev/github_jira_sync.py It might be possible to modify this to support post-hoc changes, but we'd need to think about how to do so while minimizing function calls to the ASF JIRA API, which I found are very slow. - Patrick On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It looks like this script doesn't catch PRs that are opened and *then* have the JIRA issue ID added to the name.
Would it be easy to somehow have the script trigger on PR name changes as well as PR creates? Alternately, is there a reason we can't or don't want to use the plugin mentioned below? (I'm assuming it covers cases like this, but I'm not sure.) Nick On Wed, Jul 23, 2014 at 12:52 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: By the way, it looks like there’s a JIRA plugin that integrates it with GitHub: - https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin - https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA It does the automatic linking and shows some additional information https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png that might be nice to have for heavy JIRA users. Nick On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com wrote: Yeah it needs to have SPARK-XXX in the title (this is the format we request already). It just works with a small synchronization script I wrote that we run every five minutes on Jenkins that uses the Github and Jenkins API: https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929 - Patrick On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: That's pretty neat. How does it work? Do we just need to put the issue ID (e.g. SPARK-1234) anywhere in the pull request? Nick On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com wrote: Just a small note, today I committed a tool that will automatically mirror pull requests to JIRA issues, so contributors will no longer have to manually post a pull request on the JIRA when they make one. It will create a link on the JIRA and also make a comment to trigger an e-mail to people watching. This should make some things easier, such as avoiding accidental duplicate effort on the same JIRA. - Patrick
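The sync described in the quoted thread keys on a SPARK-XXX id appearing in the pull request title. As a hedged sketch of that step only (the real logic lives in dev/github_jira_sync.py; the function name here is illustrative), extracting such ids is a short regex:

```python
import re

# Minimal sketch of the title scan; not the actual sync script's code.
JIRA_ID = re.compile(r"\b(SPARK-\d+)\b")

def jira_ids_from_title(title: str) -> list:
    """Return every SPARK-XXXX issue id mentioned in a PR title."""
    return JIRA_ID.findall(title)

print(jira_ids_from_title("[SPARK-1234] [MLLIB] Fix MLlib docs"))  # → ['SPARK-1234']
```

Triggering on title *changes* as asked above would mean re-running this scan on GitHub's PR-edited events rather than only on creation.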
Re: Pull requests will be automatically linked to JIRA when submitted
Hey Nicholas, That seems promising - I prefer having a proper link to having that fairly verbose comment though, because in some cases there will be dozens of comments and it could get lost. I wonder if they could do something where it posts a link instead... - Patrick On Mon, Aug 25, 2014 at 11:06 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: > [...]
Re: -1s on pull requests?
On Sun, Aug 3, 2014 at 4:35 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Include the commit hash in the tests have started/completed messages, so that it's clear what code exactly is/has been tested for each test cycle. This is now captured in this JIRA issue https://issues.apache.org/jira/browse/SPARK-2912 and completed in this PR https://github.com/apache/spark/pull/1816 which has been merged in to master. Example of old style: tests starting https://github.com/apache/spark/pull/1819#issuecomment-51416510 / tests finished https://github.com/apache/spark/pull/1819#issuecomment-51417477 (with new classes) Example of new style: tests starting https://github.com/apache/spark/pull/1816#issuecomment-51855254 / tests finished https://github.com/apache/spark/pull/1816#issuecomment-51855255 (with new classes) Nick
Re: Pull requests will be automatically linked to JIRA when submitted
I spent some time on this and I'm not sure either of these is an option, unfortunately. We typically can't use custom JIRA plug-in's because this JIRA is controlled by the ASF and we don't have rights to modify most things about how it works (it's a large shared JIRA instance used by more than 50 projects). It's worth looking into whether they can do something. In general we've tended to avoid going through ASF infra them whenever possible, since they are generally overloaded and things move very slowly, even if there are outages. Here is the script we use to do the sync: https://github.com/apache/spark/blob/master/dev/github_jira_sync.py It might be possible to modify this to support post-hoc changes, but we'd need to think about how to do so while minimizing function calls to the ASF JIRA API, which I found are very slow. - Patrick On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It looks like this script doesn't catch PRs that are opened and *then* have the JIRA issue ID added to the name. Would it be easy to somehow have the script trigger on PR name changes as well as PR creates? Alternately, is there a reason we can't or don't want to use the plugin mentioned below? (I'm assuming it covers cases like this, but I'm not sure.) Nick On Wed, Jul 23, 2014 at 12:52 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: By the way, it looks like there's a JIRA plugin that integrates it with GitHub: - https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin - https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA It does the automatic linking and shows some additional information https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png that might be nice to have for heavy JIRA users. 
Nick On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com wrote: Yeah, it needs to have SPARK-XXX in the title (this is the format we request already). It just works via a small synchronization script I wrote that we run every five minutes on Jenkins, using the GitHub and JIRA APIs: https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929 - Patrick On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: That's pretty neat. How does it work? Do we just need to put the issue ID (e.g. SPARK-1234) anywhere in the pull request? Nick On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com wrote: Just a small note, today I committed a tool that will automatically mirror pull requests to JIRA issues, so contributors will no longer have to manually post a pull request on the JIRA when they make one. It will create a link on the JIRA and also make a comment to trigger an e-mail to people watching. This should make some things easier, such as avoiding accidental duplicate effort on the same JIRA. - Patrick
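The "support post-hoc changes while minimizing calls to the slow JIRA API" idea discussed in this thread could be sketched as a small local cache of PR numbers that have already been mirrored. This is purely illustrative: the file name and helper functions below are hypothetical and are not part of the actual dev/github_jira_sync.py.

```python
import json
import os

SEEN_FILE = "jira_prs_seen.json"  # hypothetical local cache file


def load_seen(path=SEEN_FILE):
    """Return the set of PR numbers already linked on JIRA.

    Consulting this local set first means a re-run (or a pass over
    renamed PRs) only hits the slow JIRA API for genuinely new links.
    """
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()


def mark_seen(pr_number, path=SEEN_FILE):
    """Record that a PR has been mirrored, persisting the cache."""
    seen = load_seen(path)
    seen.add(pr_number)
    with open(path, "w") as f:
        json.dump(sorted(seen), f)
```

A sync pass would then skip any PR whose number is already in `load_seen()` and call `mark_seen()` after each successful JIRA update.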
Re: Pull requests will be automatically linked to JIRA when submitted
Thanks for looking into this. I think little tools like this are super helpful. Would it hurt to open a request with INFRA to install/configure the JIRA-GitHub plugin while we continue to use the Python script we have? I wouldn't mind opening that JIRA issue with them. Nick On Mon, Aug 11, 2014 at 12:52 PM, Patrick Wendell pwend...@gmail.com wrote: I spent some time on this and I'm not sure either of these is an option, unfortunately. We typically can't use custom JIRA plug-ins because this JIRA is controlled by the ASF and we don't have rights to modify most things about how it works (it's a large shared JIRA instance used by more than 50 projects). It's worth looking into whether they can do something. In general we've tended to avoid going through ASF infra whenever possible, since they are generally overloaded and things move very slowly, even when there are outages. Here is the script we use to do the sync: https://github.com/apache/spark/blob/master/dev/github_jira_sync.py It might be possible to modify this to support post-hoc changes, but we'd need to think about how to do so while minimizing function calls to the ASF JIRA API, which I found to be very slow. - Patrick On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It looks like this script doesn't catch PRs that are opened and *then* have the JIRA issue ID added to the name. Would it be easy to somehow have the script trigger on PR name changes as well as PR creates? Alternately, is there a reason we can't or don't want to use the plugin mentioned below? (I'm assuming it covers cases like this, but I'm not sure.) 
Nick On Wed, Jul 23, 2014 at 12:52 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: By the way, it looks like there's a JIRA plugin that integrates it with GitHub: - https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin - https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA It does the automatic linking and shows some additional information https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png that might be nice to have for heavy JIRA users. Nick On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com wrote: Yeah, it needs to have SPARK-XXX in the title (this is the format we request already). It just works via a small synchronization script I wrote that we run every five minutes on Jenkins, using the GitHub and JIRA APIs: https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929 - Patrick On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: That's pretty neat. How does it work? Do we just need to put the issue ID (e.g. SPARK-1234) anywhere in the pull request? Nick On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com wrote: Just a small note, today I committed a tool that will automatically mirror pull requests to JIRA issues, so contributors will no longer have to manually post a pull request on the JIRA when they make one. It will create a link on the JIRA and also make a comment to trigger an e-mail to people watching. This should make some things easier, such as avoiding accidental duplicate effort on the same JIRA. - Patrick
Re: -1s on pull requests?
Just came across this mail; thanks for initiating this discussion, Kay. To add, another recurring issue is very rapid commits: merges that happen before most contributors have had a chance to even look at the proposed changes. There is not much prior discussion on the JIRA or PR, and the time between submitting the PR and committing it is 12 hours. This is particularly relevant when contributors are not in US timezones and/or colocated; I have raised this a few times before, when the commit had other side effects that were not considered. On the flip side, we have PRs which have been languishing for weeks with little or no activity from the committers' side, making the contribution stale; so too long a delay is definitely not the direction to take either! Regards, Mridul On Tue, Jul 22, 2014 at 2:14 AM, Kay Ousterhout k...@eecs.berkeley.edu wrote: Hi all, As the number of committers / contributors on Spark has increased, there are cases where pull requests get merged before all the review comments have been addressed. This happens, say, when one committer points out a problem with the pull request, and another committer doesn't see the earlier comment and merges the PR before the comment has been addressed. This is especially tricky for pull requests with a large number of comments, because it can be difficult to notice early comments describing blocking issues. This also happens when something accidentally gets merged after the tests have started but before tests have passed. Do folks have ideas on how we can handle this issue? Are there other projects that have good ways of handling this? It looks like for Hadoop, people can -1 / +1 on the JIRA. -Kay - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: -1s on pull requests?
1. Include the commit hash in the tests have started/completed FYI: Looks like Xiangrui's already got a JIRA issue for this. SPARK-2622: Add Jenkins build numbers to SparkQA messages https://issues.apache.org/jira/browse/SPARK-2622 2. Pin a message to the start or end of the PR Should new JIRA issues for this item fall under the following umbrella issue? SPARK-2230: Improvements to Jenkins QA Harness https://issues.apache.org/jira/browse/SPARK-2230 Nick
Re: -1s on pull requests?
I think the build number is included in the SparkQA message, for example: https://github.com/apache/spark/pull/1788 The build number 17941 is in the URL https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17941/consoleFull. Just need to be careful to match the number. Another solution is to kill running Jenkins jobs if there is a code change. On Tue, Aug 5, 2014 at 8:48 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: 1. Include the commit hash in the tests have started/completed FYI: Looks like Xiangrui's already got a JIRA issue for this. SPARK-2622: Add Jenkins build numbers to SparkQA messages https://issues.apache.org/jira/browse/SPARK-2622 2. Pin a message to the start or end of the PR Should new JIRA issues for this item fall under the following umbrella issue? SPARK-2230: Improvements to Jenkins QA Harness https://issues.apache.org/jira/browse/SPARK-2230 Nick
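Extracting the build number from the console URL, as described above, is a one-line pattern match. The URL layout below is assumed from Xiangrui's example, and the function is illustrative rather than taken from the actual harness:

```python
import re


def build_number(console_url):
    """Extract the Jenkins build number from a SparkPullRequestBuilder
    console URL, e.g. .../SparkPullRequestBuilder/17941/consoleFull."""
    m = re.search(r"/SparkPullRequestBuilder/(\d+)/", console_url)
    return int(m.group(1)) if m else None


url = ("https://amplab.cs.berkeley.edu/jenkins/job/"
       "SparkPullRequestBuilder/17941/consoleFull")
print(build_number(url))  # 17941
```

The "be careful to match the number" caveat is why the pattern anchors on the job name: a bare `\d+` could also match digits elsewhere in the URL or page.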
Re: -1s on pull requests?
On Mon, Jul 21, 2014 at 4:44 PM, Kay Ousterhout k...@eecs.berkeley.edu wrote: This also happens when something accidentally gets merged after the tests have started but before tests have passed. Some improvements to SparkQA https://github.com/SparkQA could help with this. May I suggest: 1. Include the commit hash in the tests have started/completed messages, so that it's clear what code exactly is/has been tested for each test cycle. 2. Pin a message to the start or end of the PR that is updated with the status of the PR. Testing not complete; New commits since last test; Tests failed; etc. It should be easy for committers to get the status of the PR at a glance, without scrolling through the comment history. Nick
Re: -1s on pull requests?
1. Include the commit hash in the tests have started/completed messages, so that it's clear what code exactly is/has been tested for each test cycle. Great idea - I think this is easy to do given the current architecture. We already have access to the commit ID in the same script that posts the comments. 2. Pin a message to the start or end of the PR that is updated with the status of the PR. Testing not complete; New commits since last test; Tests failed; etc. It should be easy for committers to get the status of the PR at a glance, without scrolling through the comment history. This also is a good idea - I think this would be doable since the GitHub API allows us to edit comments, but it's a bit trickier. I think it would require first making an API call to get the status comment ID and then updating it. Nick Nick - Any interest in doing these? This is all doable from within the Spark repo itself because our QA harness scripts are in there: https://github.com/apache/spark/blob/master/dev/run-tests-jenkins If not, could you make a JIRA for them and put it under Project Infra. - Patrick
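The find-then-edit flow Patrick describes (get the status comment's ID, then update it) could look roughly like the sketch below. The `BOT_LOGIN` value and all helper names are assumptions for illustration, not the real SparkQA implementation; only GitHub's standard issue-comments endpoints (list, edit via PATCH, create via POST) are relied on.

```python
import json
import urllib.request

API = "https://api.github.com/repos/apache/spark"
BOT_LOGIN = "SparkQA"  # assumption: the QA bot's GitHub login


def find_status_comment(comments):
    """Pick out the bot's existing pinned status comment, if any."""
    return next((c for c in comments
                 if c["user"]["login"] == BOT_LOGIN), None)


def _api(url, token, method="GET", payload=None):
    """Minimal authenticated call to the GitHub REST API."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", "token %s" % token)
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def upsert_status_comment(pr_number, body, token):
    """One extra GET to locate the pinned comment, then PATCH it in
    place; fall back to POSTing a fresh comment if none exists yet."""
    url = "%s/issues/%d/comments" % (API, pr_number)
    existing = find_status_comment(_api(url, token))
    if existing:
        _api(existing["url"], token, method="PATCH",
             payload={"body": body})
    else:
        _api(url, token, method="POST", payload={"body": body})
```

This matches the "a bit trickier" assessment: the extra listing call is needed because GitHub identifies comments by ID, so the harness must rediscover (or persist) the pinned comment's ID between test cycles.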
Re: -1s on pull requests?
On Sun, Aug 3, 2014 at 11:29 PM, Patrick Wendell pwend...@gmail.com wrote: Nick - Any interest in doing these? this is all doable from within the spark repo itself because our QA harness scripts are in there: https://github.com/apache/spark/blob/master/dev/run-tests-jenkins If not, could you make a JIRA for them and put it under Project Infra. I’ll make the JIRA and think about how to do this stuff. I’ll have to understand what that run-tests-jenkins script does and see how easy it is to extend. Nick
Re: Pull requests will be automatically linked to JIRA when submitted
By the way, it looks like there's a JIRA plugin that integrates it with GitHub: - https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin - https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA It does the automatic linking and shows some additional information https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png that might be nice to have for heavy JIRA users. Nick On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com wrote: Yeah, it needs to have SPARK-XXX in the title (this is the format we request already). It just works via a small synchronization script I wrote that we run every five minutes on Jenkins, using the GitHub and JIRA APIs: https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929 - Patrick On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: That's pretty neat. How does it work? Do we just need to put the issue ID (e.g. SPARK-1234) anywhere in the pull request? Nick On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com wrote: Just a small note, today I committed a tool that will automatically mirror pull requests to JIRA issues, so contributors will no longer have to manually post a pull request on the JIRA when they make one. It will create a link on the JIRA and also make a comment to trigger an e-mail to people watching. This should make some things easier, such as avoiding accidental duplicate effort on the same JIRA. - Patrick
Re: -1s on pull requests?
One way to do this would be to have a Github hook that parses -1s or +1s and posts a commit status [1] (like say Travis [2]) right next to the PR. Does anybody know of an existing tool that does this ? Shivaram [1] https://github.com/blog/1227-commit-status-api [2] http://blog.travis-ci.com/2012-09-04-pull-requests-just-got-even-more-awesome/ On Mon, Jul 21, 2014 at 1:44 PM, Kay Ousterhout k...@eecs.berkeley.edu wrote: Hi all, As the number of committers / contributors on Spark has increased, there are cases where pull requests get merged before all the review comments have been addressed. This happens say when one committer points out a problem with the pull request, and another committer doesn't see the earlier comment and merges the PR before the comment has been addressed. This is especially tricky for pull requests with a large number of comments, because it can be difficult to notice early comments describing blocking issues. This also happens when something accidentally gets merged after the tests have started but before tests have passed. Do folks have ideas on how we can handle this issue? Are there other projects that have good ways of handling this? It looks like for Hadoop, people can -1 / +1 on the JIRA. -Kay
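The parsing half of the hook Shivaram describes (turn +1/-1 comments into a commit status) could be sketched as below. The committer roster and the exact vote syntax are hypothetical; posting the resulting state back via the commit status API is omitted:

```python
# Hypothetical committer roster; in practice this would come from a
# maintained list, not be hard-coded.
COMMITTERS = {"kayousterhout", "pwendell", "shivaram"}


def review_status(comments):
    """Map +1/-1 review comments to a commit-status-style state.

    Any -1 from a committer yields "failure"; a +1 with no -1 yields
    "success"; no votes at all yields "pending".
    """
    votes = [c["body"].strip() for c in comments
             if c["user"]["login"] in COMMITTERS]
    if any(v.startswith("-1") for v in votes):
        return "failure"
    if any(v.startswith("+1") for v in votes):
        return "success"
    return "pending"
```

A hook would run this over the PR's comments on each update and POST the state to the commit status endpoint, so the verdict appears next to the PR the way Travis results do.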
Re: -1s on pull requests?
I've always operated under the assumption that if a committer makes a comment on a PR, and that's not addressed, that should block the PR from being merged (even without a specific -1). I don't know of any cases where this has intentionally been violated, but I do think this happens accidentally sometimes. Unfortunately, we are not allowed to use those GitHub hooks because of the way the ASF GitHub integration works. I've lately been using a custom-made tool to help review pull requests. One thing I could do is add a feature here saying which committers have said LGTM on a PR (vs. the ones that have merely commented). We could also indicate the latest test status as Green/Yellow/Red based on the Jenkins comments: http://pwendell.github.io/spark-github-shim/ As a warning to potential users, my tool might crash your browser. - Patrick On Mon, Jul 21, 2014 at 1:44 PM, Kay Ousterhout k...@eecs.berkeley.edu wrote: Hi all, As the number of committers / contributors on Spark has increased, there are cases where pull requests get merged before all the review comments have been addressed. This happens say when one committer points out a problem with the pull request, and another committer doesn't see the earlier comment and merges the PR before the comment has been addressed. This is especially tricky for pull requests with a large number of comments, because it can be difficult to notice early comments describing blocking issues. This also happens when something accidentally gets merged after the tests have started but before tests have passed. Do folks have ideas on how we can handle this issue? Are there other projects that have good ways of handling this? It looks like for Hadoop, people can -1 / +1 on the JIRA. -Kay
Re: -1s on pull requests?
There are ASF guidelines about voting, including code review for patches: http://www.apache.org/foundation/voting.html Some ASF projects require three +1 votes (on the issue, whether a JIRA or a GitHub PR in this case) for a patch, unless it is tagged for lazy consensus [1] of something like 48 hours. For patches that are not critical, waiting a while to give additional committers time to review would be the best way to go. Another thing is that all contributors need to be patient once their patches have been submitted and are pending review. This is part of being in an open community. Hope this helps. - Henry [1] http://www.apache.org/foundation/glossary.html#LazyConsensus On Mon, Jul 21, 2014 at 1:59 PM, Patrick Wendell pwend...@gmail.com wrote: I've always operated under the assumption that if a committer makes a comment on a PR, and that's not addressed, that should block the PR from being merged (even without a specific -1). I don't know of any cases where this has intentionally been violated, but I do think this happens accidentally sometimes. Unfortunately, we are not allowed to use those GitHub hooks because of the way the ASF GitHub integration works. I've lately been using a custom-made tool to help review pull requests. One thing I could do is add a feature here saying which committers have said LGTM on a PR (vs. the ones that have merely commented). We could also indicate the latest test status as Green/Yellow/Red based on the Jenkins comments: http://pwendell.github.io/spark-github-shim/ As a warning to potential users, my tool might crash your browser. - Patrick On Mon, Jul 21, 2014 at 1:44 PM, Kay Ousterhout k...@eecs.berkeley.edu wrote: Hi all, As the number of committers / contributors on Spark has increased, there are cases where pull requests get merged before all the review comments have been addressed. 
This happens say when one committer points out a problem with the pull request, and another committer doesn't see the earlier comment and merges the PR before the comment has been addressed. This is especially tricky for pull requests with a large number of comments, because it can be difficult to notice early comments describing blocking issues. This also happens when something accidentally gets merged after the tests have started but before tests have passed. Do folks have ideas on how we can handle this issue? Are there other projects that have good ways of handling this? It looks like for Hadoop, people can -1 / +1 on the JIRA. -Kay
Re: Pull requests will be automatically linked to JIRA when submitted
Awesome! On Saturday, July 19, 2014, Patrick Wendell pwend...@gmail.com wrote: Just a small note, today I committed a tool that will automatically mirror pull requests to JIRA issues, so contributors will no longer have to manually post a pull request on the JIRA when they make one. It will create a link on the JIRA and also make a comment to trigger an e-mail to people watching. This should make some things easier, such as avoiding accidental duplicate effort on the same JIRA. - Patrick
Re: Pull requests will be automatically linked to JIRA when submitted
That's pretty neat. How does it work? Do we just need to put the issue ID (e.g. SPARK-1234) anywhere in the pull request? Nick On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com wrote: Just a small note, today I committed a tool that will automatically mirror pull requests to JIRA issues, so contributors will no longer have to manually post a pull request on the JIRA when they make one. It will create a link on the JIRA and also make a comment to trigger an e-mail to people watching. This should make some things easier, such as avoiding accidental duplicate effort on the same JIRA. - Patrick
Re: Pull requests will be automatically linked to JIRA when submitted
Yeah, it needs to have SPARK-XXX in the title (this is the format we request already). It just works via a small synchronization script I wrote that we run every five minutes on Jenkins, using the GitHub and JIRA APIs: https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929 - Patrick On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: That's pretty neat. How does it work? Do we just need to put the issue ID (e.g. SPARK-1234) anywhere in the pull request? Nick On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com wrote: Just a small note, today I committed a tool that will automatically mirror pull requests to JIRA issues, so contributors will no longer have to manually post a pull request on the JIRA when they make one. It will create a link on the JIRA and also make a comment to trigger an e-mail to people watching. This should make some things easier, such as avoiding accidental duplicate effort on the same JIRA. - Patrick
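The title convention Patrick describes only requires the sync to find a SPARK-XXX issue ID in the PR title. The regex below is an illustrative reconstruction of that check, not the exact pattern used in the real script:

```python
import re

# Matches issue IDs like "SPARK-1234" anywhere in a PR title, including
# the common bracketed form "[SPARK-1234] [SQL] ...".
ISSUE_RE = re.compile(r"\b(SPARK-\d+)\b")


def extract_issue_ids(pr_title):
    """Return all JIRA issue IDs mentioned in a pull request title."""
    return ISSUE_RE.findall(pr_title)
```

Each extracted ID would then be used to create the remote link and trigger comment on the corresponding JIRA issue.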
Pull requests will be automatically linked to JIRA when submitted
Just a small note, today I committed a tool that will automatically mirror pull requests to JIRA issues, so contributors will no longer have to manually post a pull request on the JIRA when they make one. It will create a link on the JIRA and also make a comment to trigger an e-mail to people watching. This should make some things easier, such as avoiding accidental duplicate effort on the same JIRA. - Patrick