Any way for users to help "stuck" JIRAs with pull requests for Spark 2.3 / future releases?

2017-12-21 Thread Ewan Leith
Hi all,

With Spark 2.3 approaching, I was wondering whether there's any way we "regular" 
users can help advance any of the JIRAs that could have made it into Spark 2.3 
but are now likely to miss it because their pull requests are awaiting detailed 
review.

For example:

https://issues.apache.org/jira/browse/SPARK-4502 - Spark SQL reads unnecessary 
nested fields from Parquet

This has had a pull request open since January 2017, with significant 
performance benefits for Parquet reads.
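As a rough illustration of what the ticket is about (plain Python with invented names, not Spark code): the idea is to read only the nested fields a query touches instead of materialising every field of a struct. A later Spark release gated similar pruning behind the `spark.sql.optimizer.nestedSchemaPruning.enabled` flag; this toy only shows the concept.

```python
# Toy sketch (not Spark): keep only the nested fields a query needs.
# All record/field names here are invented for the example.

def prune_nested(record, paths):
    """Return a copy of `record` containing only the dotted `paths`,
    e.g. paths=["contact.name"] keeps record["contact"]["name"]."""
    out = {}
    for path in paths:
        src, dst = record, out
        *parents, leaf = path.split(".")
        for key in parents:
            src = src[key]
            dst = dst.setdefault(key, {})
        dst[leaf] = src[leaf]
    return out

row = {"id": 7, "contact": {"name": "eva", "phones": ["555-0100", "555-0101"]}}

# A query touching only contact.name need not read id or contact.phones.
print(prune_nested(row, ["contact.name"]))  # {'contact': {'name': 'eva'}}
```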

https://issues.apache.org/jira/browse/SPARK-21657 - Spark has exponential time 
complexity to explode(array of structs)

This probably affects fewer users, but it would be a real help to those it does.
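For context, the semantics of explode() on an array-of-structs column are simple and linear, as this plain-Python sketch (invented names, not Spark code) shows; SPARK-21657 is about Spark's query planner taking exponential time on such plans, not about the semantics themselves.

```python
# Plain-Python sketch of explode() semantics for an array-of-structs
# column: one output row per array element, linear in the total number
# of elements.  Names are invented for the example.

def explode(rows, array_col):
    for row in rows:
        for element in row[array_col]:            # element is a "struct" (dict)
            flat = {k: v for k, v in row.items() if k != array_col}
            flat.update(element)                  # promote struct fields to columns
            yield flat

events = [{"user": "a", "clicks": [{"page": "x"}, {"page": "y"}]}]
print(list(explode(events, "clicks")))
# [{'user': 'a', 'page': 'x'}, {'user': 'a', 'page': 'y'}]
```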

Both of these example tickets probably need more testing, but unless they are 
merged into the master branch and included in a release (behind a config 
setting that disables them by default), that testing will be pretty limited.

Is there anything we users can do to help out with these kinds of tickets, or 
do they need to wait for some additional core developer time to free up? (I 
know that's in huge demand everywhere in the project!)

Thanks,
Ewan








Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Reynold Xin
GitHub already links to CONTRIBUTING.md -- of course, a lot of people
ignore that. One thing we can do is add an explicit link to the wiki
contributing page in the template (but note that even that introduces some
overhead for every pull request).

Aside from that, I am not sure if the other suggestions in the JIRA ticket
are necessary. For example, the issue with creating a pull request from one
branch to another is a problem, but it happens perhaps less than once a
week and is trivially closeable. Adding an explicit warning there will fix
some cases, but won't entirely eliminate the problem (because I'm sure a
lot of people still don't read the template), and will introduce another
overhead for everybody who submits the proper way.


> On 10 Oct 2016 1:34 a.m., "Sean Owen" <so...@cloudera.com> wrote:
>
>> Yes, it's really CONTRIBUTING.md that's more relevant, because GitHub
>> displays a link to it when opening pull requests.
>> https://github.com/apache/spark/blob/master/CONTRIBUTING.md
>> There is also the pull request template:
>> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>>
>> I wouldn't want to duplicate info too much, but more pointers to a single
>> source of information seem OK. Although I don't know if it will help much,
>> sure, pointers from README.md are OK.


Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Reynold Xin
Actually let's move the discussion to the JIRA ticket, given there is a
ticket.




Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Felix Cheung
Should we just link to

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark






Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Hyukjin Kwon
Thanks for confirming this, Sean. I filed this in
https://issues.apache.org/jira/browse/SPARK-17840

I would appreciate it if anyone with better writing skills than mine could
fix this.

I don't want reviewers to have to spend effort correcting the grammar.




Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Hyukjin Kwon
Hi all,


I just noticed that the README.md (https://github.com/apache/spark) does not
directly describe, or link to, the steps to follow for creating a PR or JIRA.
It is probably sensible to search Google for the contribution guides before
opening a PR/JIRA, but that seems not to be enough, given the inappropriate
PRs/JIRAs I see from time to time.

I gather that the flood of JIRAs and PRs is a problem (judging from the
emails on the dev mailing list), and I think we should explicitly mention and
describe this in the README.md and the pull request template[1].

(I know we have CONTRIBUTING.md[2] and the wiki[3], but it seems pretty clear
that we still get some PRs and JIRAs that do not follow the documentation.)

So, my suggestions are as below:

- Create a section, maybe "Contributing To Apache Spark", in the README.md
describing the wiki and CONTRIBUTING.md[2].

- Add an explicit warning to the pull request template[1], for example:
"Please double-check whether your pull request is opened from one branch to
another. In most cases, such a change is not appropriate. Please ask on the
mailing list (http://spark.apache.org/community.html) if you are not sure."

[1]https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
[2]https://github.com/apache/spark/blob/master/CONTRIBUTING.md
[3]https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage


Thank you all.


Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Reynold Xin
Thanks a lot for commenting. We are getting great feedback on this thread.
The take-aways are:

1. In general, people prefer having explicit reasons why pull requests
should be closed. We should push committers to leave messages that are more
explicit about why a certain PR should or should not be closed. I can't agree
more. But this is not mutually exclusive with auto-closing.


2. It is difficult to deal with the scale we are talking about. There is
no single measure that could "fix" everything.

Spark is, as far as I know, one of the most active open source projects in
terms of contributions, in part because we have made it very easy to accept
contributions. Very few open source projects have had to deal with this
scale. If you look at all the historic PRs, we have closed 12k and have ~450
open. That's less than 4% of PRs outstanding -- not a bad number. The actual
ratio is likely even lower, because many of the 450 open PRs will be merged
in the future.

I also took a look at some of the most popular projects on github (e.g.
jquery, angular, react) -- they either have far fewer merged pull requests
or a higher ratio of open-to-close. So we are actually doing pretty well.
But of course there is always room for improvement.
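One piece of tooling for this is a report of PRs that have gone quiet. A minimal sketch of the filtering logic in plain Python (the data shape and PR numbers are invented; a real report would pull last-activity timestamps from the GitHub API):

```python
from datetime import datetime, timedelta

def at_risk_prs(last_activity, now, days=30):
    """Return PR numbers with no activity for more than `days` days.

    `last_activity` maps PR number -> datetime of last comment/push
    (an invented shape; a real report would query the GitHub API)."""
    cutoff = now - timedelta(days=days)
    return sorted(pr for pr, seen in last_activity.items() if seen < cutoff)

now = datetime(2016, 4, 18)
prs = {12001: datetime(2016, 1, 5),   # invented example: stale since January
       12450: datetime(2016, 4, 15)}  # invented example: recently active
print(at_risk_prs(prs, now))  # [12001]
```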




On Mon, Apr 18, 2016 at 8:46 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

>
> On Mon, Apr 18, 2016 at 10:30 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> During the months of November / December, the 30-day period should be
>> relaxed.
>>
>> Some people (at least in the US) may take extended vacation during that
>> time.
>>
>> For Chinese developers, the Spring Festival would be a similar case.
>>
>> On Mon, Apr 18, 2016 at 7:25 PM, Hyukjin Kwon <gurwls...@gmail.com>
>> wrote:
>>
>>> I also think this might not have to be closed only because it is
>>> inactive.
>>>
>>>
>>> How about closing issues after 30 days when a committer's comment is the
>>> last one, with no response from the author?
>>>
>>>
>>> IMHO, if the committers are not sure whether the patch would be useful,
>>> then I think they should leave some comments on why they are not sure, not
>>> just ignore it.
>>>
>>> Or, simply they could ask the author to prove that the patch is useful
>>> or safe with some references and tests.
>>>
>>>
>>> I think it might be nicer than users being expected to keep pinging.
>>> **Personally**, I am sometimes a bit worried that pinging
>>> multiple times can be a bit annoying.

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
>>>>
>>>> 2016-04-19 9:56 GMT+09:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>
>>>>> It would be better to have a specific technical reason why this PR
>>>>> should be closed, either the implementation is not good or the problem is
>>>>> not valid, or something else. That will actually help the contributor
>>>>> shape their code and reopen the PR again. Otherwise, a reason like "feel
>>>>> free to reopen for so-and-so reason" is actually discouraging and no
>>>>> different from directly closing the PR.
>>>>>
>>>>> Just my two cents.
>>>>>
>>>>> Thanks
>>>>> Jerry
>>>>>
>>>>>
>>>>> On Tue, Apr 19, 2016 at 4:52 AM, Sean Busbey <bus...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Having a PR closed, especially if due to committers not having the
>>>>>> bandwidth to check on things, will be very discouraging to new folks.
>>>>>> Doubly so for those inexperienced with open source. Even if the message
>>>>>> says "feel free to reopen for so-and-so reason", new folks who lack
>>>>>> confidence are going to see reopening as "pestering" and busy folks
>>>>>> are going to see it as a clear indication that their work is not even
>>>>>> valuable enough for a human to give a reason for closing. In either
>>>>>> case, the cost of reopening is substantially higher than that button
>>>>>> press.
>>>>>>
>>>>>> How about we start by keeping a report of "at-risk" PRs that have been
>>>>>> stale for 30 days, to make it easier for committers to look at the PRs
>>>>>> that have been inactive for a long time?
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 18, 2016 at 2:52 PM, Reynold Xin <r...@databricks.com>
>>>>>> wrote:
>>>>>> > The cost of "reopen" is close to zero, because it is just clicking
>>>>>> a button.
>>>>>> > I think you were referring to the cost of closing the pull request,
>>>>>> and you
>>>>>> > are assuming people look at the pull requests that have been
>>>>>> inactive for a
>>>>>> > long time. That seems equally likely (or unlikely) as committers
>>>>>> looking at
>>>>>> > the recently closed pull requests.
>>>>>> >
>>>>>> > In either case, most pull requests are scanned through by us when
>>>>>> they are
>>>>>> > first open, and if they are important enough, usually they get
>>>>>> merged
>>>>>> > quickly or a target version is set in JIRA. We can definitely
>>>>>> improve that
>>>>>> > by making it more explicit.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Apr 18, 2016 at 12:46 PM, Ted Yu <yuzhih...@gmail.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> From committers' perspective, would they look at closed PRs ?
>>>>>> >>
>>>>>> >> If not, the cost is not close to zero.
>>>>>> >> Meaning, some potentially useful PRs would never see the light of
>>>>>> day.
>>>>>> >>
>>>>>> >> My two cents.
>>>>>> >>
>>>>>> >> On Mon, Apr 18, 2016 at 12:43 PM, Reynold Xin <r...@databricks.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> Part of it is how difficult it is to automate this. We can build a
>>>>>> >>> perfect engine with a lot of rules that understand everything. But
>>>>>> >>> the more complicated rules we need, the more unlikely for any of
>>>>>> >>> these to happen. So I'd rather do this and create a nice enough
>>>>>> >>> message to tell contributors sometimes mistakes happen, but the
>>>>>> >>> cost to reopen is approximately zero (i.e. click a button on the
>>>>>> >>> pull request).
>>>>>> >>>
>>>>>> >>> On Mon, Apr 18, 2016 at 12:41 PM, Ted Yu <yuzhih...@gmail.com>
>>>>>> >>> wrote:
>>>>>> >>>>
>>>>>> >>>> bq. close the ones where they don't respond for a week
>>>>>> >>>>
>>>>>> >>>> Does this imply that the script understands response from human ?
>>>>>> >>>>
>>>>>> >>>> Meaning, would the script use some regex which signifies that the
>>>>>> >>>> contributor is willing to close the PR ?

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Saisai Shao

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Hyukjin Kwon

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Nicholas Chammas
Relevant: https://github.com/databricks/spark-pr-dashboard/issues/1

A lot of this was discussed a while back when the PR Dashboard was first
introduced, and several times before and after that as well. (e.g. August
2014
<http://apache-spark-developers-list.1001551.n3.nabble.com/Handling-stale-PRs-td8015.html>
)

If there is not enough momentum to build the tooling that people are
discussing here, then perhaps Reynold's suggestion is the most practical
one that is likely to see the light of day.

I think asking committers to be more active in commenting on PRs is
theoretically the correct thing to do, but impractical. I'm not a
committer, but I would guess that most of them are already way
overcommitted (ha!) and asking them to do more just won't yield results.

We've had several instances in the past where we all tried to rally
<https://mail-archives.apache.org/mod_mbox/spark-dev/201412.mbox/%3ccaohmdzer4cg_wxgktoxsg8s34krqezygjfzdoymgu9vhyjb...@mail.gmail.com%3E>
and be more proactive about giving feedback, closing PRs, and nudging
contributors who have gone silent. My observation is that the level of
energy required to "properly" curate PR activity in that way is simply not
sustainable. People can do it for a few weeks and then things revert to the
way they are now.

Perhaps the missing link that would make this sustainable is better
tooling. If you think so and can sling some Javascript, you might want to
contribute to the PR Dashboard <https://spark-prs.appspot.com/>.

Perhaps the missing link is something else: A different PR review process;
more committers; a higher barrier to contributing; a combination thereof;
etc...

Also relevant: http://danluu.com/discourage-oss/

By the way, some people noted that closing PRs may discourage contributors.
I think our open PR count alone is very discouraging. Under what
circumstances would you feel encouraged to open a PR against a project that
has hundreds of open PRs, some from many, many months ago
<https://github.com/apache/spark/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-asc>
?

Nick


Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
During the months of November / December, the 30 day period should be
relaxed.

Some people (at least in the US) take extended vacations during that time.

For Chinese developers, the Spring Festival presents a similar circumstance.

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Hyukjin Kwon
I also think a PR should not have to be closed only because it is
inactive.


How about closing a PR after 30 days when the last comment is from a
committer and the author has not responded?


IMHO, if the committers are not sure whether a patch would be useful,
then I think they should leave some comments explaining why they are not
sure, rather than just ignoring it.

Or they could simply ask the author to show that the patch is useful or
safe, with some references and tests.


Either would be nicer than expecting authors to keep pinging.
**Personally**, I am sometimes a bit worried that pinging multiple times
can be annoying.
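The rule above (auto-close only when a committer commented last and the author then went silent) is mechanical enough to sketch. Below is a minimal, hypothetical helper; the function name and the data shapes are assumptions made for illustration, not any real bot's API:

```python
from datetime import datetime, timedelta

def should_auto_close(comments, author, committers, now=None, days=30):
    """Return True when the last comment is from a committer (not the PR
    author) and the author has not responded for `days` days.

    `comments` is a chronological list of (login, timestamp) pairs.
    """
    if not comments:
        return False
    last_login, last_time = comments[-1]
    # The ball is only in the author's court if a committer spoke last.
    if last_login not in committers or last_login == author:
        return False
    now = now or datetime.utcnow()
    return (now - last_time) > timedelta(days=days)
```

Note the rule deliberately never fires when the author commented last, which is the "stale due to lack of review" case discussed elsewhere in this thread.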




Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Saisai Shao
It would be better to have a specific technical reason why a PR should
be closed: either the implementation is not good, or the problem is not
valid, or something else. That will actually help the contributor shape
their code and reopen the PR. Otherwise, reasons like "feel free to
reopen for so-and-so reason" are actually discouraging and no different
from directly closing the PR.

Just my two cents.

Thanks
Jerry



Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Sean Busbey
Having a PR closed, especially if due to committers not having the
bandwidth to check on things, will be very discouraging to new folks.
Doubly so for those inexperienced with open source. Even if the message
says "feel free to reopen for so-and-so reason", new folks who lack
confidence are going to see reopening as "pestering" and busy folks
are going to see it as a clear indication that their work is not even
valuable enough for a human to give a reason for closing. In either
case, the cost of reopening is substantially higher than that button
press.

How about we start by keeping a report of "at-risk" PRs that have been
stale for 30 days, to make it easier for committers to look at the PRs
that have been long inactive?
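An "at-risk" report like this could be prototyped against GitHub's issue search. The `is:pr is:open updated:<DATE` qualifiers are standard GitHub search syntax; the surrounding script is only a sketch, and a real report would paginate and authenticate:

```python
from datetime import date, timedelta
import json
import urllib.parse
import urllib.request

def at_risk_query(repo="apache/spark", days=30, today=None):
    """Build a GitHub search query for open PRs untouched for `days` days."""
    today = today or date.today()
    cutoff = (today - timedelta(days=days)).isoformat()
    return f"repo:{repo} is:pr is:open updated:<{cutoff}"

def fetch_at_risk(query):
    """One unauthenticated search call, oldest-updated first.

    Sufficient for a sketch; real tooling needs auth and pagination
    to stay within GitHub's rate limits.
    """
    url = ("https://api.github.com/search/issues?q="
           + urllib.parse.quote(query) + "&sort=updated&order=asc")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["items"]
```

A nightly job posting the resulting list to the dev list (or the PR dashboard) would give committers the digest Sean describes without closing anything.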


Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Reynold Xin
The cost of "reopen" is close to zero, because it is just clicking a
button. I think you were referring to the cost of closing the pull
request, and you are assuming people look at pull requests that have
been inactive for a long time. That seems about as likely (or unlikely)
as committers looking at recently closed pull requests.

In either case, most pull requests are scanned through by us when they
are first opened, and if they are important enough, they usually get
merged quickly or a target version is set in JIRA. We can definitely
improve that by making it more explicit.




Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
From committers' perspective, would they look at closed PRs?

If not, the cost is not close to zero.
Meaning, some potentially useful PRs would never see the light of day.

My two cents.



Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Reynold Xin
Part of it is how difficult it is to automate this. We could build a
perfect engine with a lot of rules that understand everything, but the
more complicated the rules we need, the less likely any of this is to
happen. So I'd rather do this and craft a nice enough message telling
contributors that sometimes mistakes happen, and that the cost to reopen
is approximately zero (i.e. click a button on the pull request).


On Mon, Apr 18, 2016 at 12:41 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. close the ones where they don't respond for a week
>
> Does this imply that the script understands response from human ?
>
> Meaning, would the script use some regex which signifies that the
> contributor is willing to close the PR ?
>
> If the contributor is willing to close, why wouldn't he / she do it
> him/herself ?
>
> On Mon, Apr 18, 2016 at 12:33 PM, Holden Karau <hol...@pigscanfly.ca>
> wrote:
>
>> Personally I'd rather err on the side of keeping PRs open, but I
>> understand wanting to keep the open PRs limited to ones which have a
>> reasonable chance of being merged.
>>
>> What about if we filtered for non-mergeable PRs or instead left a comment
>> asking the author to respond if they are still available to move the PR
>> forward - and close the ones where they don't respond for a week?
>>
>> Just a suggestion.
>> On Monday, April 18, 2016, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> I had one PR which got merged after 3 months.
>>>
>>> If the inactivity was due to contributor, I think it can be closed after
>>> 30 days.
>>> But if the inactivity was due to lack of review, the PR should be kept
>>> open.
>>>
>>> On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org>
>>> wrote:
>>>
>>>> For what it's worth, I have definitely had PRs that sat inactive for
>>>> more than 30 days due to committers not having time to look at them,
>>>> but did eventually end up successfully being merged.
>>>>
>>>> I guess if this just ends up being a committer ping and reopening the
>>>> PR, it's fine, but I don't know if it really addresses the underlying
>>>> issue.
>>>>
>>>> On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin <r...@databricks.com>
>>>> wrote:
>>>> > We have hit a new high in open pull requests: 469 today. While we can
>>>> > certainly get more review bandwidth, many of these are old and still
>>>> open
>>>> > for other reasons. Some are stale because the original authors have
>>>> become
>>>> > busy and inactive, and some others are stale because the committers
>>>> are not
>>>> > sure whether the patch would be useful, but have not rejected the
>>>> patch
>>>> > explicitly. We can cut down the signal to noise ratio by closing pull
>>>> > requests that have been inactive for greater than 30 days, with a nice
>>>> > message. I just checked and this would close ~ half of the pull
>>>> requests.
>>>> >
>>>> > For example:
>>>> >
>>>> > "Thank you for creating this pull request. Since this pull request
>>>> has been
>>>> > inactive for 30 days, we are automatically closing it. Closing the
>>>> pull
>>>> > request does not remove it from history and will retain all the diff
>>>> and
>>>> > review comments. If you have the bandwidth and would like to continue
>>>> > pushing this forward, please reopen it. Thanks again!"
>>>> >
>>>> >
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>
>>>>
>>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>>
>


Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
bq. close the ones where they don't respond for a week

Does this imply that the script understands a response from a human?

Meaning, would the script use some regex which signifies that the
contributor is willing to close the PR?

If the contributor is willing to close, why wouldn't he/she do it
him/herself?

On Mon, Apr 18, 2016 at 12:33 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> Personally I'd rather err on the side of keeping PRs open, but I
> understand wanting to keep the open PRs limited to ones which have a
> reasonable chance of being merged.
>
> What about if we filtered for non-mergeable PRs or instead left a comment
> asking the author to respond if they are still available to move the PR
> forward - and close the ones where they don't respond for a week?
>
> Just a suggestion.
> On Monday, April 18, 2016, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> I had one PR which got merged after 3 months.
>>
>> If the inactivity was due to contributor, I think it can be closed after
>> 30 days.
>> But if the inactivity was due to lack of review, the PR should be kept
>> open.
>>
>> On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org>
>> wrote:
>>
>>> For what it's worth, I have definitely had PRs that sat inactive for
>>> more than 30 days due to committers not having time to look at them,
>>> but did eventually end up successfully being merged.
>>>
>>> I guess if this just ends up being a committer ping and reopening the
>>> PR, it's fine, but I don't know if it really addresses the underlying
>>> issue.
>>>
>>> On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin <r...@databricks.com>
>>> wrote:
>>> > We have hit a new high in open pull requests: 469 today. While we can
>>> > certainly get more review bandwidth, many of these are old and still
>>> open
>>> > for other reasons. Some are stale because the original authors have
>>> become
>>> > busy and inactive, and some others are stale because the committers
>>> are not
>>> > sure whether the patch would be useful, but have not rejected the patch
>>> > explicitly. We can cut down the signal to noise ratio by closing pull
>>> > requests that have been inactive for greater than 30 days, with a nice
>>> > message. I just checked and this would close ~ half of the pull
>>> requests.
>>> >
>>> > For example:
>>> >
>>> > "Thank you for creating this pull request. Since this pull request has
>>> been
>>> > inactive for 30 days, we are automatically closing it. Closing the pull
>>> > request does not remove it from history and will retain all the diff
>>> and
>>> > review comments. If you have the bandwidth and would like to continue
>>> > pushing this forward, please reopen it. Thanks again!"
>>> >
>>> >
>>>
>>>
>>>
>>
>
>
>


Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Marcin Tustin
+1 and at the same time maybe surface a report to this list of PRs which
need committer action and have only had submitters responding to pings in
the last 30 days?

On Mon, Apr 18, 2016 at 3:33 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> Personally I'd rather err on the side of keeping PRs open, but I
> understand wanting to keep the open PRs limited to ones which have a
> reasonable chance of being merged.
>
> What about if we filtered for non-mergeable PRs or instead left a comment
> asking the author to respond if they are still available to move the PR
> forward - and close the ones where they don't respond for a week?
>
> Just a suggestion.
>
> On Monday, April 18, 2016, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> I had one PR which got merged after 3 months.
>>
>> If the inactivity was due to contributor, I think it can be closed after
>> 30 days.
>> But if the inactivity was due to lack of review, the PR should be kept
>> open.
>>
>> On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org>
>> wrote:
>>
>>> For what it's worth, I have definitely had PRs that sat inactive for
>>> more than 30 days due to committers not having time to look at them,
>>> but did eventually end up successfully being merged.
>>>
>>> I guess if this just ends up being a committer ping and reopening the
>>> PR, it's fine, but I don't know if it really addresses the underlying
>>> issue.
>>>
>>> On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin <r...@databricks.com>
>>> wrote:
>>> > We have hit a new high in open pull requests: 469 today. While we can
>>> > certainly get more review bandwidth, many of these are old and still
>>> open
>>> > for other reasons. Some are stale because the original authors have
>>> become
>>> > busy and inactive, and some others are stale because the committers
>>> are not
>>> > sure whether the patch would be useful, but have not rejected the patch
>>> > explicitly. We can cut down the signal to noise ratio by closing pull
>>> > requests that have been inactive for greater than 30 days, with a nice
>>> > message. I just checked and this would close ~ half of the pull
>>> requests.
>>> >
>>> > For example:
>>> >
>>> > "Thank you for creating this pull request. Since this pull request has
>>> been
>>> > inactive for 30 days, we are automatically closing it. Closing the pull
>>> > request does not remove it from history and will retain all the diff
>>> and
>>> > review comments. If you have the bandwidth and would like to continue
>>> > pushing this forward, please reopen it. Thanks again!"
>>> >
>>> >
>>>
>>>
>>>
>>
>
>
>

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
 led 
by Fidelity



Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Holden Karau
Personally I'd rather err on the side of keeping PRs open, but I understand
wanting to keep the open PRs limited to ones which have a reasonable chance
of being merged.

What about if we filtered for non-mergeable PRs or instead left a comment
asking the author to respond if they are still available to move the PR
forward - and close the ones where they don't respond for a week?

Just a suggestion.
On Monday, April 18, 2016, Ted Yu <yuzhih...@gmail.com> wrote:

> I had one PR which got merged after 3 months.
>
> If the inactivity was due to contributor, I think it can be closed after
> 30 days.
> But if the inactivity was due to lack of review, the PR should be kept
> open.
>
> On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> For what it's worth, I have definitely had PRs that sat inactive for
>> more than 30 days due to committers not having time to look at them,
>> but did eventually end up successfully being merged.
>>
>> I guess if this just ends up being a committer ping and reopening the
>> PR, it's fine, but I don't know if it really addresses the underlying
>> issue.
>>
>> On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin <r...@databricks.com> wrote:
>> > We have hit a new high in open pull requests: 469 today. While we can
>> > certainly get more review bandwidth, many of these are old and still
>> open
>> > for other reasons. Some are stale because the original authors have
>> become
>> > busy and inactive, and some others are stale because the committers are
>> not
>> > sure whether the patch would be useful, but have not rejected the patch
>> > explicitly. We can cut down the signal to noise ratio by closing pull
>> > requests that have been inactive for greater than 30 days, with a nice
>> > message. I just checked and this would close ~ half of the pull
>> requests.
>> >
>> > For example:
>> >
>> > "Thank you for creating this pull request. Since this pull request has
>> been
>> > inactive for 30 days, we are automatically closing it. Closing the pull
>> > request does not remove it from history and will retain all the diff and
>> > review comments. If you have the bandwidth and would like to continue
>> > pushing this forward, please reopen it. Thanks again!"
>> >
>> >
>>
>>
>>
>

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Reynold Xin
Cody,

Thanks for commenting. "Inactive" here means no code pushes and no comments, so
any "ping" would actually keep the PR in the open queue. Getting
auto-closed also by no means indicates that the pull request can't be reopened.

On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org> wrote:

> For what it's worth, I have definitely had PRs that sat inactive for
> more than 30 days due to committers not having time to look at them,
> but did eventually end up successfully being merged.
>
> I guess if this just ends up being a committer ping and reopening the
> PR, it's fine, but I don't know if it really addresses the underlying
> issue.
>
> On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin <r...@databricks.com> wrote:
> > We have hit a new high in open pull requests: 469 today. While we can
> > certainly get more review bandwidth, many of these are old and still open
> > for other reasons. Some are stale because the original authors have
> become
> > busy and inactive, and some others are stale because the committers are
> not
> > sure whether the patch would be useful, but have not rejected the patch
> > explicitly. We can cut down the signal to noise ratio by closing pull
> > requests that have been inactive for greater than 30 days, with a nice
> > message. I just checked and this would close ~ half of the pull requests.
> >
> > For example:
> >
> > "Thank you for creating this pull request. Since this pull request has
> been
> > inactive for 30 days, we are automatically closing it. Closing the pull
> > request does not remove it from history and will retain all the diff and
> > review comments. If you have the bandwidth and would like to continue
> > pushing this forward, please reopen it. Thanks again!"
> >
> >
>


Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
I had one PR which got merged after 3 months.

If the inactivity was due to the contributor, I think the PR can be closed
after 30 days.
But if the inactivity was due to a lack of review, the PR should be kept open.

On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org> wrote:

> For what it's worth, I have definitely had PRs that sat inactive for
> more than 30 days due to committers not having time to look at them,
> but did eventually end up successfully being merged.
>
> I guess if this just ends up being a committer ping and reopening the
> PR, it's fine, but I don't know if it really addresses the underlying
> issue.
>
> On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin <r...@databricks.com> wrote:
> > We have hit a new high in open pull requests: 469 today. While we can
> > certainly get more review bandwidth, many of these are old and still open
> > for other reasons. Some are stale because the original authors have
> become
> > busy and inactive, and some others are stale because the committers are
> not
> > sure whether the patch would be useful, but have not rejected the patch
> > explicitly. We can cut down the signal to noise ratio by closing pull
> > requests that have been inactive for greater than 30 days, with a nice
> > message. I just checked and this would close ~ half of the pull requests.
> >
> > For example:
> >
> > "Thank you for creating this pull request. Since this pull request has
> been
> > inactive for 30 days, we are automatically closing it. Closing the pull
> > request does not remove it from history and will retain all the diff and
> > review comments. If you have the bandwidth and would like to continue
> > pushing this forward, please reopen it. Thanks again!"
> >
> >
>
>
>


Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Cody Koeninger
For what it's worth, I have definitely had PRs that sat inactive for
more than 30 days due to committers not having time to look at them,
but did eventually end up successfully being merged.

I guess if this just ends up being a committer ping and reopening the
PR, it's fine, but I don't know if it really addresses the underlying
issue.

On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin <r...@databricks.com> wrote:
> We have hit a new high in open pull requests: 469 today. While we can
> certainly get more review bandwidth, many of these are old and still open
> for other reasons. Some are stale because the original authors have become
> busy and inactive, and some others are stale because the committers are not
> sure whether the patch would be useful, but have not rejected the patch
> explicitly. We can cut down the signal to noise ratio by closing pull
> requests that have been inactive for greater than 30 days, with a nice
> message. I just checked and this would close ~ half of the pull requests.
>
> For example:
>
> "Thank you for creating this pull request. Since this pull request has been
> inactive for 30 days, we are automatically closing it. Closing the pull
> request does not remove it from history and will retain all the diff and
> review comments. If you have the bandwidth and would like to continue
> pushing this forward, please reopen it. Thanks again!"
>
>




auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Reynold Xin
We have hit a new high in open pull requests: 469 today. While we can
certainly get more review bandwidth, many of these are old and still open
for other reasons. Some are stale because the original authors have become
busy and inactive, and some others are stale because the committers are not
sure whether the patch would be useful, but have not rejected the patch
explicitly. We can cut down the signal to noise ratio by closing pull
requests that have been inactive for greater than 30 days, with a nice
message. I just checked and this would close ~ half of the pull requests.

For example:

"Thank you for creating this pull request. Since this pull request has been
inactive for 30 days, we are automatically closing it. Closing the pull
request does not remove it from history and will retain all the diff and
review comments. If you have the bandwidth and would like to continue
pushing this forward, please reopen it. Thanks again!"
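Reynold's proposal boils down to a staleness predicate over each PR's last activity. Below is a minimal sketch of that check (hypothetical helper and record names; a real bot would fetch each PR's last push/comment timestamp via the GitHub API and post the message above before closing):

```python
from datetime import datetime, timedelta

# The canned message quoted above, posted before auto-closing.
CLOSE_MESSAGE = (
    "Thank you for creating this pull request. Since this pull request "
    "has been inactive for 30 days, we are automatically closing it."
)

def is_stale(last_activity: datetime, now: datetime, days: int = 30) -> bool:
    # "Inactive" here means no code push and no comments within the window.
    return now - last_activity > timedelta(days=days)

# Hypothetical PR records: (number, timestamp of last push or comment).
now = datetime(2016, 4, 18)
prs = [(101, datetime(2016, 4, 10)),   # recent activity -> stays open
       (102, datetime(2016, 2, 1))]    # quiet for > 30 days -> auto-closed
to_close = [num for num, last in prs if is_stale(last, now)]
print(to_close)  # -> [102]
```

Note that the objection raised later in the thread falls out of this sketch: timestamps alone cannot distinguish a PR stalled by its author from one stalled by a lack of review.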


[ANNOUNCE] New testing capabilities for pull requests

2015-08-30 Thread Patrick Wendell
Hi All,

For pull requests that modify the build, you can now test different
build permutations as part of the pull request builder. To trigger
these, you add a special phrase to the title of the pull request.
Current options are:

[test-maven] - run tests using maven and not sbt
[test-hadoop1.0] - test using older hadoop versions (can use 1.0, 2.0,
2.2, and 2.3).

The relevant source code is here:
https://github.com/apache/spark/blob/master/dev/run-tests-jenkins#L193

This is useful because it allows up-front testing of build changes to
avoid breaks once a patch has already been merged.

I've documented this on the wiki:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools

- Patrick

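The trigger phrases Patrick describes are plain bracketed tokens in the PR title. A rough sketch of how a builder could pick them out (hypothetical function name, with the phrase list inferred from the options listed above; the actual parsing lives in dev/run-tests-jenkins):

```python
import re

# Phrases inferred from the announcement: maven, plus hadoop 1.0/2.0/2.2/2.3.
KNOWN_PHRASES = {"test-maven", "test-hadoop1.0", "test-hadoop2.0",
                 "test-hadoop2.2", "test-hadoop2.3"}

def build_phrases(pr_title: str) -> list[str]:
    # Collect every [bracketed] token and keep only recognized build phrases.
    tokens = re.findall(r"\[([^\]]+)\]", pr_title)
    return [t for t in tokens if t in KNOWN_PHRASES]

print(build_phrases("[SPARK-1234] [test-maven] Fix pom ordering"))  # -> ['test-maven']
```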



Fwd: pull requests no longer closing by commit messages with closes #xxxx

2015-06-08 Thread Reynold Xin
FYI.

---------- Forwarded message ----------
From: John Greet (GitHub Staff) supp...@github.com
Date: Mon, Jun 8, 2015 at 5:50 PM
Subject: Re: pull requests no longer closing by commit messages with
closes #xxxx
To: Reynold Xin r...@databricks.com


Hi Reynold,

The problem here is that the commits closing those pull requests were
fetched by our mirroring process, which doesn't have permission to close
issues, instead of pushed by a user in the apache GitHub organization.

Usually the repository receives regular, I assume automated, pushes to its
master branch, but there was a gap in those pushes between 2pm PDT on June
5th and 1:16 PM PDT June 7th. This happened at least once before back in
November. Now that those pushes have resumed pull requests are closing
normally once again.

Let us know if you have any other questions.

Cheers,
John



 I'm a committer on Apache Spark (the most active open source project in
the data space). We use GitHub as the primary way to accept contributions.
We use a custom merge script to merge pull requests rather than GitHub's
merge button in order to preserve a linear commit history. Part of the
merge script relies on the closes #xxxx feature to close the
corresponding pull requests.

 I noticed recently that pull requests are no longer automatically closed,
even if the commits are merged with the message closes #xxxx. Here are
two recent examples:

 https://github.com/apache/spark/pull/6670
 https://github.com/apache/spark/pull/6689

 Can you take a look at what's going on? Thanks.


Re: Pull Requests on github

2015-02-09 Thread fommil
Cool, thanks! Let me know if there are any more core numerical libraries
that you'd like to see support Spark with optimised natives, using a
packaging model similar to netlib-java's.

I'm interested in fast random number generation next, and I keep wondering
if anybody would be interested in paying for FPGA or GPU / APU backends for
netlib-java. It would be a *lot* of work but I'd be very interested to talk
to an organisation with such a requirement and I'd be able to do it in less
time than they would internally.
On 10 Feb 2015 04:12, Andrew Ash [via Apache Spark Developers List] wrote:

 Sam, I see your PR was merged -- many thanks for sending it in and getting
 it merged!

 In general for future reference, the most effective way to contribute is
 outlined on this wiki page:
 https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

 On Mon, Feb 9, 2015 at 1:04 AM, Akhil Das wrote:

  You can open a Jira issue pointing this PR to get it processed faster.
 :)
 
  Thanks
  Best Regards
 
  On Sat, Feb 7, 2015 at 7:07 AM, fommil wrote:
 
   Hi all,
  
   I'm the author of netlib-java and I noticed that the documentation in
  MLlib
   was out of date and misleading, so I submitted a pull request on
 github
   which will hopefully make things easier for everybody to understand
 the
   benefits of system optimised natives and how to use them :-)
  
 https://github.com/apache/spark/pull/4448
  
   However, it looks like there are a *lot* of outstanding PRs and that
 this
   is
   just a mirror repository.
  
   Will somebody please look at my PR and merge into the canonical source
  (and
   let me know)?
  
   Best regards,
   Sam
  
  
  
   --
   View this message in context:
  
 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Pull-Requests-on-github-tp10502.html
   Sent from the Apache Spark Developers List mailing list archive at
   Nabble.com.
  
  
  
 







--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Pull-Requests-on-github-tp10502p10558.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Pull Requests on github

2015-02-08 Thread fommil
Hi all,

I'm the author of netlib-java and I noticed that the documentation in MLlib
was out of date and misleading, so I submitted a pull request on github
which will hopefully make things easier for everybody to understand the
benefits of system optimised natives and how to use them :-)

  https://github.com/apache/spark/pull/4448

However, it looks like there are a *lot* of outstanding PRs and that this is
just a mirror repository.

Will somebody please look at my PR and merge into the canonical source (and
let me know)?

Best regards,
Sam



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Pull-Requests-on-github-tp10502.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: Pull Requests on github

2015-02-08 Thread Akhil Das
You can open a JIRA issue pointing to this PR to get it processed faster. :)

Thanks
Best Regards

On Sat, Feb 7, 2015 at 7:07 AM, fommil sam.halli...@gmail.com wrote:

 Hi all,

 I'm the author of netlib-java and I noticed that the documentation in MLlib
 was out of date and misleading, so I submitted a pull request on github
 which will hopefully make things easier for everybody to understand the
 benefits of system optimised natives and how to use them :-)

   https://github.com/apache/spark/pull/4448

 However, it looks like there are a *lot* of outstanding PRs and that this
 is
 just a mirror repository.

 Will somebody please look at my PR and merge into the canonical source (and
 let me know)?

 Best regards,
 Sam



 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/Pull-Requests-on-github-tp10502.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.





Pull Requests

2014-10-06 Thread Bill Bejeck
Once a PR has been tested and verified, when does it get pulled back into
the trunk?


Re: Pull Requests

2014-10-06 Thread Bill Bejeck
Can someone review patch #2309 (jira task SPARK-3178)

Thanks

On Mon, Oct 6, 2014 at 10:41 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Bill,

 Automated testing is just one small part of the process that performs
 basic sanity checks on code. All patches need to be championed and
 merged by a committer to make it into Spark. For large patches we also
 ask users to propose a design before sending a patch.

 This is discussed in our contributing page:
 https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

 If there is a patch that you are waiting for feedback on feel free to
 just ping this list with the patch number. Is there one you are
 waiting for feedback on?

 - Patrick

 On Mon, Oct 6, 2014 at 7:32 PM, Bill Bejeck bbej...@gmail.com wrote:
  Once a PR has been tested and verified, when does it get pulled back into
  the trunk?



Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Nicholas Chammas
FYI: Looks like the Mesos folk also have a bot to do automatic linking, but
it appears to have been provided to them somehow by ASF.

See this comment as an example:
https://issues.apache.org/jira/browse/MESOS-1688?focusedCommentId=14109078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14109078

Might be a small win to push this work to a bot ASF manages if we can get
access to it (and if we have no concerns about depending on another
external service).

Nick


On Mon, Aug 11, 2014 at 4:10 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 Thanks for looking into this. I think little tools like this are super
 helpful.

 Would it hurt to open a request with INFRA to install/configure the
 JIRA-GitHub plugin while we continue to use the Python script we have? I
 wouldn't mind opening that JIRA issue with them.

 Nick


 On Mon, Aug 11, 2014 at 12:52 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 I spent some time on this and I'm not sure either of these is an option,
 unfortunately.

 We typically can't use custom JIRA plug-in's because this JIRA is
 controlled by the ASF and we don't have rights to modify most things about
 how it works (it's a large shared JIRA instance used by more than 50
 projects). It's worth looking into whether they can do something. In
 general we've tended to avoid going through ASF infra whenever
 possible, since they are generally overloaded and things move very slowly,
 even if there are outages.

 Here is the script we use to do the sync:
 https://github.com/apache/spark/blob/master/dev/github_jira_sync.py

 It might be possible to modify this to support post-hoc changes, but we'd
 need to think about how to do so while minimizing function calls to the ASF
 JIRA API, which I found are very slow.

 - Patrick



 On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 It looks like this script doesn't catch PRs that are opened and *then*
 have the JIRA issue ID added to the name. Would it be easy to somehow have
 the script trigger on PR name changes as well as PR creates?

 Alternately, is there a reason we can't or don't want to use the plugin
 mentioned below? (I'm assuming it covers cases like this, but I'm not
 sure.)

 Nick



 On Wed, Jul 23, 2014 at 12:52 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

  By the way, it looks like there’s a JIRA plugin that integrates it with
  GitHub:
 
  - https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin
  - https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA
 
  It does the automatic linking and shows some additional information
  
 https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png
 

  that might be nice to have for heavy JIRA users.
 
  Nick
 
 
 
  On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Yeah it needs to have SPARK-XXX in the title (this is the format we
  request already). It just works with a small synchronization script I
  wrote that we run every five minutes on Jenkins that uses the Github
  and Jenkins API:
 
 
 
 https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929
 
  - Patrick
 
  On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
   That's pretty neat.
  
   How does it work? Do we just need to put the issue ID (e.g.
 SPARK-1234)
   anywhere in the pull request?
  
   Nick
  
  
   On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
  
   Just a small note, today I committed a tool that will automatically
   mirror pull requests to JIRA issues, so contributors will no longer
   have to manually post a pull request on the JIRA when they make
 one.
  
   It will create a link on the JIRA and also make a comment to
 trigger
   an e-mail to people watching.
  
   This should make some things easier, such as avoiding accidental
   duplicate effort on the same JIRA.
  
   - Patrick
  
 
 
 





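The title convention the sync tooling above keys on (a SPARK-XXX issue ID somewhere in the PR title) amounts to a one-line pattern match. A sketch under that assumption (hypothetical helper; the real logic is in dev/github_jira_sync.py):

```python
import re

def jira_ids(pr_title: str) -> list[str]:
    # Match JIRA issue keys of the form SPARK-<digits> anywhere in the title.
    return re.findall(r"SPARK-\d+", pr_title)

print(jira_ids("[SPARK-1234] [SQL] Prune unused nested Parquet fields"))  # -> ['SPARK-1234']
```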

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Patrick Wendell
Hey Nicholas,

That seems promising - I prefer having a proper link to having that fairly
verbose comment though, because in some cases there will be dozens of
comments and it could get lost. I wonder if they could do something where
it posts a link instead...

- Patrick


On Mon, Aug 25, 2014 at 11:06 AM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 FYI: Looks like the Mesos folk also have a bot to do automatic linking,
 but it appears to have been provided to them somehow by ASF.

 See this comment as an example:
 https://issues.apache.org/jira/browse/MESOS-1688?focusedCommentId=14109078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14109078

 Might be a small win to push this work to a bot ASF manages if we can get
 access to it (and if we have no concerns about depending on another
 external service).

 Nick


 On Mon, Aug 11, 2014 at 4:10 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Thanks for looking into this. I think little tools like this are super
 helpful.

 Would it hurt to open a request with INFRA to install/configure the
 JIRA-GitHub plugin while we continue to use the Python script we have? I
 wouldn't mind opening that JIRA issue with them.

 Nick


 On Mon, Aug 11, 2014 at 12:52 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 I spent some time on this and I'm not sure either of these is an option,
 unfortunately.

 We typically can't use custom JIRA plug-in's because this JIRA is
 controlled by the ASF and we don't have rights to modify most things about
 how it works (it's a large shared JIRA instance used by more than 50
 projects). It's worth looking into whether they can do something. In
 general we've tended to avoid going through ASF infra whenever
 possible, since they are generally overloaded and things move very slowly,
 even if there are outages.

 Here is the script we use to do the sync:
 https://github.com/apache/spark/blob/master/dev/github_jira_sync.py

 It might be possible to modify this to support post-hoc changes, but
 we'd need to think about how to do so while minimizing function calls to
 the ASF JIRA API, which I found are very slow.

 - Patrick



 On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 It looks like this script doesn't catch PRs that are opened and *then*
 have the JIRA issue ID added to the name. Would it be easy to somehow
 have the script trigger on PR name changes as well as PR creates?

 Alternately, is there a reason we can't or don't want to use the plugin
 mentioned below? (I'm assuming it covers cases like this, but I'm not
 sure.)

 Nick



 On Wed, Jul 23, 2014 at 12:52 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

  By the way, it looks like there's a JIRA plugin that integrates it
 with
  GitHub:
 
 -
 
 https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin

 -
 
 https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA
 
  It does the automatic linking and shows some additional information
  
 https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png
 

  that might be nice to have for heavy JIRA users.
 
  Nick
 
 
 
  On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com
 
  wrote:
 
  Yeah it needs to have SPARK-XXX in the title (this is the format we
  request already). It just works with a small synchronization script I
  wrote that we run every five minutes on Jenkins, using the GitHub
  and JIRA APIs:
 
 
 
 https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929
 
  - Patrick
 
  On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
   That's pretty neat.
  
   How does it work? Do we just need to put the issue ID (e.g.
 SPARK-1234)
   anywhere in the pull request?
  
   Nick
  
  
   On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
  
   Just a small note, today I committed a tool that will
 automatically
   mirror pull requests to JIRA issues, so contributors will no
 longer
   have to manually post a pull request on the JIRA when they make
 one.
  
   It will create a link on the JIRA and also make a comment to
 trigger
   an e-mail to people watching.
  
   This should make some things easier, such as avoiding accidental
   duplicate effort on the same JIRA.
  
   - Patrick
  
 
 
 







Re: -1s on pull requests?

2014-08-15 Thread Nicholas Chammas
On Sun, Aug 3, 2014 at 4:35 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 Include the commit hash in the tests have started/completed messages, so
 that it's clear what code exactly is/has been tested for each test cycle.


This is now captured in this JIRA issue
https://issues.apache.org/jira/browse/SPARK-2912 and completed in this PR
https://github.com/apache/spark/pull/1816 which has been merged in to
master.

Example of old style: tests starting
https://github.com/apache/spark/pull/1819#issuecomment-51416510 / tests
finished https://github.com/apache/spark/pull/1819#issuecomment-51417477
(with
new classes)

Example of new style: tests starting
https://github.com/apache/spark/pull/1816#issuecomment-51855254 / tests
finished https://github.com/apache/spark/pull/1816#issuecomment-51855255
(with
new classes)

Nick


Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-11 Thread Patrick Wendell
I spent some time on this and I'm not sure either of these is an option,
unfortunately.

We typically can't use custom JIRA plug-ins because this JIRA is
controlled by the ASF and we don't have rights to modify most things about
how it works (it's a large shared JIRA instance used by more than 50
projects). It's worth looking into whether they can do something. In
general, we've tended to avoid going through ASF infra whenever
possible, since they are generally overloaded and things move very slowly,
even when there are outages.

Here is the script we use to do the sync:
https://github.com/apache/spark/blob/master/dev/github_jira_sync.py

It might be possible to modify this to support post-hoc changes, but we'd
need to think about how to do so while minimizing function calls to the ASF
JIRA API, which I found are very slow.

- Patrick



On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 It looks like this script doesn't catch PRs that are opened and *then* have
 the JIRA issue ID added to the name. Would it be easy to somehow have the
 script trigger on PR name changes as well as PR creates?

 Alternately, is there a reason we can't or don't want to use the plugin
 mentioned below? (I'm assuming it covers cases like this, but I'm not
 sure.)

 Nick



 On Wed, Jul 23, 2014 at 12:52 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

  By the way, it looks like there's a JIRA plugin that integrates it with
  GitHub:
 
 -
 
 https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin
 -
 
 https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA
 
  It does the automatic linking and shows some additional information
  
 https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png
 
  that might be nice to have for heavy JIRA users.
 
  Nick
  
 
 
  On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Yeah it needs to have SPARK-XXX in the title (this is the format we
  request already). It just works with a small synchronization script I
  wrote that we run every five minutes on Jenkins, using the GitHub
  and JIRA APIs:
 
 
 
 https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929
 
  - Patrick
 
  On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
   That's pretty neat.
  
   How does it work? Do we just need to put the issue ID (e.g.
 SPARK-1234)
   anywhere in the pull request?
  
   Nick
  
  
   On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com
 
   wrote:
  
   Just a small note, today I committed a tool that will automatically
   mirror pull requests to JIRA issues, so contributors will no longer
   have to manually post a pull request on the JIRA when they make one.
  
   It will create a link on the JIRA and also make a comment to
 trigger
   an e-mail to people watching.
  
   This should make some things easier, such as avoiding accidental
   duplicate effort on the same JIRA.
  
   - Patrick
  
 
 
 



Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-11 Thread Nicholas Chammas
Thanks for looking into this. I think little tools like this are super
helpful.

Would it hurt to open a request with INFRA to install/configure the
JIRA-GitHub plugin while we continue to use the Python script we have? I
wouldn't mind opening that JIRA issue with them.

Nick


On Mon, Aug 11, 2014 at 12:52 PM, Patrick Wendell pwend...@gmail.com
wrote:

 I spent some time on this and I'm not sure either of these is an option,
 unfortunately.

 We typically can't use custom JIRA plug-ins because this JIRA is
 controlled by the ASF and we don't have rights to modify most things about
 how it works (it's a large shared JIRA instance used by more than 50
 projects). It's worth looking into whether they can do something. In
 general, we've tended to avoid going through ASF infra whenever
 possible, since they are generally overloaded and things move very slowly,
 even when there are outages.

 Here is the script we use to do the sync:
 https://github.com/apache/spark/blob/master/dev/github_jira_sync.py

 It might be possible to modify this to support post-hoc changes, but we'd
 need to think about how to do so while minimizing function calls to the ASF
 JIRA API, which I found are very slow.

 - Patrick



 On Mon, Aug 11, 2014 at 7:51 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 It looks like this script doesn't catch PRs that are opened and *then*
 have the JIRA issue ID added to the name. Would it be easy to somehow have
 the script trigger on PR name changes as well as PR creates?

 Alternately, is there a reason we can't or don't want to use the plugin
 mentioned below? (I'm assuming it covers cases like this, but I'm not
 sure.)

 Nick



 On Wed, Jul 23, 2014 at 12:52 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

  By the way, it looks like there’s a JIRA plugin that integrates it with
  GitHub:
 
 -
 
 https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin

 -
 
 https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA
 
  It does the automatic linking and shows some additional information
  
 https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png
 

  that might be nice to have for heavy JIRA users.
 
  Nick
 
 
 
  On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Yeah it needs to have SPARK-XXX in the title (this is the format we
  request already). It just works with a small synchronization script I
  wrote that we run every five minutes on Jenkins, using the GitHub
  and JIRA APIs:
 
 
 
 https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929
 
  - Patrick
 
  On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
   That's pretty neat.
  
   How does it work? Do we just need to put the issue ID (e.g.
 SPARK-1234)
   anywhere in the pull request?
  
   Nick
  
  
   On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
  
   Just a small note, today I committed a tool that will automatically
   mirror pull requests to JIRA issues, so contributors will no longer
   have to manually post a pull request on the JIRA when they make one.
  
   It will create a link on the JIRA and also make a comment to
 trigger
   an e-mail to people watching.
  
   This should make some things easier, such as avoiding accidental
   duplicate effort on the same JIRA.
  
   - Patrick
  
 
 
 





Re: -1s on pull requests?

2014-08-05 Thread Mridul Muralidharan
Just came across this mail; thanks for initiating this discussion, Kay.
To add: another issue which recurs is very rapid commits, before most
contributors have had a chance to even look at the changes proposed.
There is not much prior discussion on the JIRA or PR, and the time
between submitting the PR and committing it is < 12 hours.

This is particularly relevant when contributors are not in US timezones and/or
colocated; I have raised this a few times before, when the commit had
other side effects that were not considered.
On the flip side, we have PRs which have been languishing for weeks with
little or no activity from the committers' side, making the contribution
stale; so too long a delay is definitely not the direction to
take either!



Regards,
Mridul



On Tue, Jul 22, 2014 at 2:14 AM, Kay Ousterhout k...@eecs.berkeley.edu wrote:
 Hi all,

 As the number of committers / contributors on Spark has increased, there
 are cases where pull requests get merged before all the review comments
 have been addressed. This happens, say, when one committer points out a
 problem with the pull request, and another committer doesn't see the
 earlier comment and merges the PR before the comment has been addressed.
  This is especially tricky for pull requests with a large number of
 comments, because it can be difficult to notice early comments describing
 blocking issues.

 This also happens when something accidentally gets merged after the tests
 have started but before tests have passed.

 Do folks have ideas on how we can handle this issue? Are there other
 projects that have good ways of handling this? It looks like for Hadoop,
 people can -1 / +1 on the JIRA.

 -Kay

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: -1s on pull requests?

2014-08-05 Thread Nicholas Chammas

 1. Include the commit hash in the tests have started/completed


FYI: Looks like Xiangrui's already got a JIRA issue for this.

SPARK-2622: Add Jenkins build numbers to SparkQA messages
https://issues.apache.org/jira/browse/SPARK-2622

2. Pin a message to the start or end of the PR


Should new JIRA issues for this item fall under the following umbrella
issue?

SPARK-2230: Improvements to Jenkins QA Harness
https://issues.apache.org/jira/browse/SPARK-2230

Nick


Re: -1s on pull requests?

2014-08-05 Thread Xiangrui Meng
I think the build number is included in the SparkQA message, for
example: https://github.com/apache/spark/pull/1788

The build number, 17941, is in the URL
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17941/consoleFull.
Just need to be careful to match the number.

Another solution is to kill running Jenkins jobs if there is a code change.
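Matching the build number out of that console URL is a small regex; a sketch for illustration, with the URL pattern assumed from the example above:

```python
import re

# Pattern assumed from the console URL quoted above.
BUILD_URL_RE = re.compile(
    r"amplab\.cs\.berkeley\.edu/jenkins/job/SparkPullRequestBuilder/(\d+)/"
)


def build_number(comment_body):
    """Extract the Jenkins build number from a SparkQA comment, or None."""
    m = BUILD_URL_RE.search(comment_body)
    return int(m.group(1)) if m else None
```

Being strict about the full job path, rather than grabbing any number, is the "careful to match the number" part.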

On Tue, Aug 5, 2014 at 8:48 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:

 1. Include the commit hash in the tests have started/completed


 FYI: Looks like Xiangrui's already got a JIRA issue for this.

 SPARK-2622: Add Jenkins build numbers to SparkQA messages
 https://issues.apache.org/jira/browse/SPARK-2622

 2. Pin a message to the start or end of the PR


 Should new JIRA issues for this item fall under the following umbrella
 issue?

 SPARK-2230: Improvements to Jenkins QA Harness
 https://issues.apache.org/jira/browse/SPARK-2230

 Nick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: -1s on pull requests?

2014-08-03 Thread Nicholas Chammas
On Mon, Jul 21, 2014 at 4:44 PM, Kay Ousterhout k...@eecs.berkeley.edu
wrote:

 This also happens when something accidentally gets merged after the tests
 have started but before tests have passed.


Some improvements to SparkQA https://github.com/SparkQA could help with
this. May I suggest:

   1. Include the commit hash in the tests have started/completed
   messages, so that it's clear what code exactly is/has been tested for each
   test cycle.
   2. Pin a message to the start or end of the PR that is updated with
   the status of the PR. Testing not complete; New commits since last
   test; Tests failed; etc. It should be easy for committers to get the
   status of the PR at a glance, without scrolling through the comment history.

Nick


Re: -1s on pull requests?

2014-08-03 Thread Patrick Wendell

1. Include the commit hash in the tests have started/completed
messages, so that it's clear what code exactly is/has been tested for
 each
test cycle.


Great idea - I think this is easy to do given the current architecture. We
already have access to the commit ID in the same script that posts the
comments.

   2. Pin a message to the start or end of the PR that is updated with
the status of the PR. Testing not complete; New commits since last
test; Tests failed; etc. It should be easy for committers to get the
status of the PR at a glance, without scrolling through the comment
 history.


This is also a good idea - I think this would be doable since the GitHub
API allows us to edit comments, but it's a bit trickier. I think it would
require first making an API call to get the status comment ID and then
updating it.



 Nick


Nick - Any interest in doing these? This is all doable from within the
Spark repo itself because our QA harness scripts are in there:

https://github.com/apache/spark/blob/master/dev/run-tests-jenkins

If not, could you make a JIRA for them and put it under Project Infra?

- Patrick
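The get-the-comment-ID-then-update flow Patrick describes could be sketched roughly as below. This is illustrative only: the hidden HTML marker and the bot login are assumptions, while the GitHub issue-comments list/create and comment-edit endpoints are the public API.

```python
import json
import urllib.request

# Hypothetical hidden marker used to find the bot's pinned status comment.
MARKER = "<!-- spark-qa-status -->"


def find_status_comment(comments, bot_login):
    """Return the id of the bot's pinned status comment, or None.

    `comments` is the decoded JSON list from
    GET /repos/{repo}/issues/{pr}/comments.
    """
    for c in comments:
        if c["user"]["login"] == bot_login and MARKER in c["body"]:
            return c["id"]
    return None


def upsert_status(repo, pr_number, bot_login, status_text, token):
    """Create the status comment once, then edit it in place on every
    update, so the PR's state is visible without scrolling the history."""
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    with urllib.request.urlopen(url) as resp:
        comments = json.load(resp)
    body = json.dumps({"body": f"{MARKER}\n{status_text}"}).encode()
    comment_id = find_status_comment(comments, bot_login)
    if comment_id is None:
        # No pinned comment yet: create one.
        req = urllib.request.Request(url, data=body, method="POST")
    else:
        # Edit the existing comment via its id.
        req = urllib.request.Request(
            f"https://api.github.com/repos/{repo}/issues/comments/{comment_id}",
            data=body, method="PATCH",
        )
    req.add_header("Authorization", f"token {token}")
    req.add_header("Content-Type", "application/json")
    urllib.request.urlopen(req)
```

The extra cost is exactly the one listing call per update that Patrick anticipates.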


Re: -1s on pull requests?

2014-08-03 Thread Nicholas Chammas
On Sun, Aug 3, 2014 at 11:29 PM, Patrick Wendell pwend...@gmail.com wrote:

Nick - Any interest in doing these? This is all doable from within the
 Spark repo itself because our QA harness scripts are in there:

 https://github.com/apache/spark/blob/master/dev/run-tests-jenkins

 If not, could you make a JIRA for them and put it under Project Infra.

I’ll make the JIRA and think about how to do this stuff. I’ll have to
understand what that run-tests-jenkins script does and see how easy it is
to extend.

Nick


Re: Pull requests will be automatically linked to JIRA when submitted

2014-07-23 Thread Nicholas Chammas
By the way, it looks like there’s a JIRA plugin that integrates it with
GitHub:

   -
   
https://marketplace.atlassian.com/plugins/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin
   -
   
https://confluence.atlassian.com/display/BITBUCKET/Linking+Bitbucket+and+GitHub+accounts+to+JIRA

It does the automatic linking and shows some additional information
https://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png
that might be nice to have for heavy JIRA users.

Nick


On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com
wrote:

 Yeah it needs to have SPARK-XXX in the title (this is the format we
 request already). It just works with a small synchronization script I
 wrote that we run every five minutes on Jenkins, using the GitHub
 and JIRA APIs:


 https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929

 - Patrick

 On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:
  That's pretty neat.
 
  How does it work? Do we just need to put the issue ID (e.g. SPARK-1234)
  anywhere in the pull request?
 
  Nick
 
 
  On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Just a small note, today I committed a tool that will automatically
  mirror pull requests to JIRA issues, so contributors will no longer
  have to manually post a pull request on the JIRA when they make one.
 
  It will create a link on the JIRA and also make a comment to trigger
  an e-mail to people watching.
 
  This should make some things easier, such as avoiding accidental
  duplicate effort on the same JIRA.
 
  - Patrick
 



Re: -1s on pull requests?

2014-07-21 Thread Shivaram Venkataraman
One way to do this would be to have a GitHub hook that parses -1s or +1s
and posts a commit status [1] (like, say, Travis [2]) right next to the PR.
Does anybody know of an existing tool that does this?

Shivaram

[1] https://github.com/blog/1227-commit-status-api
[2]
http://blog.travis-ci.com/2012-09-04-pull-requests-just-got-even-more-awesome/
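A minimal sketch of the hook Shivaram describes: count the votes in review comments and post the result as a commit status. The vote regex and the `review/votes` context name are assumptions; the Statuses API endpoint is GitHub's commit-status API from [1].

```python
import json
import re
import urllib.request

# Match bare +1 / -1 votes, but not issue keys like SPARK-1234 or "+10".
VOTE_RE = re.compile(r"(?<![\w+-])([+-]1)(?!\w)")


def tally_votes(comment_bodies):
    """Return (plus_ones, minus_ones) counted across review comments."""
    plus = minus = 0
    for body in comment_bodies:
        for vote in VOTE_RE.findall(body):
            if vote == "+1":
                plus += 1
            else:
                minus += 1
    return plus, minus


def votes_to_state(plus, minus):
    """Map a vote tally to a GitHub commit-status state."""
    if minus > 0:
        return "failure"   # an unaddressed -1 blocks the merge
    if plus > 0:
        return "success"
    return "pending"


def post_status(repo, sha, state, token):
    """POST the state to the head commit via the GitHub Statuses API."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/statuses/{sha}",
        data=json.dumps({"state": state, "context": "review/votes"}).encode(),
        method="POST",
        headers={"Authorization": f"token {token}"},
    )
    urllib.request.urlopen(req)
```

A "failure" status on the head commit would make an outstanding -1 visible right next to the merge button, which is exactly the at-a-glance signal this thread asks for.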


On Mon, Jul 21, 2014 at 1:44 PM, Kay Ousterhout k...@eecs.berkeley.edu
wrote:

 Hi all,

 As the number of committers / contributors on Spark has increased, there
 are cases where pull requests get merged before all the review comments
 have been addressed. This happens say when one committer points out a
 problem with the pull request, and another committer doesn't see the
 earlier comment and merges the PR before the comment has been addressed.
  This is especially tricky for pull requests with a large number of
 comments, because it can be difficult to notice early comments describing
 blocking issues.

 This also happens when something accidentally gets merged after the tests
 have started but before tests have passed.

 Do folks have ideas on how we can handle this issue? Are there other
 projects that have good ways of handling this? It looks like for Hadoop,
 people can -1 / +1 on the JIRA.

 -Kay



Re: -1s on pull requests?

2014-07-21 Thread Patrick Wendell
I've always operated under the assumption that if a committer makes a
comment on a PR, and that's not addressed, that should block the PR
from being merged (even without a specific -1). I don't know of any
cases where this has intentionally been violated, but I do think this
happens accidentally sometimes.

Unfortunately, we are not allowed to use those github hooks because of
the way the ASF github integration works.

I've lately been using a custom-made tool to help review pull
requests. One thing I could do is add a feature here saying which
committers have said LGTM on a PR (vs the ones that have commented).
We could also indicate the latest test status as Green/Yellow/Red
based on the Jenkins comments:

http://pwendell.github.io/spark-github-shim/

As a warning to potential users, my tool might crash your browser.

- Patrick

On Mon, Jul 21, 2014 at 1:44 PM, Kay Ousterhout k...@eecs.berkeley.edu wrote:
 Hi all,

 As the number of committers / contributors on Spark has increased, there
 are cases where pull requests get merged before all the review comments
 have been addressed. This happens say when one committer points out a
 problem with the pull request, and another committer doesn't see the
 earlier comment and merges the PR before the comment has been addressed.
  This is especially tricky for pull requests with a large number of
 comments, because it can be difficult to notice early comments describing
 blocking issues.

 This also happens when something accidentally gets merged after the tests
 have started but before tests have passed.

 Do folks have ideas on how we can handle this issue? Are there other
 projects that have good ways of handling this? It looks like for Hadoop,
 people can -1 / +1 on the JIRA.

 -Kay


Re: -1s on pull requests?

2014-07-21 Thread Henry Saputra
There is ASF guidelines about Voting, including code review for
patches: http://www.apache.org/foundation/voting.html

Some ASF projects require three +1 votes (on the issue, i.e. the
JIRA or GitHub PR in this case) for a patch, unless it is tagged with
lazy consensus [1] of, say, 48 hours.
For patches that are not critical, waiting a while to give additional
committers time to review would be the best way to go.

Another thing is that all contributors need to be patient once their
patches have been submitted and are pending review. This is part of
being in an open community.

Hope this helps.


- Henry

[1] http://www.apache.org/foundation/glossary.html#LazyConsensus

On Mon, Jul 21, 2014 at 1:59 PM, Patrick Wendell pwend...@gmail.com wrote:
 I've always operated under the assumption that if a committer makes a
 comment on a PR, and that's not addressed, that should block the PR
 from being merged (even without a specific -1). I don't know of any
 cases where this has intentionally been violated, but I do think this
 happens accidentally sometimes.

 Unfortunately, we are not allowed to use those github hooks because of
 the way the ASF github integration works.

 I've lately been using a custom-made tool to help review pull
 requests. One thing I could do is add a feature here saying which
 committers have said LGTM on a PR (vs the ones that have commented).
 We could also indicate the latest test status as Green/Yellow/Red
 based on the Jenkins comments:

 http://pwendell.github.io/spark-github-shim/

 As a warning to potential users, my tool might crash your browser.

 - Patrick

 On Mon, Jul 21, 2014 at 1:44 PM, Kay Ousterhout k...@eecs.berkeley.edu 
 wrote:
 Hi all,

 As the number of committers / contributors on Spark has increased, there
 are cases where pull requests get merged before all the review comments
 have been addressed. This happens say when one committer points out a
 problem with the pull request, and another committer doesn't see the
 earlier comment and merges the PR before the comment has been addressed.
  This is especially tricky for pull requests with a large number of
 comments, because it can be difficult to notice early comments describing
 blocking issues.

 This also happens when something accidentally gets merged after the tests
 have started but before tests have passed.

 Do folks have ideas on how we can handle this issue? Are there other
 projects that have good ways of handling this? It looks like for Hadoop,
 people can -1 / +1 on the JIRA.

 -Kay


Re: Pull requests will be automatically linked to JIRA when submitted

2014-07-20 Thread Nan Zhu
Awesome!

On Saturday, July 19, 2014, Patrick Wendell pwend...@gmail.com wrote:

 Just a small note, today I committed a tool that will automatically
 mirror pull requests to JIRA issues, so contributors will no longer
 have to manually post a pull request on the JIRA when they make one.

 It will create a link on the JIRA and also make a comment to trigger
 an e-mail to people watching.

 This should make some things easier, such as avoiding accidental
 duplicate effort on the same JIRA.

 - Patrick



Re: Pull requests will be automatically linked to JIRA when submitted

2014-07-20 Thread Nicholas Chammas
That's pretty neat.

How does it work? Do we just need to put the issue ID (e.g. SPARK-1234)
anywhere in the pull request?

Nick


On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com
wrote:

 Just a small note, today I committed a tool that will automatically
 mirror pull requests to JIRA issues, so contributors will no longer
 have to manually post a pull request on the JIRA when they make one.

 It will create a link on the JIRA and also make a comment to trigger
 an e-mail to people watching.

 This should make some things easier, such as avoiding accidental
 duplicate effort on the same JIRA.

 - Patrick



Re: Pull requests will be automatically linked to JIRA when submitted

2014-07-20 Thread Patrick Wendell
Yeah, it needs to have SPARK-XXX in the title (this is the format we
request already). It just works with a small synchronization script I
wrote that we run every five minutes on Jenkins, using the GitHub
and JIRA APIs:

https://github.com/apache/spark/commit/49e472744951d875627d78b0d6e93cd139232929

- Patrick
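A sync pass like the one Patrick describes could be sketched roughly as follows. This is illustrative only, not the actual dev/github_jira_sync.py: the function names and the remote-link payload shape are assumptions, though the GitHub pull-request listing endpoint and the JIRA `remotelink` REST resource are the public APIs.

```python
import json
import re
import urllib.request

JIRA_BASE = "https://issues.apache.org/jira"
GITHUB_API = "https://api.github.com"
ISSUE_RE = re.compile(r"SPARK-\d+")


def extract_issue_keys(pr_title):
    """Return every SPARK-XXX issue key mentioned in a PR title."""
    return ISSUE_RE.findall(pr_title)


def link_pr_to_jira(issue_key, pr_url):
    """Attach the PR as a remote link on the JIRA issue — a single call,
    since requests against the shared ASF JIRA are slow."""
    payload = json.dumps({"object": {"url": pr_url, "title": pr_url}})
    req = urllib.request.Request(
        f"{JIRA_BASE}/rest/api/2/issue/{issue_key}/remotelink",
        data=payload.encode(), method="POST",
        headers={"Content-Type": "application/json"},  # auth omitted
    )
    urllib.request.urlopen(req)


def sync_once(repo="apache/spark"):
    """One pass of the periodic job: link open PRs to the JIRAs they name."""
    with urllib.request.urlopen(f"{GITHUB_API}/repos/{repo}/pulls") as resp:
        prs = json.load(resp)
    for pr in prs:
        for key in extract_issue_keys(pr.get("title", "")):
            link_pr_to_jira(key, pr["html_url"])
```

A real version would also have to remember which PRs it already linked (and, per the thread above, catch later title edits), but the title-matching core is just the regex.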

On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 That's pretty neat.

 How does it work? Do we just need to put the issue ID (e.g. SPARK-1234)
 anywhere in the pull request?

 Nick


 On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Just a small note, today I committed a tool that will automatically
 mirror pull requests to JIRA issues, so contributors will no longer
 have to manually post a pull request on the JIRA when they make one.

 It will create a link on the JIRA and also make a comment to trigger
 an e-mail to people watching.

 This should make some things easier, such as avoiding accidental
 duplicate effort on the same JIRA.

 - Patrick



Pull requests will be automatically linked to JIRA when submitted

2014-07-19 Thread Patrick Wendell
Just a small note, today I committed a tool that will automatically
mirror pull requests to JIRA issues, so contributors will no longer
have to manually post a pull request on the JIRA when they make one.

It will create a link on the JIRA and also make a comment to trigger
an e-mail to people watching.

This should make some things easier, such as avoiding accidental
duplicate effort on the same JIRA.

- Patrick