Re: Review notification bot

2018-07-30 Thread Holden Karau
Another thing we could try (if folks would be down to try it) is to have it
not actually ping, but instead suggest the potential usernames to the user
(e.g. say "suggested reviewers you _may wish to ping_" and then list them)?
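
A rough sketch of what that could look like, purely for illustration. The
function name and the message wording are made up here, and it assumes that
usernames wrapped in backticks (with no leading @) do not trigger a GitHub
notification:

```python
def format_suggestion(usernames, max_reviewers=3):
    """Build a PR comment that lists candidate reviewers without pinging them.

    Hypothetical helper: names are rendered in backticks and without a
    leading "@", on the assumption that this avoids a notification.
    """
    # De-duplicate while keeping the original ranking order.
    picked = list(dict.fromkeys(usernames))[:max_reviewers]
    if not picked:
        return ""
    listed = ", ".join(f"`{name}`" for name in picked)
    return f"Suggested reviewers you may wish to ping: {listed}"


if __name__ == "__main__":
    print(format_suggestion(["alice", "bob", "alice"]))
```

The PR author would then decide whether to actually @-mention anyone on the
list.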

On Mon, Jul 30, 2018 at 10:45 PM, Holden Karau  wrote:

>
> On Mon, Jul 30, 2018 at 10:22 PM, Reynold Xin  wrote:
>
>> I like the idea of this bot, but I'm somewhat annoyed by it. I have
>> touched a lot of files and wrote a lot of the original code. Every day I
>> wake up I get a lot of emails from this bot.
>>
> We could blacklist the existing PMC (or add a rate limit)?
>
>>
>> Also if we are going to use this, can we rename the bot to something like
>> spark-bot, rather than holden's personal bot?
>>
> I originally did that, but GitHub told me I could only have one personal
> and one bot account. If someone else registered the spark-mention-bot I'd
> be happy to switch it to that.
>
>>
>> On Mon, Jul 30, 2018 at 10:18 PM Hyukjin Kwon 
>> wrote:
>>
>>> > That being said the folks being pinged are not just committers.
>>>
>>> I doubt it because only pinged ones I see are all committers and that's
>>> why I assumed the pinging is based on who committed the PR (which implies
>>> committer only).
>>> Do you maybe have some examples where non-committers were pinged? Looks
>>> at least, (almost?) all of them are committers and something needs to be
>>> fixed even if so.
>>>
>>> I recently argued about pinging things before - sounds it matters if it
>>> annoys. Since pinging is completely optional and cc'ing someone else might
>>> need other contexts not
>>> only assuming from the blame and who committed this, I am actually not
>>> super happy with that pinging for now. I was slightly supportive for this
>>> idea but now I actually slightly
>>> became negative on this after observing how it goes in practice.
>>>
>>> I wonder how other people think on this.
>>>
>>>
>>>
>>> On Tue, Jul 31, 2018 at 12:33 PM, Holden Karau wrote:
>>>
 So CODEOWNERS is limited to committers by GitHub. We can definitely
 modify the config file though and I'm happy to write some custom logic if
 it helps support our needs. We can also just turn it off if it's too noisy
 for folks in general.

 That being said the folks being pinged are not just committers. The
 hope is to get more code authors who aren't committers involved in the
 reviews and then eventually become committers.

 On Mon, Jul 30, 2018, 9:09 PM Hyukjin Kwon  wrote:

> *reviewers: I mean people who committed the PR given my observation.
>
> On Tue, Jul 31, 2018 at 11:50 AM, Hyukjin Kwon wrote:
>
>> I was wondering if we can leave the configuration open and accept
>> some custom configurations, IMHO, because I saw some people less related 
>> or
>> less active are consistently pinged. Just started to get worried if they
>> get annoyed by this.
>> Also, some people could be interested in few specific areas. They
>> should get pinged too.
>> Also, assuming from people pinged, seems they are reviewers (which
>> basically means committers I guess). Was wondering if there's a big
>> difference between codeowners and bots.
>>
>>
>>
>> On Tue, Jul 31, 2018 at 11:38 AM, Holden Karau wrote:
>>
>>> The configuration file is optional, is there something you want to
>>> try and change?
>>>
>>> On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon 
>>> wrote:
>>>
 I see. Thanks. I was wondering if I can see the configuration file
 since that looks needed
 (https://github.com/holdenk/mention-bot#configuration) but I couldn't
 find (sorry if it's just something I simply missed).

 On Tue, Jul 31, 2018 at 1:48 AM, Holden Karau wrote:

> So the one that is running is the fork in my own repo (set up
> for K8s deployment) - http://github.com/holdenk/mention-bot
>
> On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon 
> wrote:
>
>> Holden, so, is it a fork in
>> https://github.com/facebookarchive/mention-bot? Would you mind if I ask
>> where I can see the configurations for it?
>>
>>
>> On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote:
>>
>>> Yeah so the issue with codeowners is it will only assign to
>>> committers on the repo (the Beam project found this out the 
>>> practical
>>> application way).
>>>
>>> I have a fork of mention bot running and it seems we can add it
>>> (need an infra ticket), but one of the things the Beam folks asked 
>>> was to
>>> not ping code authors who haven’t committed in the past year which 
>>> I need
>>> to do a bit of poking on to make happen.
>>>
>>> On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>

Re: Review notification bot

2018-07-30 Thread Holden Karau
On Mon, Jul 30, 2018 at 10:22 PM, Reynold Xin  wrote:

> I like the idea of this bot, but I'm somewhat annoyed by it. I have
> touched a lot of files and wrote a lot of the original code. Every day I
> wake up I get a lot of emails from this bot.
>
We could blacklist the existing PMC (or add a rate limit)?
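
To make that concrete, here is a minimal sketch of both options together.
Nothing here is from mention-bot itself; the class name, the per-day limit
and the in-memory history are all assumptions for illustration:

```python
from datetime import datetime, timedelta


class PingFilter:
    """Hypothetical gate deciding whether the bot may ping a user.

    Combines a block list (e.g. the existing PMC) with a simple
    per-user rate limit over a rolling 24-hour window.
    """

    def __init__(self, blocked_users, max_pings_per_day=2):
        self.blocked = {u.lower() for u in blocked_users}
        self.max_pings = max_pings_per_day
        self.history = {}  # username -> list of datetimes of recent pings

    def allow(self, user, now):
        key = user.lower()
        if key in self.blocked:
            return False
        # Keep only the pings that fall inside the last 24 hours.
        cutoff = now - timedelta(days=1)
        recent = [t for t in self.history.get(key, []) if t > cutoff]
        if len(recent) >= self.max_pings:
            self.history[key] = recent
            return False
        recent.append(now)
        self.history[key] = recent
        return True
```

A real deployment would need to persist the history somewhere, since the bot
may restart between PRs.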

>
> Also if we are going to use this, can we rename the bot to something like
> spark-bot, rather than holden's personal bot?
>
I originally did that, but GitHub told me I could only have one personal
and one bot account. If someone else registered the spark-mention-bot I'd
be happy to switch it to that.

>
> On Mon, Jul 30, 2018 at 10:18 PM Hyukjin Kwon  wrote:
>
>> > That being said the folks being pinged are not just committers.
>>
>> I doubt it because only pinged ones I see are all committers and that's
>> why I assumed the pinging is based on who committed the PR (which implies
>> committer only).
>> Do you maybe have some examples where non-committers were pinged? Looks
>> at least, (almost?) all of them are committers and something needs to be
>> fixed even if so.
>>
>> I recently argued about pinging things before - sounds it matters if it
>> annoys. Since pinging is completely optional and cc'ing someone else might
>> need other contexts not
>> only assuming from the blame and who committed this, I am actually not
>> super happy with that pinging for now. I was slightly supportive for this
>> idea but now I actually slightly
>> became negative on this after observing how it goes in practice.
>>
>> I wonder how other people think on this.
>>
>>
>>
>> On Tue, Jul 31, 2018 at 12:33 PM, Holden Karau wrote:
>>
>>> So CODEOWNERS is limited to committers by GitHub. We can definitely
>>> modify the config file though and I'm happy to write some custom logic if
>>> it helps support our needs. We can also just turn it off if it's too noisy
>>> for folks in general.
>>>
>>> That being said the folks being pinged are not just committers. The hope
>>> is to get more code authors who aren't committers involved in the reviews
>>> and then eventually become committers.
>>>
>>> On Mon, Jul 30, 2018, 9:09 PM Hyukjin Kwon  wrote:
>>>
 *reviewers: I mean people who committed the PR given my observation.

 On Tue, Jul 31, 2018 at 11:50 AM, Hyukjin Kwon wrote:

> I was wondering if we can leave the configuration open and accept some
> custom configurations, IMHO, because I saw some people less related or 
> less
> active are consistently pinged. Just started to get worried if they get
> annoyed by this.
> Also, some people could be interested in few specific areas. They
> should get pinged too.
> Also, assuming from people pinged, seems they are reviewers (which
> basically means committers I guess). Was wondering if there's a big
> difference between codeowners and bots.
>
>
>
> On Tue, Jul 31, 2018 at 11:38 AM, Holden Karau wrote:
>
>> The configuration file is optional, is there something you want to try
>> and change?
>>
>> On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon 
>> wrote:
>>
>>> I see. Thanks. I was wondering if I can see the configuration file
>>> since that looks needed
>>> (https://github.com/holdenk/mention-bot#configuration) but I couldn't
>>> find (sorry if it's just something I simply missed).
>>>
>>> On Tue, Jul 31, 2018 at 1:48 AM, Holden Karau wrote:
>>>
 So the one that is running is the fork in my own repo (set up
 for K8s deployment) - http://github.com/holdenk/mention-bot

 On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon 
 wrote:

> Holden, so, is it a fork in
> https://github.com/facebookarchive/mention-bot? Would you mind if I ask
> where I can see the configurations for it?
>
>
> On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote:
>
>> Yeah so the issue with codeowners is it will only assign to
>> committers on the repo (the Beam project found this out the practical
>> application way).
>>
>> I have a fork of mention bot running and it seems we can add it
>> (need an infra ticket), but one of the things the Beam folks asked 
>> was to
>> not ping code authors who haven’t committed in the past year which I 
>> need
>> to do a bit of poking on to make happen.
>>
>> On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> On this topic, I just stumbled on a GitHub feature called
>>> CODEOWNERS .
>>> It lets you specify owners of specific areas of the repository 
>>> using the
>>> same syntax that .gitignore uses. Here is CPython's CODEOWNERS
>>> file
>>> 

Re: Review notification bot

2018-07-30 Thread Reynold Xin
I like the idea of this bot, but I'm somewhat annoyed by it. I have touched
a lot of files and wrote a lot of the original code. Every day when I wake
up I get a lot of emails from this bot.

Also if we are going to use this, can we rename the bot to something like
spark-bot, rather than holden's personal bot?

On Mon, Jul 30, 2018 at 10:18 PM Hyukjin Kwon  wrote:

> > That being said the folks being pinged are not just committers.
>
> I doubt it because only pinged ones I see are all committers and that's
> why I assumed the pinging is based on who committed the PR (which implies
> committer only).
> Do you maybe have some examples where non-committers were pinged? Looks at
> least, (almost?) all of them are committers and something needs to be fixed 
> even
> if so.
>
> I recently argued about pinging things before - sounds it matters if it
> annoys. Since pinging is completely optional and cc'ing someone else might
> need other contexts not
> only assuming from the blame and who committed this, I am actually not
> super happy with that pinging for now. I was slightly supportive for this
> idea but now I actually slightly
> became negative on this after observing how it goes in practice.
>
> I wonder how other people think on this.
>
>
>
> On Tue, Jul 31, 2018 at 12:33 PM, Holden Karau wrote:
>
>> So CODEOWNERS is limited to committers by GitHub. We can definitely
>> modify the config file though and I'm happy to write some custom logic if
>> it helps support our needs. We can also just turn it off if it's too noisy
>> for folks in general.
>>
>> That being said the folks being pinged are not just committers. The hope
>> is to get more code authors who aren't committers involved in the reviews
>> and then eventually become committers.
>>
>> On Mon, Jul 30, 2018, 9:09 PM Hyukjin Kwon  wrote:
>>
>>> *reviewers: I mean people who committed the PR given my observation.
>>>
>>> On Tue, Jul 31, 2018 at 11:50 AM, Hyukjin Kwon wrote:
>>>
 I was wondering if we can leave the configuration open and accept some
 custom configurations, IMHO, because I saw some people less related or less
 active are consistently pinged. Just started to get worried if they get
 annoyed by this.
 Also, some people could be interested in few specific areas. They
 should get pinged too.
 Also, assuming from people pinged, seems they are reviewers (which
 basically means committers I guess). Was wondering if there's a big
 difference between codeowners and bots.



 On Tue, Jul 31, 2018 at 11:38 AM, Holden Karau wrote:

> The configuration file is optional, is there something you want to try
> and change?
>
> On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon 
> wrote:
>
>> I see. Thanks. I was wondering if I can see the configuration file
>> since that looks needed (
>> https://github.com/holdenk/mention-bot#configuration) but I couldn't
>> find (sorry if it's just something I simply missed).
>>
>> On Tue, Jul 31, 2018 at 1:48 AM, Holden Karau wrote:
>>
>>> So the one that is running is the fork in my own repo (set up
>>> for K8s deployment) - http://github.com/holdenk/mention-bot
>>>
>>> On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon 
>>> wrote:
>>>
 Holden, so, is it a fork in
 https://github.com/facebookarchive/mention-bot? Would you mind if
 I ask where I can see the configurations for it?


 On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote:

> Yeah so the issue with codeowners is it will only assign to
> committers on the repo (the Beam project found this out the practical
> application way).
>
> I have a fork of mention bot running and it seems we can add it
> (need an infra ticket), but one of the things the Beam folks asked 
> was to
> not ping code authors who haven’t committed in the past year which I 
> need
> to do a bit of poking on to make happen.
>
> On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> On this topic, I just stumbled on a GitHub feature called
>> CODEOWNERS .
>> It lets you specify owners of specific areas of the repository using 
>> the
>> same syntax that .gitignore uses. Here is CPython's CODEOWNERS
>> file
>> 
>> for reference.
>>
>> Dunno if that would complement mention-bot (which Facebook is
>> apparently no longer maintaining
>> ), or if
>> we can even use it given the ASF setup on GitHub. But I thought it 
>> would be
>> worth mentioning nonetheless.
>>
>> On Sat, Jul 14, 2018 at 11:17 AM 

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
> That being said the folks being pinged are not just committers.

I doubt it, because the only people I see being pinged are all committers,
and that's why I assumed the pinging is based on who committed the PR (which
implies committers only).
Do you maybe have some examples where non-committers were pinged? It looks
like at least (almost?) all of them are committers, and something needs to
be fixed even if so.

I recently argued about pinging before, and it sounds like it matters if it
annoys people. Since pinging is completely optional, and cc'ing someone else
might need more context than can be inferred from the blame and from who
committed the change, I am actually not super happy with the pinging for
now. I was slightly supportive of this idea, but I have become slightly
negative on it after observing how it goes in practice.

I wonder what other people think about this.



On Tue, Jul 31, 2018 at 12:33 PM, Holden Karau wrote:

> So CODEOWNERS is limited to committers by GitHub. We can definitely modify
> the config file though and I'm happy to write some custom logic if it helps
> support our needs. We can also just turn it off if it's too noisy for
> folks in general.
>
> That being said the folks being pinged are not just committers. The hope
> is to get more code authors who aren't committers involved in the reviews
> and then eventually become committers.
>
> On Mon, Jul 30, 2018, 9:09 PM Hyukjin Kwon  wrote:
>
>> *reviewers: I mean people who committed the PR given my observation.
>>
>> On Tue, Jul 31, 2018 at 11:50 AM, Hyukjin Kwon wrote:
>>
>>> I was wondering if we can leave the configuration open and accept some
>>> custom configurations, IMHO, because I saw some people less related or less
>>> active are consistently pinged. Just started to get worried if they get
>>> annoyed by this.
>>> Also, some people could be interested in few specific areas. They should
>>> get pinged too.
>>> Also, assuming from people pinged, seems they are reviewers (which
>>> basically means committers I guess). Was wondering if there's a big
>>> difference between codeowners and bots.
>>>
>>>
>>>
>>> On Tue, Jul 31, 2018 at 11:38 AM, Holden Karau wrote:
>>>
 The configuration file is optional, is there something you want to try
 and change?

 On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon 
 wrote:

> I see. Thanks. I was wondering if I can see the configuration file
> since that looks needed (
> https://github.com/holdenk/mention-bot#configuration) but I couldn't
> find (sorry if it's just something I simply missed).
>
> On Tue, Jul 31, 2018 at 1:48 AM, Holden Karau wrote:
>
>> So the one that is running is the fork in my own repo (set up for
>> K8s deployment) - http://github.com/holdenk/mention-bot
>>
>> On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon 
>> wrote:
>>
>>> Holden, so, is it a fork in
>>> https://github.com/facebookarchive/mention-bot? Would you mind if I
>>> ask where I can see the configurations for it?
>>>
>>>
>>> On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote:
>>>
 Yeah so the issue with codeowners is it will only assign to
 committers on the repo (the Beam project found this out the practical
 application way).

 I have a fork of mention bot running and it seems we can add it
 (need an infra ticket), but one of the things the Beam folks asked was 
 to
 not ping code authors who haven’t committed in the past year which I 
 need
 to do a bit of poking on to make happen.

 On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
 nicholas.cham...@gmail.com> wrote:

> On this topic, I just stumbled on a GitHub feature called
> CODEOWNERS .
> It lets you specify owners of specific areas of the repository using 
> the
> same syntax that .gitignore uses. Here is CPython's CODEOWNERS
> file
> 
> for reference.
>
> Dunno if that would complement mention-bot (which Facebook is
> apparently no longer maintaining
> ), or if
> we can even use it given the ASF setup on GitHub. But I thought it 
> would be
> worth mentioning nonetheless.
>
> On Sat, Jul 14, 2018 at 11:17 AM Holden Karau <
> hol...@pigscanfly.ca> wrote:
>
>> Hearing no objections (and in a shout out to @ Nicholas Chammas
>> who initially suggested mention-bot back in 2016) I've set up a copy 
>> of
>> mention bot and run it against my own repo (looks like
>> https://github.com/holdenk/spark-testing-base/pull/253 ).
>>
>> If no one objects I’ll ask infra to turn this on for Spark on a
>> trial basis and we can revisit it

Re: Review notification bot

2018-07-30 Thread Holden Karau
So CODEOWNERS is limited to committers by GitHub. We can definitely modify
the config file though, and I'm happy to write some custom logic if it helps
support our needs. We can also just turn it off if it's too noisy for
folks in general.

That being said, the folks being pinged are not just committers. The hope is
to get more code authors who aren't committers involved in the reviews so
that they eventually become committers.
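
One shape such custom logic could take, as a sketch only: rank candidates by
how many of the changed lines they last touched according to git blame (the
heuristic Sean Owen described earlier in the thread), and skip anyone who has
not committed in the past year (the Beam request quoted below). The function
name, its parameters and the input format are assumptions, not how the
running bot works:

```python
from collections import Counter
from datetime import datetime, timedelta


def rank_reviewers(blame_authors, last_commit, now, pr_author,
                   max_reviewers=3, max_idle_days=365):
    """Hypothetical ranking of reviewer candidates.

    blame_authors: one author name per changed line, as reported by blame.
    last_commit:   author name -> datetime of that author's latest commit.
    Authors missing from last_commit are treated as inactive.
    """
    cutoff = now - timedelta(days=max_idle_days)
    counts = Counter(
        author
        for author in blame_authors
        # Never suggest the PR author, and drop long-inactive authors.
        if author != pr_author
        and last_commit.get(author, datetime.min) >= cutoff
    )
    return [author for author, _ in counts.most_common(max_reviewers)]
```

Note that committer status plays no part in the ranking, so non-committer
authors can show up as long as they have been active recently.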

On Mon, Jul 30, 2018, 9:09 PM Hyukjin Kwon  wrote:

> *reviewers: I mean people who committed the PR given my observation.
>
> On Tue, Jul 31, 2018 at 11:50 AM, Hyukjin Kwon wrote:
>
>> I was wondering if we can leave the configuration open and accept some
>> custom configurations, IMHO, because I saw some people less related or less
>> active are consistently pinged. Just started to get worried if they get
>> annoyed by this.
>> Also, some people could be interested in few specific areas. They should
>> get pinged too.
>> Also, assuming from people pinged, seems they are reviewers (which
>> basically means committers I guess). Was wondering if there's a big
>> difference between codeowners and bots.
>>
>>
>>
>> On Tue, Jul 31, 2018 at 11:38 AM, Holden Karau wrote:
>>
>>> The configuration file is optional, is there something you want to try
>>> and change?
>>>
>>> On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon 
>>> wrote:
>>>
 I see. Thanks. I was wondering if I can see the configuration file
 since that looks needed (
 https://github.com/holdenk/mention-bot#configuration) but I couldn't
 find (sorry if it's just something I simply missed).

 On Tue, Jul 31, 2018 at 1:48 AM, Holden Karau wrote:

> So the one that is running is the fork in my own repo (set up for
> K8s deployment) - http://github.com/holdenk/mention-bot
>
> On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon 
> wrote:
>
>> Holden, so, is it a fork in
>> https://github.com/facebookarchive/mention-bot? Would you mind if I
>> ask where I can see the configurations for it?
>>
>>
>> On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote:
>>
>>> Yeah so the issue with codeowners is it will only assign to
>>> committers on the repo (the Beam project found this out the practical
>>> application way).
>>>
>>> I have a fork of mention bot running and it seems we can add it
>>> (need an infra ticket), but one of the things the Beam folks asked was 
>>> to
>>> not ping code authors who haven’t committed in the past year which I 
>>> need
>>> to do a bit of poking on to make happen.
>>>
>>> On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
 On this topic, I just stumbled on a GitHub feature called
 CODEOWNERS .
 It lets you specify owners of specific areas of the repository using 
 the
 same syntax that .gitignore uses. Here is CPython's CODEOWNERS file
 
 for reference.

 Dunno if that would complement mention-bot (which Facebook is
 apparently no longer maintaining
 ), or if we
 can even use it given the ASF setup on GitHub. But I thought it would 
 be
 worth mentioning nonetheless.

 On Sat, Jul 14, 2018 at 11:17 AM Holden Karau 
 wrote:

> Hearing no objections (and in a shout out to @ Nicholas Chammas
> who initially suggested mention-bot back in 2016) I've set up a copy 
> of
> mention bot and run it against my own repo (looks like
> https://github.com/holdenk/spark-testing-base/pull/253 ).
>
> If no one objects I’ll ask infra to turn this on for Spark on a
> trial basis and we can revisit it based on how folks interact with it.
>
> On Wed, Jun 6, 2018 at 12:24 PM, Holden Karau <
> hol...@pigscanfly.ca> wrote:
>
>> So there are a few bots along this line in OSS. If no one objects
>> I’ll take a look and find one which matches our use case and try it 
>> out.
>>
>> On Wed, Jun 6, 2018 at 10:33 AM Sean Owen 
>> wrote:
>>
>>> Certainly I will frequently dig through 'git blame' to figure
>>> out who might be the right reviewer. Maybe that's automatable -- 
>>> ping the
>>> person who last touched the most lines touched by the PR? There 
>>> might be
>>> some false positives there. And I suppose the downside is being 
>>> pinged
>>> forever for some change that just isn't well considered or one of 
>>> those
>>> accidental 100K-line PRs. So maybe some way to decline or silence is
>>> important, or maybe just ping once and leave it. 

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
*reviewers: I mean people who committed the PR given my observation.

On Tue, Jul 31, 2018 at 11:50 AM, Hyukjin Kwon wrote:

> I was wondering if we can leave the configuration open and accept some
> custom configurations, IMHO, because I saw some people less related or less
> active are consistently pinged. Just started to get worried if they get
> annoyed by this.
> Also, some people could be interested in few specific areas. They should
> get pinged too.
> Also, assuming from people pinged, seems they are reviewers (which
> basically means committers I guess). Was wondering if there's a big
> difference between codeowners and bots.
>
>
>
> On Tue, Jul 31, 2018 at 11:38 AM, Holden Karau wrote:
>
>> The configuration file is optional, is there something you want to try and
>> change?
>>
>> On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon  wrote:
>>
>>> I see. Thanks. I was wondering if I can see the configuration file since
>>> that looks needed (https://github.com/holdenk/mention-bot#configuration)
>>> but I couldn't find (sorry if it's just something I simply missed).
>>>
>>> On Tue, Jul 31, 2018 at 1:48 AM, Holden Karau wrote:
>>>
 So the one that is running is the fork in my own repo (set up for
 K8s deployment) - http://github.com/holdenk/mention-bot

 On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon 
 wrote:

> Holden, so, is it a fork in
> https://github.com/facebookarchive/mention-bot? Would you mind if I
> ask where I can see the configurations for it?
>
>
> On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote:
>
>> Yeah so the issue with codeowners is it will only assign to
>> committers on the repo (the Beam project found this out the practical
>> application way).
>>
>> I have a fork of mention bot running and it seems we can add it (need
>> an infra ticket), but one of the things the Beam folks asked was to not
>> ping code authors who haven’t committed in the past year which I need to 
>> do
>> a bit of poking on to make happen.
>>
>> On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> On this topic, I just stumbled on a GitHub feature called CODEOWNERS
>>> . It lets you
>>> specify owners of specific areas of the repository using the same syntax
>>> that .gitignore uses. Here is CPython's CODEOWNERS file
>>> 
>>> for reference.
>>>
>>> Dunno if that would complement mention-bot (which Facebook is
>>> apparently no longer maintaining
>>> ), or if we
>>> can even use it given the ASF setup on GitHub. But I thought it would be
>>> worth mentioning nonetheless.
>>>
>>> On Sat, Jul 14, 2018 at 11:17 AM Holden Karau 
>>> wrote:
>>>
 Hearing no objections (and in a shout out to @ Nicholas Chammas who
 initially suggested mention-bot back in 2016) I've set up a copy of 
 mention
 bot and run it against my own repo (looks like
 https://github.com/holdenk/spark-testing-base/pull/253 ).

 If no one objects I’ll ask infra to turn this on for Spark on a
 trial basis and we can revisit it based on how folks interact with it.

 On Wed, Jun 6, 2018 at 12:24 PM, Holden Karau wrote:

> So there are a few bots along this line in OSS. If no one objects
> I’ll take a look and find one which matches our use case and try it 
> out.
>
> On Wed, Jun 6, 2018 at 10:33 AM Sean Owen 
> wrote:
>
>> Certainly I will frequently dig through 'git blame' to figure out
>> who might be the right reviewer. Maybe that's automatable -- ping the
>> person who last touched the most lines touched by the PR? There 
>> might be
>> some false positives there. And I suppose the downside is being 
>> pinged
>> forever for some change that just isn't well considered or one of 
>> those
>> accidental 100K-line PRs. So maybe some way to decline or silence is
>> important, or maybe just ping once and leave it. Sure, a bot that 
>> just adds
>> a "Would @foo like to review?" comment on Github? Sure seems worth 
>> trying
>> if someone is willing to do the work to cook up the bot.
>>
>> On Wed, Jun 6, 2018 at 12:22 PM Holden Karau <
>> hol...@pigscanfly.ca> wrote:
>>
>>> Hi friends,
>>>
>>> Was chatting with some folks at the summit and I was wondering
>>> how people would feel about adding a review bot to ping folks. We 
>>> already
>>> have the review dashboard but I was thinking we could ping folks 
>>> who were
>>> the original authors 

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
I was wondering if we can leave the configuration open and accept some
custom configurations, IMHO, because I saw that some people who are less
related or less active are consistently pinged. I just started to worry that
they may get annoyed by this.
Also, some people could be interested in a few specific areas. They should
get pinged too.
Also, judging from the people pinged, it seems they are reviewers (which
basically means committers, I guess). I was wondering if there's a big
difference between CODEOWNERS and the bot.
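
To illustrate the "specific areas" idea, here is a minimal sketch of an
opt-in mapping from users to path patterns. The function name, the mapping
format and the example paths are hypothetical and not part of mention-bot's
actual configuration:

```python
from fnmatch import fnmatchcase


def subscribers_for(changed_files, subscriptions):
    """Return users whose subscribed path patterns match any changed file.

    subscriptions: username -> list of glob patterns (hypothetical format).
    In fnmatch-style globs "*" also matches "/" characters, so "python/*"
    covers nested files.
    """
    matched = []
    for user, patterns in subscriptions.items():
        if any(fnmatchcase(path, pattern)
               for path in changed_files
               for pattern in patterns):
            matched.append(user)
    return sorted(matched)


if __name__ == "__main__":
    subs = {"alice": ["python/*"], "bob": ["sql/core/*", "docs/*"]}
    print(subscribers_for(["docs/index.md", "python/setup.py"], subs))
```

People could add or remove themselves from such a mapping, which would also
cover the case of those who do not want to be pinged at all.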



On Tue, Jul 31, 2018 at 11:38 AM, Holden Karau wrote:

> The configuration file is optional, is there something you want to try and
> change?
>
> On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon  wrote:
>
>> I see. Thanks. I was wondering if I can see the configuration file since
>> that looks needed (https://github.com/holdenk/mention-bot#configuration)
>> but I couldn't find (sorry if it's just something I simply missed).
>>
>> On Tue, Jul 31, 2018 at 1:48 AM, Holden Karau wrote:
>>
>>> So the one that is running is the fork in my own repo (set up for
>>> K8s deployment) - http://github.com/holdenk/mention-bot
>>>
>>> On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon 
>>> wrote:
>>>
 Holden, so, is it a fork in
 https://github.com/facebookarchive/mention-bot? Would you mind if I
 ask where I can see the configurations for it?


 On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote:

> Yeah so the issue with codeowners is it will only assign to committers
> on the repo (the Beam project found this out the practical application 
> way).
>
> I have a fork of mention bot running and it seems we can add it (need
> an infra ticket), but one of the things the Beam folks asked was to not
> ping code authors who haven’t committed in the past year which I need to 
> do
> a bit of poking on to make happen.
>
> On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> On this topic, I just stumbled on a GitHub feature called CODEOWNERS
>> . It lets you
>> specify owners of specific areas of the repository using the same syntax
>> that .gitignore uses. Here is CPython's CODEOWNERS file
>> 
>> for reference.
>>
>> Dunno if that would complement mention-bot (which Facebook is
>> apparently no longer maintaining
>> ), or if we
>> can even use it given the ASF setup on GitHub. But I thought it would be
>> worth mentioning nonetheless.
>>
>> On Sat, Jul 14, 2018 at 11:17 AM Holden Karau 
>> wrote:
>>
>>> Hearing no objections (and in a shout out to @ Nicholas Chammas who
>>> initially suggested mention-bot back in 2016) I've set up a copy of 
>>> mention
>>> bot and run it against my own repo (looks like
>>> https://github.com/holdenk/spark-testing-base/pull/253 ).
>>>
>>> If no one objects I’ll ask infra to turn this on for Spark on a
>>> trial basis and we can revisit it based on how folks interact with it.
>>>
>>> On Wed, Jun 6, 2018 at 12:24 PM, Holden Karau 
>>> wrote:
>>>
 So there are a few bots along this line in OSS. If no one objects
 I’ll take a look and find one which matches our use case and try it 
 out.

 On Wed, Jun 6, 2018 at 10:33 AM Sean Owen  wrote:

> Certainly I will frequently dig through 'git blame' to figure out
> who might be the right reviewer. Maybe that's automatable -- ping the
> person who last touched the most lines touched by the PR? There might 
> be
> some false positives there. And I suppose the downside is being pinged
> forever for some change that just isn't well considered or one of 
> those
> accidental 100K-line PRs. So maybe some way to decline or silence is
> important, or maybe just ping once and leave it. Sure, a bot that 
> just adds
> a "Would @foo like to review?" comment on Github? Sure seems worth 
> trying
> if someone is willing to do the work to cook up the bot.
>
> On Wed, Jun 6, 2018 at 12:22 PM Holden Karau 
> wrote:
>
>> Hi friends,
>>
>> Was chatting with some folks at the summit and I was wondering
>> how people would feel about adding a review bot to ping folks. We 
>> already
>> have the review dashboard but I was thinking we could ping folks who 
>> were
> >> the original authors of the code being changed who might not be in 
>> the
>> habit of looking at the review dashboard.
>>
>> Cheers,
>>
>> Holden :)
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
> --
 

Re: Review notification bot

2018-07-30 Thread Holden Karau
The configuration file is optional. Is there something you want to try and
change?

On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon  wrote:

> I see. Thanks. I was wondering if I can see the configuration file since
> that looks needed (https://github.com/holdenk/mention-bot#configuration)
> but I couldn't find (sorry if it's just something I simply missed).
>
> On Tue, Jul 31, 2018 at 1:48 AM, Holden Karau wrote:
>
>> So the one that is running is the fork in my own repo (set up for K8s
>> deployment) - http://github.com/holdenk/mention-bot
>>
>> On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon  wrote:
>>
>>> Holden, so, is it a fork in
>>> https://github.com/facebookarchive/mention-bot? Would you mind if I ask
>>> where I can see the configurations for it?
>>>
>>>
>>> On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote:
>>>
 Yeah so the issue with codeowners is it will only assign to committers
 on the repo (the Beam project found this out the practical application way).

 I have a fork of mention bot running and it seems we can add it (need
 an infra ticket), but one of the things the Beam folks asked was to not
 ping code authors who haven’t committed in the past year which I need to do
 a bit of poking on to make happen.

 On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
 nicholas.cham...@gmail.com> wrote:

> On this topic, I just stumbled on a GitHub feature called CODEOWNERS. It
> lets you specify owners of specific areas of the repository using the same
> syntax that .gitignore uses. See CPython's CODEOWNERS file for reference.
>
> Dunno if that would complement mention-bot (which Facebook is apparently
> no longer maintaining), or if we can even use it given the ASF setup on
> GitHub. But I thought it would be worth mentioning nonetheless.
>
> On Sat, Jul 14, 2018 at 11:17 AM Holden Karau 
> wrote:
>
>> Hearing no objections (and in a shout out to @ Nicholas Chammas who
>> initially suggested mention-bot back in 2016) I've set up a copy of
>> mention-bot and run it against my own repo (looks like
>> https://github.com/holdenk/spark-testing-base/pull/253 ).
>>
>> If no one objects I’ll ask infra to turn this on for Spark on a trial
>> basis and we can revisit it based on how folks interact with it.
>>
>> On Wed, Jun 6, 2018 at 12:24 PM, Holden Karau 
>> wrote:
>>
>>> So there are a few bots along this line in OSS. If no one objects
>>> I’ll take a look and find one which matches our use case and try it out.
>>>
>>> On Wed, Jun 6, 2018 at 10:33 AM Sean Owen  wrote:
>>>
 Certainly I will frequently dig through 'git blame' to figure out
 who might be the right reviewer. Maybe that's automatable -- ping the
 person who last touched the most lines touched by the PR? There might be
 some false positives there. And I suppose the downside is being pinged
 forever for some change that just isn't well considered or one of those
 accidental 100K-line PRs. So maybe some way to decline or silence is
 important, or maybe just ping once and leave it. Sure, a bot that just adds
 a "Would @foo like to review?" comment on GitHub? Sure seems worth trying
 if someone is willing to do the work to cook up the bot.

 On Wed, Jun 6, 2018 at 12:22 PM Holden Karau 
 wrote:

> Hi friends,
>
> Was chatting with some folks at the summit and I was wondering how
> people would feel about adding a review bot to ping folks. We already have
> the review dashboard but I was thinking we could ping folks who were the
> original authors of the code being changed who might not be in the habit
> of looking at the review dashboard.
>
> Cheers,
>
> Holden :)
> --
> Twitter: https://twitter.com/holdenkarau
>
 --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
> --
 Twitter: https://twitter.com/holdenkarau

>>> --
Twitter: https://twitter.com/holdenkarau
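
Sean Owen's suggestion quoted above (ping the person who last touched the
most lines touched by the PR) can be sketched in a few lines of Python. This
is only an illustration, not how mention-bot is implemented: the function
names, the shape of the `blame` and `touched` inputs, and the example authors
are all hypothetical, and in practice the per-line authors would have to be
extracted from `git blame`.

```python
from collections import Counter


def suggest_reviewers(blame, touched, pr_author, max_reviewers=3):
    """Rank authors by how many of the PR's touched lines they last modified.

    blame:   {path: [author of line 1, author of line 2, ...]}, e.g. parsed
             from `git blame` output on the base branch (hypothetical shape).
    touched: {path: iterable of 1-based line numbers changed by the PR}.
    """
    counts = Counter()
    for path, line_numbers in touched.items():
        authors = blame.get(path, [])
        for n in line_numbers:
            if 1 <= n <= len(authors):      # skip line numbers with no blame entry
                counts[authors[n - 1]] += 1
    counts.pop(pr_author, None)             # never suggest the PR author
    # Most lines first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:max_reviewers]]


def format_comment(reviewers):
    """Build one "Would @foo like to review?" comment, or None if nobody qualifies."""
    if not reviewers:
        return None
    return "Would " + " or ".join("@" + r for r in reviewers) + " like to review?"


blame = {
    "a.scala": ["alice", "alice", "bob", "carol"],
    "b.scala": ["bob", "bob"],
}
touched = {"a.scala": [1, 2, 3], "b.scala": [1, 2, 5]}
print(format_comment(suggest_reviewers(blame, touched, pr_author="carol")))
# Would @bob or @alice like to review?
```

Capping the list with `max_reviewers` and posting a single comment is one way
to address the worry about being pinged forever by an accidental 100K-line PR.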


Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
I see. Thanks. I was wondering if I can see the configuration file since
that looks needed (https://github.com/holdenk/mention-bot#configuration)
but I couldn't find it (sorry if it's just something I simply missed).

On Tue, Jul 31, 2018 at 1:48 AM Holden Karau wrote:

> So the one that is running is the fork in my own repo (set up for K8s
> deployment) - http://github.com/holdenk/mention-bot
>
> On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon  wrote:
>
>> Holden, so, is it a fork in
>> https://github.com/facebookarchive/mention-bot? Would you mind if I ask
>> where I can see the configurations for it?
>>
>>
>> On Mon, Jul 23, 2018 at 10:16 AM Holden Karau wrote:
>>
>>> Yeah so the issue with codeowners is it will only assign to committers
>>> on the repo (the Beam project found this out the practical application way).
>>>
>>> I have a fork of mention bot running and it seems we can add it (need an
>>> infra ticket), but one of the things the Beam folks asked was to not ping
>>> code authors who haven’t committed in the past year which I need to do a
>>> bit of poking on to make happen.
>>>
>>> On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
 On this topic, I just stumbled on a GitHub feature called CODEOWNERS. It
 lets you specify owners of specific areas of the repository using the same
 syntax that .gitignore uses. See CPython's CODEOWNERS file for reference.

 Dunno if that would complement mention-bot (which Facebook is apparently
 no longer maintaining), or if we can even use it given the ASF setup on
 GitHub. But I thought it would be worth mentioning nonetheless.

 On Sat, Jul 14, 2018 at 11:17 AM Holden Karau 
 wrote:

> Hearing no objections (and in a shout out to @ Nicholas Chammas who
> initially suggested mention-bot back in 2016) I've set up a copy of
> mention-bot and run it against my own repo (looks like
> https://github.com/holdenk/spark-testing-base/pull/253 ).
>
> If no one objects I’ll ask infra to turn this on for Spark on a trial
> basis and we can revisit it based on how folks interact with it.
>
> On Wed, Jun 6, 2018 at 12:24 PM, Holden Karau 
> wrote:
>
>> So there are a few bots along this line in OSS. If no one objects
>> I’ll take a look and find one which matches our use case and try it out.
>>
>> On Wed, Jun 6, 2018 at 10:33 AM Sean Owen  wrote:
>>
>>> Certainly I will frequently dig through 'git blame' to figure out
>>> who might be the right reviewer. Maybe that's automatable -- ping the
>>> person who last touched the most lines touched by the PR? There might be
>>> some false positives there. And I suppose the downside is being pinged
>>> forever for some change that just isn't well considered or one of those
>>> accidental 100K-line PRs. So maybe some way to decline or silence is
>>> important, or maybe just ping once and leave it. Sure, a bot that just adds
>>> a "Would @foo like to review?" comment on GitHub? Sure seems worth trying
>>> if someone is willing to do the work to cook up the bot.
>>>
>>> On Wed, Jun 6, 2018 at 12:22 PM Holden Karau 
>>> wrote:
>>>
 Hi friends,

 Was chatting with some folks at the summit and I was wondering how
 people would feel about adding a review bot to ping folks. We already have
 the review dashboard but I was thinking we could ping folks who were the
 original authors of the code being changed who might not be in the habit
 of looking at the review dashboard.

 Cheers,

 Holden :)
 --
 Twitter: https://twitter.com/holdenkarau

>>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> --
> Twitter: https://twitter.com/holdenkarau
>
 --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
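
The Beam request quoted above (do not ping code authors who haven't committed
in the past year) amounts to a filter on the candidate list. A minimal sketch,
assuming the bot can look up each candidate's most recent commit date;
`drop_inactive`, its parameters, and the sample data are hypothetical and not
part of mention-bot:

```python
from datetime import date, timedelta


def drop_inactive(candidates, last_commit, today, max_idle_days=365):
    """Keep only candidates whose latest commit is within max_idle_days of today.

    candidates:  ordered list of usernames the bot would otherwise ping.
    last_commit: {username: date of that user's most recent commit}.
    Users with no recorded commit are treated as inactive.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return [c for c in candidates
            if c in last_commit and last_commit[c] >= cutoff]


last_commit = {"alice": date(2018, 6, 1), "bob": date(2016, 1, 1)}
print(drop_inactive(["alice", "bob", "dave"], last_commit, today=date(2018, 7, 30)))
# ['alice']
```

Passing `today` in explicitly, rather than reading the clock inside the
function, keeps the filter easy to test.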


Re: [Spark SQL] Future of CalendarInterval

2018-07-30 Thread Hyukjin Kwon
FYI, org.apache.spark.unsafe.types.CalendarInterval is undocumented in both
the scaladoc and the javadoc (as is the entire unsafe module),
but org.apache.spark.sql.types.CalendarIntervalType is exposed (
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.CalendarIntervalType
)

+1 for starting the discussion after 2.4.0. As I said in the PR, I would
again suggest deferring this.

On Sun, Jul 29, 2018 at 6:58 PM Daniel Mateus Pires wrote:

> Sounds good! @Xiao
>
> @Reynold AFAIK the only data type that is valid to cast to Calendar
> Interval is VARCHAR
>
> here is Postgres:
>
> postgres=# select CAST(CAST(interval '1 hour' AS varchar) AS interval);
>  interval
> --
>  01:00:00
> (1 row)
>
> (snippet comes from the JIRA)
>
> Thanks,
>
> Daniel
>
>
> On 27 July 2018 at 20:38, Xiao Li  wrote:
>
>> The code freeze of the upcoming release Spark 2.4 is very close. How
>> about revisiting this and explicitly defining the support scope
>> of CalendarIntervalType in the next release (Spark 3.0)?
>>
>> Thanks,
>>
>> Xiao
>>
>>
>> 2018-07-27 10:45 GMT-07:00 Reynold Xin :
>>
>>> CalendarInterval is definitely externally visible.
>>>
>>> E.g. sql("select interval 1 day").dtypes would return "Array[(String,
>>> String)] = Array((interval 1 days,CalendarIntervalType))"
>>>
>>> However, I'm not sure what it means to support casting. What are the
>>> semantics for casting from any other data type to calendar interval? I can
>>> see string casting and casting from itself, but not any other data types.
>>>
>>>
>>>
>>>
>>> On Fri, Jul 27, 2018 at 10:34 AM Daniel Mateus Pires 
>>> wrote:
>>>
 Hi Sparkers! (maybe Sparkles ?)

 I just wanted to bring up the apparently ?controversial? Calendar
 Interval topic.

 I worked on: https://issues.apache.org/jira/browse/SPARK-24702,
 https://github.com/apache/spark/pull/21706

 The user was reporting an unexpected behaviour where he/she wasn’t able
 to cast to a Calendar Interval type.

 In the current version of Spark the following code works:

 scala> spark.sql("SELECT 'interval 1 hour' as a").select(col("a").cast("calendarinterval")).show()
 +----------------+
 |               a|
 +----------------+
 |interval 1 hours|
 +----------------+


 While the following doesn’t:
 spark.sql("SELECT CALENDARINTERVAL('interval 1 hour') as a").show()


 Since the DataFrame API equivalent of the SQL worked, I thought adding
 it would be an easy decision to make (to make it consistent)

 However, I got push-back on the PR on the basis that “*we do not plan
 to expose Calendar Interval as a public type*”
 Should there be a consensus on either cleaning up the public DataFrame
 API out of CalendarIntervalType OR making it consistent with the SQL ?

 --
 Best regards,
 Daniel Mateus Pires
 Data Engineer @ Hudson's Bay Company

>>>
>>
>


Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Wenchen Fan
I went through the open JIRA tickets and here is a list that we should
consider for Spark 2.4:

*High Priority*:
SPARK-24374: Support Barrier Execution Mode in Apache Spark
This one is critical to the Spark ecosystem for deep learning. Only a little
work remains, and I think we should have it in Spark 2.4.

*Medium Priority*:
SPARK-23899: Built-in SQL Function Improvement
We've already added a lot of built-in functions in this release, but a few
more useful functions are still in progress, such as `array_except` and the
higher-order function `transform`. It would be great if we could get them
into Spark 2.4.

SPARK-14220: Build and test Spark against Scala 2.12
Very close to finishing; great to have it in Spark 2.4.

SPARK-4502: Spark SQL reads unnecessary nested fields from Parquet
This one has been open for years (thanks for your patience, Michael!) and is
also close to finishing. Great to have it in 2.4.

SPARK-24882: data source v2 API improvement
This is to improve the data source v2 API based on what we learned during
this release. From the migration of existing sources and the design of new
features, we found some problems in the API and want to address them. I
believe this should be the last significant API change to data source
v2, so it would be great to have in Spark 2.4. I'll send a discussion email
about it later.

SPARK-24252: Add catalog support in Data Source V2
This is a very important feature for data source v2, and is currently being
discussed on the dev list.

SPARK-24768: Have a built-in AVRO data source implementation
Most of it is done, but date/timestamp support is still missing. Great to
have in 2.4.

SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect answers
This is a long-standing correctness bug; great to have the fix in 2.4.

Some other important features, such as adaptive execution and streaming SQL,
are not on the list because I don't think we can finish them before 2.4.

Feel free to add more things if you think they are important to Spark 2.4
by replying to this email.

Thanks,
Wenchen

On Mon, Jul 30, 2018 at 11:00 PM Sean Owen  wrote:

> In theory releases happen on a time-based cadence, so it's pretty much
> wrap up what's ready by the code freeze and ship it. In practice, the
> cadence slips frequently, and it's very much a negotiation about what
> features should push the code freeze out a few weeks every time. So, kind
> of a hybrid approach here that works OK.
>
> Certainly speak up if you think there's something that really needs to get
> into 2.4. This is that discuss thread.
>
> (BTW I updated the page you mention just yesterday, to reflect the plan
> suggested in this thread.)
>
> On Mon, Jul 30, 2018 at 9:51 AM Tom Graves 
> wrote:
>
>> Shouldn't this be a discuss thread?
>>
>> I'm also happy to see more release managers and agree the time is getting
>> close, but we should see what features are in progress and see how close
>> things are and propose a date based on that.  Cutting a branch too soon just
>> creates more work for committers to push to more branches.
>>
>>  http://spark.apache.org/versioning-policy.html mentioned the code
>> freeze and release branch cut mid-august.
>>
>>
>> Tom
>>
>>


[build system] two workers will be reimaged w/ubuntu tomorrow

2018-07-30 Thread shane knapp
my testing is going really well, and i think we're --->this<--- close to
porting all of the spark builds to ubuntu!

TL;DR:   i am NOT planning on moving all builds to ubuntu until after
august 8th.  i WOULD like to move the PRB to ubuntu before then.

anyways:

once these two smoke test builds pass, i'll be somewhere @ ~99% certainty
that we can just move all of the spark builds (except docs) en masse!

https://amplab.cs.berkeley.edu/jenkins/job/ubuntuSparkPRB/59/
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/887/

we're starting to run into build node availability bottlenecks for some of
our lab builds, so we will be reinstalling the following two workers
w/ubuntu tomorrow:
amp-jenkins-worker-07
amp-jenkins-worker-08

this will give us some breathing room, and we shouldn't have any noticeable
impact on the spark builds.

the biggest worry is the pull request builder...  i'll update that job and
let a couple of PRBs run through before moving it back to the centos hosts.

if the regular PRB builds pass happily, this will let us move it from
centos to ubuntu *before* 2.4 is cut, and gives us two things:

1) unblock https://github.com/apache/spark/pull/21584
2) stop needing 2 builds for pull requests (one for regular tests on
centos, one to test against minikube on ubuntu).

questions/comments/concerns?

shane
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Review notification bot

2018-07-30 Thread Holden Karau
So the one that is running is the fork in my own repo (set up for K8s
deployment) - http://github.com/holdenk/mention-bot

On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon  wrote:

> Holden, so, is it a fork in https://github.com/facebookarchive/mention-bot?
> Would you mind if I ask where I can see the configurations for it?
>
>
> On Mon, Jul 23, 2018 at 10:16 AM Holden Karau wrote:
>
>> Yeah so the issue with codeowners is it will only assign to committers on
>> the repo (the Beam project found this out the practical application way).
>>
>> I have a fork of mention bot running and it seems we can add it (need an
>> infra ticket), but one of the things the Beam folks asked was to not ping
>> code authors who haven’t committed in the past year which I need to do a
>> bit of poking on to make happen.
>>
>> On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> On this topic, I just stumbled on a GitHub feature called CODEOWNERS. It
>>> lets you specify owners of specific areas of the repository using the same
>>> syntax that .gitignore uses. See CPython's CODEOWNERS file for reference.
>>>
>>> Dunno if that would complement mention-bot (which Facebook is apparently
>>> no longer maintaining), or if we can even use it given the ASF setup on
>>> GitHub. But I thought it would be worth mentioning nonetheless.
>>>
>>> On Sat, Jul 14, 2018 at 11:17 AM Holden Karau 
>>> wrote:
>>>
 Hearing no objections (and in a shout out to @ Nicholas Chammas who
 initially suggested mention-bot back in 2016) I've set up a copy of
 mention-bot and run it against my own repo (looks like
 https://github.com/holdenk/spark-testing-base/pull/253 ).

 If no one objects I’ll ask infra to turn this on for Spark on a trial
 basis and we can revisit it based on how folks interact with it.

 On Wed, Jun 6, 2018 at 12:24 PM, Holden Karau 
 wrote:

> So there are a few bots along this line in OSS. If no one objects I’ll
> take a look and find one which matches our use case and try it out.
>
> On Wed, Jun 6, 2018 at 10:33 AM Sean Owen  wrote:
>
>> Certainly I will frequently dig through 'git blame' to figure out who
>> might be the right reviewer. Maybe that's automatable -- ping the person
>> who last touched the most lines touched by the PR? There might be some
>> false positives there. And I suppose the downside is being pinged forever
>> for some change that just isn't well considered or one of those accidental
>> 100K-line PRs. So maybe some way to decline or silence is important, or
>> maybe just ping once and leave it. Sure, a bot that just adds a "Would @foo
>> like to review?" comment on GitHub? Sure seems worth trying if someone is
>> willing to do the work to cook up the bot.
>>
>> On Wed, Jun 6, 2018 at 12:22 PM Holden Karau 
>> wrote:
>>
>>> Hi friends,
>>>
>>> Was chatting with some folks at the summit and I was wondering how
>>> people would feel about adding a review bot to ping folks. We already have
>>> the review dashboard but I was thinking we could ping folks who were the
>>> original authors of the code being changed who might not be in the habit
>>> of looking at the review dashboard.
>>>
>>> Cheers,
>>>
>>> Holden :)
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>> --
> Twitter: https://twitter.com/holdenkarau
>



 --
 Twitter: https://twitter.com/holdenkarau
 --
 Twitter: https://twitter.com/holdenkarau

>>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>


Re: Why percentile and distinct are not done in one job?

2018-07-30 Thread Reynold Xin
Which API are you talking about?

On Mon, Jul 30, 2018 at 7:03 AM 吴晓菊  wrote:

> I noticed that in column analyzing, 2 jobs will run separately to
> calculate percentiles and then distinct. Why not combine into one job since
> HyperLogLog also supports merge?
>
> Chrysan Wu
> Phone:+86 17717640807
>
>


Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Sean Owen
In theory releases happen on a time-based cadence, so it's pretty much wrap
up what's ready by the code freeze and ship it. In practice, the cadence
slips frequently, and it's very much a negotiation about what features
should push the code freeze out a few weeks every time. So, kind of a
hybrid approach here that works OK.

Certainly speak up if you think there's something that really needs to get
into 2.4. This is that discuss thread.

(BTW I updated the page you mention just yesterday, to reflect the plan
suggested in this thread.)

On Mon, Jul 30, 2018 at 9:51 AM Tom Graves 
wrote:

> Shouldn't this be a discuss thread?
>
> I'm also happy to see more release managers and agree the time is getting
> close, but we should see what features are in progress and see how close
> things are and propose a date based on that.  Cutting a branch too soon just
> creates more work for committers to push to more branches.
>
>  http://spark.apache.org/versioning-policy.html mentioned the code freeze
> and release branch cut mid-august.
>
>
> Tom
>
>


Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Tom Graves
Shouldn't this be a discuss thread?

I'm also happy to see more release managers and agree the time is getting
close, but we should see what features are in progress and how close they
are, and propose a date based on that. Cutting a branch too soon just creates
more work for committers, who then have to push to more branches.

http://spark.apache.org/versioning-policy.html mentioned the code freeze and
release branch cut in mid-August.

Tom
On Friday, July 6, 2018, 11:47:35 AM CDT, Reynold Xin  
wrote:  
 
 FYI 6 mo is coming up soon since the last release. We will cut the branch and 
code freeze on Aug 1st in order to get 2.4 out on time.
  

Why percentile and distinct are not done in one job?

2018-07-30 Thread 吴晓菊
I noticed that when analyzing columns, two jobs run separately: one to
calculate percentiles and then one to count distinct values. Why not combine
them into one job, since HyperLogLog also supports merge?

Chrysan Wu
Phone:+86 17717640807
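
The merge property mentioned above is what would let a single pass keep one
sketch per partition and combine them at the end. The toy HyperLogLog below is
for illustration only and is not Spark's implementation; the class name, the
SHA-1 based hash, and the precision p=12 are assumptions made for the example.
It shows that merging per-partition sketches yields exactly the registers of a
sketch built over all the data.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (illustrative, not production code)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash of the value
        idx = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        """Combine another sketch into this one by taking register-wise maxima."""
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return self

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)        # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


def sketch(values, p=12):
    hll = HyperLogLog(p)
    for v in values:
        hll.add(v)
    return hll


part_a = sketch(range(0, 6000))        # one "partition"
part_b = sketch(range(4000, 10000))    # another partition, overlapping the first
merged = sketch([]).merge(part_a).merge(part_b)
whole = sketch(range(0, 10000))
print(merged.registers == whole.registers)  # True: merging loses nothing
print(round(merged.estimate()))             # approximate distinct count
```

Because merging is a register-wise maximum, it does not matter how the rows
are split across partitions or in which order the partial sketches are
combined.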


Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-30 Thread Wenchen Fan
Another two correctness bug fixes were merged to 2.3 today:
https://issues.apache.org/jira/browse/SPARK-24934
https://issues.apache.org/jira/browse/SPARK-24957

On Mon, Jul 30, 2018 at 1:19 PM Xiao Li  wrote:

> Sounds good to me. Thanks! Today, we merged another correctness fix
> https://github.com/apache/spark/pull/21772.
>
> Xiao
>
> 2018-07-29 18:31 GMT-07:00 Saisai Shao :
>
>> Sure, I will do the next RC. I'm still waiting for a CVE fix; if it can be
>> done in the next two days, I will include that one as well.
>>
>> On Sat, Jul 28, 2018 at 12:05 AM Xiao Li wrote:
>>
>>> The following blocker/important fixes have been merged to Spark 2.3
>>> branch:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-24927
>>> https://issues.apache.org/jira/browse/SPARK-24867
>>> https://issues.apache.org/jira/browse/SPARK-24891
>>>
>>> *Saisai*, could you start the next RC?
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>>
>>> 2018-07-20 14:21 GMT-07:00 Tom Graves :
>>>
 fyi, I merged in a couple of jiras that were critical (and that I thought
 would be good to include in the next release); they will get included if we
 spin another RC. We should update the jiras, SPARK-24755 and SPARK-24677.
 If anyone disagrees we could back those out, but I think they would be good
 to include.

 Tom

 On Thursday, July 19, 2018, 8:13:23 PM CDT, Saisai Shao <
 sai.sai.s...@gmail.com> wrote:


 Sure, I can wait for this and create another RC then.

 Thanks,
 Saisai

 On Fri, Jul 20, 2018 at 9:11 AM Xiao Li wrote:

 Yes. https://issues.apache.org/jira/browse/SPARK-24867 is the one I
 created. The PR has been created. Since this is not rare, let us merge it
 to 2.3.2?

 Reynold's PR is to get rid of AnalysisBarrier. That is better than the
 multiple patches we added for AnalysisBarrier after the 2.3.0 release. We can
 target it to 2.4.

 Thanks,

 Xiao

 2018-07-19 17:48 GMT-07:00 Saisai Shao :

 I see, thanks Reynold.

 On Fri, Jul 20, 2018 at 8:46 AM Reynold Xin wrote:

 Looking at the list of pull requests it looks like this is the ticket:
 https://issues.apache.org/jira/browse/SPARK-24867



 On Thu, Jul 19, 2018 at 5:25 PM Reynold Xin 
 wrote:

 I don't think my ticket should block this release. It's a big general
 refactoring.

 Xiao do you have a ticket for the bug you found?


 On Thu, Jul 19, 2018 at 5:24 PM Saisai Shao 
 wrote:

 Hi Xiao,

 Are you referring to this JIRA (
 https://issues.apache.org/jira/browse/SPARK-24865)?

 On Fri, Jul 20, 2018 at 2:41 AM Xiao Li wrote:

 dfWithUDF.cache()
 dfWithUDF.write.saveAsTable("t")
 dfWithUDF.write.saveAsTable("t1")


 Cached data is not being used. It causes a big performance regression.




 2018-07-19 11:32 GMT-07:00 Sean Owen :

 What regression are you referring to here? A -1 vote really needs a
 rationale.

 On Thu, Jul 19, 2018 at 1:27 PM Xiao Li  wrote:

 I would first vote -1.

 I might have found another regression caused by the analysis barrier. Will
 keep you posted.




>>>
>


Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
Holden, so, is it a fork of https://github.com/facebookarchive/mention-bot?
Would you mind if I ask where I can see its configuration?


On Mon, Jul 23, 2018 at 10:16 AM Holden Karau wrote:

> Yeah so the issue with codeowners is it will only assign to committers on
> the repo (the Beam project found this out the practical application way).
>
> I have a fork of mention bot running and it seems we can add it (need an
> infra ticket), but one of the things the Beam folks asked was to not ping
> code authors who haven’t committed in the past year which I need to do a
> bit of poking on to make happen.
>
> On Sun, Jul 22, 2018 at 7:04 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> On this topic, I just stumbled on a GitHub feature called CODEOWNERS. It
>> lets you specify owners of specific areas of the repository using the same
>> syntax that .gitignore uses. See CPython's CODEOWNERS file for reference.
>>
>> Dunno if that would complement mention-bot (which Facebook is apparently
>> no longer maintaining), or if we can even use it given the ASF setup on
>> GitHub. But I thought it would be worth mentioning nonetheless.
>>
>> On Sat, Jul 14, 2018 at 11:17 AM Holden Karau 
>> wrote:
>>
>>> Hearing no objections (and in a shout out to @ Nicholas Chammas who
>>> initially suggested mention-bot back in 2016) I've set up a copy of
>>> mention-bot and run it against my own repo (looks like
>>> https://github.com/holdenk/spark-testing-base/pull/253 ).
>>>
>>> If no one objects I’ll ask infra to turn this on for Spark on a trial
>>> basis and we can revisit it based on how folks interact with it.
>>>
>>> On Wed, Jun 6, 2018 at 12:24 PM, Holden Karau 
>>> wrote:
>>>
 So there are a few bots along this line in OSS. If no one objects I’ll
 take a look and find one which matches our use case and try it out.

 On Wed, Jun 6, 2018 at 10:33 AM Sean Owen  wrote:

> Certainly I will frequently dig through 'git blame' to figure out who
> might be the right reviewer. Maybe that's automatable -- ping the person
> who last touched the most lines touched by the PR? There might be some
> false positives there. And I suppose the downside is being pinged forever
> for some change that just isn't well considered or one of those accidental
> 100K-line PRs. So maybe some way to decline or silence is important, or
> maybe just ping once and leave it. Sure, a bot that just adds a "Would @foo
> like to review?" comment on GitHub? Sure seems worth trying if someone is
> willing to do the work to cook up the bot.
>
> On Wed, Jun 6, 2018 at 12:22 PM Holden Karau 
> wrote:
>
>> Hi friends,
>>
>> Was chatting with some folks at the summit and I was wondering how
>> people would feel about adding a review bot to ping folks. We already have
>> the review dashboard but I was thinking we could ping folks who were the
>> original authors of the code being changed who might not be in the habit
>> of looking at the review dashboard.
>>
>> Cheers,
>>
>> Holden :)
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
> --
 Twitter: https://twitter.com/holdenkarau

>>>
>>>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>> --
> Twitter: https://twitter.com/holdenkarau
>