Re: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark

2019-03-13 Thread Hyukjin Kwon
Thanks, I opened https://issues.apache.org/jira/browse/INFRA-18004

On Thu, Mar 14, 2019 at 8:35 AM, Marcelo Vanzin wrote:

> Go for it. I would do it now, instead of waiting, since there's been
> enough time for them to take action.


Re: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark

2019-03-13 Thread Marcelo Vanzin
Go for it. I would do it now, instead of waiting, since there's been
enough time for them to take action.

On Wed, Mar 13, 2019 at 4:32 PM Hyukjin Kwon  wrote:
>
> It looks like this bot is still active. I am going to open an INFRA JIRA to
> block this bot in a few days.
> Please let me know if you have a different idea for preventing this.



-- 
Marcelo




Re: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark

2019-03-13 Thread Hyukjin Kwon
It looks like this bot is still active. I am going to open an INFRA JIRA to
block this bot in a few days.
Please let me know if you have a different idea for preventing this.

On Wed, Mar 13, 2019 at 8:16 AM, Hyukjin Kwon wrote:

> To whom it may concern at Thincrs,
>
> I am still observing this bot misusing Apache Spark’s JIRA board (see
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=Thincrs).
>
> I contacted you once before, but I have not received any response. This
> bot still appears to be misusing the Apache JIRA board.
> If this continues, I think we will have to block the bot. Could you
> please stop it from misusing JIRA?
>
> From: Hyukjin Kwon
> Date: Tuesday, January 8, 2019 at 11:18 AM
> To: "h...@thincrs.com"
> Subject: Request to disable a bot account, 'Thincrs' in JIRA of Apache
> Spark
>
> Hi all,
>
> We, the Apache Spark community, recently noticed a bot named ‘Thincrs’ in
> Apache Spark’s JIRA:
> https://issues.apache.org/jira/issues/?jql=text%20~%20Thincrs
>
> This appears to be a bot, and it keeps leaving comments such as:
>
>   A user of thincrs has selected this issue. Deadline: Xxx, Xxx X, XX:XX
>
> This creates noise for Apache Spark maintainers, committers, contributors,
> and users. I raised this on Spark’s dev mailing list before:
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/A-user-of-thincrs-has-selected-this-issue-Deadline-Xxx-Xxx-X--XX-XX-td25836.html
>
> And, if I am not mistaken, one of Apache Spark’s PMC members also
> contacted you to stop this bot.
>
> Recently, I noticed the bot left a comment again, as below:
>
>   Thincrs commented on SPARK-25823:
>   -
>
>   A user of thincrs has selected this issue. Deadline: Mon, Jan 14, 2019
>   10:32 PM
>
> This comment has been hidden by one of the Spark committers for now, but
> leaving comments there sends an email to everyone participating in that
> JIRA issue.
>
> Could you please stop this bot if it belongs to Thincrs?
>
> Thanks.


Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread shane knapp
btw, let's wait and see if the non-k8s PRB tests pass before merging
https://github.com/apache/spark/pull/23993 into 2.4.1



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread shane knapp
2.4.1 k8s integration test passed:

https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8875/

thanks everyone!  :)



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread shane knapp
2.4.1 integration tests running:
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8875/



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread shane knapp
upgrade completed, jenkins building again...  master PR merged, waiting for
the 2.4.1 PR to launch the k8s integration tests.



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread shane knapp
okie dokie!  the time approacheth!

i'll pause jenkins @ 3pm to not accept new jobs.  i don't expect the
upgrade to take more than 15-20 mins, following which i will re-enable
builds.



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Marcelo Vanzin
Sounds good.




-- 
Marcelo




Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread shane knapp
ok awesome.  let's shoot for 3pm PST.



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Sean Owen
I'm OK with this take. The problem with back-porting the client update
to 2.4.x at all is that it drops support for some old-but-not-that-old
K8S versions, which feels surprising in a maintenance release. That
said, maybe it's OK, and a little more OK for a 2.4.2 in several
months' time.





Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Marcelo Vanzin
On Wed, Mar 13, 2019 at 11:53 AM shane knapp  wrote:
> On Wed, Mar 13, 2019 at 11:49 AM Marcelo Vanzin  wrote:
>>
>> Do the upgraded minikube/k8s versions break the current master client
>> version too?
>>
> yes.

Ah, so that part kinda sucks.

Let's do this: since the master PR is good to go pending the minikube
upgrade, let's try to synchronize things. Set a time to do the
minikube upgrade this PM, if that works for you, and I'll merge that
PR once it's done. Then I'll take care of backporting it to 2.4 and
make sure it passes the integration tests.

-- 
Marcelo




Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread shane knapp
On Wed, Mar 13, 2019 at 11:49 AM Marcelo Vanzin  wrote:

> Do the upgraded minikube/k8s versions break the current master client
> version too?
>
yes.

> I'm not super concerned about 2.4 integration tests being broken for a
> little bit. It's very uncommon for new PRs to be open against
> branch-2.4 that would affect k8s.
>
ok.  if the 2.4.1 PR is merged at about the same time as the one for
master, we won't have to worry about the 2.4.x tests failing.

> But I really don't want master to break. So if we can upgrade minikube
> first, even if that breaks k8s integration tests on branch-2.4 for a
> little bit, that would be optimal IMO.
>
i have everything staged on the affected jenkins workers and can do the
infra upgrade really quickly...

shane




-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread Marcelo Vanzin
Do the upgraded minikube/k8s versions break the current master client
version too?

I'm not super concerned about 2.4 integration tests being broken for a
little bit. It's very uncommon for new PRs to be open against
branch-2.4 that would affect k8s.

But I really don't want master to break. So if we can upgrade minikube
first, even if that breaks k8s integration tests on branch-2.4 for a
little bit, that would be optimal IMO.




-- 
Marcelo




[discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-13 Thread shane knapp
hey everyone...  i wanted to break this discussion out of the mega-threads
for the 2.4.1 RC candidates.

the TL;DR is that we've been trying to update the k8s client libs to
something much more modern.  however, for us to do this, we need to update
our very old k8s and minikube versions.

the problem here lies in the fact that if we update the client libs on
master, but not the 2.4 branch, then the 2.4 branch k8s integration tests
will fail if we update our backend minikube/k8s versions.

i've done all of the testing locally for the new k8s client libs, and am
ready to pull the trigger on the infrastructure upgrade (which will take
all of ~15 mins).

for this to happen, two PRs will need to be merged...  one for 2.4.1 and
one for master.

is there a chance that we can get https://github.com/apache/spark/pull/23993
merged in for the 2.4.1 release?  this will also require
https://github.com/apache/spark/pull/24002 (for master) to be merged
simultaneously.

both of those PRs are ready to go (tho 23993 was closed w/o merge and i'm
not entirely sure why).

here's the primary jira we're using to track this upgrade:
https://issues.apache.org/jira/browse/SPARK-26742

thanks in advance,

shane
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Spark job status on Kubernetes

2019-03-13 Thread Stavros Kontopoulos
AFAIK, COMPLETED can happen in case of failures as well; check here:
https://github.com/kubernetes/kubernetes/blob/7f23a743e8c23ac6489340bbb34fa6f1d392db9d/pkg/client/conditions/conditions.go#L61

The phase of the pod should be `Succeeded` before you draw that conclusion.
Here is how the Spark operator uses that info to deduce the application status:
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/pkg/controller/sparkapplication/sparkapp_util.go#L75

Stavros
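
For reference, here is a minimal sketch of checking the driver pod's phase
directly. It assumes the fabric8 Kubernetes client (the client library Spark
itself uses); the namespace and pod name are made-up placeholders:

```scala
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object DriverPodStatus {
  def main(args: Array[String]): Unit = {
    // Placeholder namespace and driver pod name -- substitute your own.
    val namespace = "spark-jobs"
    val driverPodName = "my-spark-app-driver"

    // Picks up kubeconfig / in-cluster config automatically.
    val client = new DefaultKubernetesClient()
    try {
      val pod = client.pods().inNamespace(namespace).withName(driverPodName).get()
      // Pod phases: Pending, Running, Succeeded, Failed, Unknown.
      val phase = pod.getStatus.getPhase
      // Only Succeeded reliably means the driver exited with code 0.
      if (phase == "Succeeded") println("Spark job succeeded")
      else println(s"Not (yet) successful; pod phase: $phase")
    } finally {
      client.close()
    }
  }
}
```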

>


Re: Partitions at DataSource API V2

2019-03-13 Thread Joseph Torres
The reader necessarily knows the number of partitions, since it's
responsible for generating its output partitions in the first place. I
won't speak for everyone, but it would make sense to me to pass in a
Partitioning instance to the writer, since it's already part of the v2
interface through the reader's SupportsReportPartitioning.

I don't think we can expose execution plans to the data source v2
interface; the exact Java structure of execution plans isn't stable across
even maintenance releases. Even if we could, I don't really see what the
use case would be - what information does the writer need that can't be
made available through either the input data or the input partitioning?
(The built-in Kafka sink, for example, handles metadata such as topic
switching by just accepting topic name as a column along with the data.)
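
To make that concrete, here is a rough sketch of the reader side against the
Spark 2.4 DataSourceV2 API, where SupportsReportPartitioning already reports a
Partitioning; the idea floated above would be to hand a similar instance to
the writer. KafkaAwareReader and its partition count are invented for this
example:

```scala
import java.util.{Collections, List => JList}

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, InputPartition, SupportsReportPartitioning}
import org.apache.spark.sql.sources.v2.reader.partitioning.{Distribution, Partitioning}
import org.apache.spark.sql.types.StructType

// Hypothetical reader that exposes its partition count through the
// existing SupportsReportPartitioning mix-in (Spark 2.4 API).
class KafkaAwareReader(schema: StructType, numTopicPartitions: Int)
    extends DataSourceReader with SupportsReportPartitioning {

  override def readSchema(): StructType = schema

  override def planInputPartitions(): JList[InputPartition[InternalRow]] = {
    // One InputPartition per Kafka topic partition (omitted in this sketch).
    Collections.emptyList[InputPartition[InternalRow]]()
  }

  override def outputPartitioning(): Partitioning = new Partitioning {
    // This is the number a writer-side hook could receive, e.g. to create
    // a topic with a matching partition count before writing.
    override def numPartitions(): Int = numTopicPartitions
    // No distribution guarantees are made in this sketch.
    override def satisfy(distribution: Distribution): Boolean = false
  }
}
```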

On Wed, Mar 13, 2019 at 1:39 AM JOAQUIN GUANTER GONZALBEZ <
joaquin.guantergonzal...@telefonica.com> wrote:

> I'd like to bump this. I agree with Carlos that there is very little
> information at the DataSourceWriter/DataSourceReader level. To me, ideally,
> the DataSourceWriter/Reader should have as much information as possible.
> Not only the number of partitions, but also ideally the whole execution
> plan.
>
> This would not only enable things like automatic creation of Kafka topics
> with the correct number of partitions (like Carlos mentioned), but it would
> also allow advanced DataSources that, for example, analyze the execution
> plan to choose the correct parameters to implement differential privacy.
>
> CC'ing in Ryan, since he is leading the DataSourceV2 workgroup (sorry I
> can't join the sync meetings, but I'm in CET time and the time logistics
> of that meeting don't work for Europe).
>
> Ryan, do you think it would be a good idea to provide extra information at
> the DataSourceWriter/Reader level to enable more advanced datasources?
> Would a PR contribution with these changes be a welcome addition?
>
> Thanks,
> Ximo
>
> -Original Message-
> From: CARLOS DEL PRADO MOTA 
> Sent: Thursday, March 7, 2019 10:19 AM
> To: dev@spark.apache.org
> Subject: Partitions at DataSource API V2
>
> Hello, I’m Carlos del Prado, developer at Telefonica.
>
> We are working with Spark's DataSource API V2 building a custom Kafka
> connector that creates the topic upon write. In order to do that, we need
> to know the number of partitions before writing data in each partition, at
> the DataSourceWriter level.
>
> Is there any way for us to do that?
>
> Kind regards,
> Carlos.


Spark job status on Kubernetes

2019-03-13 Thread Chandu Kavar
Hi,

We are running Spark jobs on Kubernetes (using Spark 2.4.0 and cluster
mode). To get the status of a Spark job, we check the status of the driver
pod (using the Kubernetes REST API).

Is it okay to assume that the Spark job is successful if the status of the
driver pod is COMPLETED?

Thanks,
Chandu


[DISCUSS] Introduce WorkerOffer reservation mechanism for barrier scheduling

2019-03-13 Thread wuyi
Currently, a barrier TaskSet has a hard requirement that its tasks can only
be launched in a single resourceOffers() round with enough slots (or say,
sufficient resources), but launching cannot be guaranteed even with enough
slots, due to task locality delay scheduling (see the discussion at
https://github.com/apache/spark/pull/21758#discussion_r204917245). So, it is
very likely that a barrier TaskSet obtains a chunk of sufficient resources
after all that trouble, but lets it go easily just because one of its pending
tasks cannot be scheduled. Also, it is hard to ensure all tasks launch at the
same time, which brings complexity for fault tolerance. Furthermore, it
causes severe resource competition between TaskSets and jobs, and introduces
unclear semantics for dynamic allocation (see the discussion at
https://github.com/apache/spark/pull/21758#discussion_r204917880).

So, here, I want to introduce a new mechanism for barrier scheduling, called
the WorkerOffer reservation mechanism. With it, a barrier TaskSet can reserve
WorkerOffers across multiple resourceOffers() rounds, and launch its tasks at
the same time once it has accumulated sufficient resources.

The whole process looks like this:

* [1] CoarseGrainedSchedulerBackend calls
TaskScheduler#resourceOffers(offers).

* [2] In resourceOffers(), we first exclude the CPUs reserved by barrier
TaskSets in previous resourceOffers() rounds.

* [3] If a task (CPU_PER_TASK = 2) from a barrier TaskSet could launch on
WorkerOffer(hostA, cores=5), call it WO1, then we reserve 2 CPUs from WO1
for this task. So, in the next resourceOffers() round, those 2 CPUs are
excluded from WO1; in other words, WO1 then has 3 CPUs left to offer.
And we regard this task as a ready task.

* [4] After one or more resourceOffers() rounds, once the barrier TaskSet's
number of ready tasks reaches the TaskSet's numTasks, we launch all of the
ready tasks at the same time (see the sketch below).
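
To make the bookkeeping in steps [1]-[4] concrete, here is a simplified,
self-contained sketch; the types below are invented stand-ins for
illustration, not Spark's actual scheduler classes:

```scala
import scala.collection.mutable

// Invented stand-ins for the real scheduler types, for illustration only.
case class WorkerOffer(executorId: String, host: String, cores: Int)
case class ReadyTask(taskId: Int, executorId: String)

class BarrierReservation(numTasks: Int, cpusPerTask: Int) {
  // executorId -> cores reserved so far by this barrier TaskSet.
  private val reserved = mutable.Map.empty[String, Int].withDefaultValue(0)
  private val ready = mutable.ArrayBuffer.empty[ReadyTask]

  // Step [2]: shrink each offer by what we already reserved on that executor.
  def excludeReserved(offers: Seq[WorkerOffer]): Seq[WorkerOffer] =
    offers.map(o => o.copy(cores = o.cores - reserved(o.executorId)))

  // Step [3]: reserve cores for one pending task if the offer can host it.
  def tryReserve(taskId: Int, offer: WorkerOffer): Boolean =
    if (offer.cores >= cpusPerTask) {
      reserved(offer.executorId) += cpusPerTask
      ready += ReadyTask(taskId, offer.executorId)
      true
    } else false

  // Step [4]: launch only when every task in the TaskSet is ready.
  def maybeLaunchAll(): Option[Seq[ReadyTask]] =
    if (ready.size == numTasks) Some(ready.toSeq) else None
}

object BarrierReservationDemo extends App {
  val res = new BarrierReservation(numTasks = 2, cpusPerTask = 2)
  // Round 1: reserve for task 0, but do not launch yet.
  res.tryReserve(0, WorkerOffer("exec-1", "hostA", cores = 5))
  assert(res.maybeLaunchAll().isEmpty)
  // Round 2: after exclusion, exec-1 offers 5 - 2 = 3 cores, still enough.
  val offers = res.excludeReserved(Seq(WorkerOffer("exec-1", "hostA", cores = 5)))
  res.tryReserve(1, offers.head)
  println(res.maybeLaunchAll()) // Some(...) with both ready tasks
}
```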

Besides, two features accompany the WorkerOffer reservation mechanism:

* To avoid the deadlock that could be introduced by several barrier TaskSets
holding reserved WorkerOffers for a long time, we ask barrier TaskSets to
force-release part of their reserved WorkerOffers on demand. So, it is highly
likely that each barrier TaskSet gets launched in the end.

* A barrier TaskSet can replace a reserved WorkerOffer at a worse locality
level with a newly offered WorkerOffer at a better locality level while it
waits for sufficient resources, to achieve better locality in the end.

And there is a possibility for the WorkerOffer reservation mechanism to work
with ExecutorAllocationManager (aka dynamic allocation):

When CPUs in a WorkerOffer are reserved, we send a new event, called
ExecutorReservedEvent, to the EAM, indicating that the corresponding
executor's resources are being reserved. On receiving that event, the EAM
should not regard the executor as idle and remove it later; instead, it keeps
the executor (maybe for a bounded time), since it knows someone may use it
later. Similarly, we send an ExecutorReleasedEvent when reserved CPUs are
released.
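
A sketch of what those proposed events could look like (they do not exist in
Spark today; the names and fields just follow the text above):

```scala
// Proposed events for the EAM -- these do not exist in Spark today.
sealed trait ReservationEvent
case class ExecutorReservedEvent(executorId: String, reservedCores: Int)
    extends ReservationEvent
case class ExecutorReleasedEvent(executorId: String, releasedCores: Int)
    extends ReservationEvent
```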

The WorkerOffer reservation mechanism will not impact non-barrier TaskSets;
their behavior remains the same.

To summarize:

* The WorkerOffer reservation mechanism relaxes the resource requirement,
since a barrier TaskSet can be launched after multiple resourceOffers()
rounds;

* barrier tasks are guaranteed to be launched at the same time;

* it opens up the possibility of working with ExecutorAllocationManager.

Actually, I've already filed JIRA SPARK-26439 and PR #24010 for this (though
they have gotten little attention); anyone interested can look at the code
directly.

So, does anyone have thoughts on this? (Personally, I think it would really
do good for barrier scheduling.)



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
