Re: Integrating ML/DL frameworks with Spark

2018-05-23 Thread Xiangrui Meng
Hi all,

Thanks for your feedback! I uploaded a SPIP doc for the barrier scheduling
feature at https://issues.apache.org/jira/browse/SPARK-24374. Please take a
look and leave your comments there. I had some offline discussion with Xingbo
Jiang to help me design the APIs. He is quite familiar with the Spark job
scheduler and will share some design ideas on the JIRA.

I will work on SPIPs for the other two proposals: 1) fast data exchange, 2)
accelerator-aware scheduling. I will definitely need some help with the second
one because I'm not familiar with YARN/Mesos/k8s.
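For anyone new to the barrier idea, here is a minimal sketch of the all-or-nothing semantics the SPIP targets. This is plain Python with hypothetical names, not the proposed Spark API: every task blocks at a barrier until all of its peers have launched, and the job fails as a unit if the gang cannot assemble in time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_gang(num_workers, work_fn, timeout=5.0):
    """Launch num_workers tasks that must all start together or fail together."""
    barrier = threading.Barrier(num_workers)

    def task(rank):
        # Every task waits here; if any peer is missing past the timeout,
        # BrokenBarrierError is raised in every waiting task.
        barrier.wait(timeout=timeout)
        return work_fn(rank)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(task, r) for r in range(num_workers)]
        try:
            return [f.result() for f in futures]
        except threading.BrokenBarrierError:
            raise RuntimeError(
                "could not run all %d tasks in a single wave" % num_workers)

print(run_gang(4, lambda rank: rank * rank))  # [0, 1, 4, 9]
```

The point of the sketch is only the failure mode: a conventional task scheduler would happily start fewer than `num_workers` tasks, while an MPI-style job must get the whole wave or nothing.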

Best,
Xiangrui

On Sun, May 20, 2018 at 8:19 PM Felix Cheung 
wrote:

> Very cool. We would be very interested in this.
>
> What is the plan forward to make progress in each of the three areas?
>
>
> --
> *From:* Bryan Cutler 
> *Sent:* Monday, May 14, 2018 11:37:20 PM
> *To:* Xiangrui Meng
> *Cc:* Reynold Xin; dev
>
> *Subject:* Re: Integrating ML/DL frameworks with Spark
> Thanks for starting this discussion. I'd also like to see some
> improvements in this area, and I'm glad to hear that the Pandas UDF / Arrow
> functionality might be useful.  I'm wondering whether, from your initial
> investigations, you found anything lacking in the Arrow format, or possible
> improvements that would simplify the data representation?  Also, while data
> could be handed off in a UDF, would it make sense to also discuss a more
> formal way to externalize the data so that it would also work for the
> Scala API?
>
> Thanks,
> Bryan
>
> On Wed, May 9, 2018 at 4:31 PM, Xiangrui Meng  wrote:
>
>> Shivaram: Yes, we can call it "gang scheduling" or "barrier
>> synchronization". Spark doesn't support it now. The proposal is to add
>> proper support in Spark's job scheduler, so we can integrate well with
>> MPI-like frameworks.
>>
>>
>> On Tue, May 8, 2018 at 11:17 AM Nan Zhu  wrote:
>>
>>> ...somehow I skipped the last part
>>>
>>> On Tue, May 8, 2018 at 11:16 AM, Reynold Xin 
>>> wrote:
>>>
 Yes, Nan, totally agree. To be on the same page: that's exactly what I
 wrote, wasn't it?

 On Tue, May 8, 2018 at 11:14 AM Nan Zhu  wrote:

> besides that, one of the things needed by multiple frameworks
> is the ability to schedule tasks in a single wave
>
> i.e.
>
> if a framework like xgboost/mxnet requires 50 parallel workers,
> Spark should provide the capability to ensure that either all 50
> tasks run at once, or the complete application/job is aborted after some
> timeout period
>
> Best,
>
> Nan
>
> On Tue, May 8, 2018 at 11:10 AM, Reynold Xin 
> wrote:
>
>> I think that's what Xiangrui was referring to: instead of retrying a
>> single task, retry the entire stage, and the entire stage of tasks needs
>> to be scheduled all at once.
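To make the contrast concrete, here is a toy illustration of stage-level retry in plain Python; the policy and names are invented for the sketch and are not Spark code. On any task failure, partial results are discarded and the whole stage is relaunched.

```python
def run_stage(tasks, max_stage_attempts=3):
    """All-or-nothing stage execution: if any task fails, rerun the whole
    stage rather than just the failed task (the MPI-friendly policy)."""
    for attempt in range(max_stage_attempts):
        try:
            # All tasks are (re)launched together on each attempt.
            return [t() for t in tasks]
        except Exception:
            continue  # discard partial results; retry the entire stage
    raise RuntimeError("stage failed after %d attempts" % max_stage_attempts)

attempts = {"count": 0}

def flaky_task():
    attempts["count"] += 1
    if attempts["count"] == 1:          # fail on the first stage attempt
        raise IOError("transient failure")
    return "ok"

print(run_stage([lambda: "ok", flaky_task]))  # ['ok', 'ok'] on attempt 2
```

Under Spark's usual fine-grained recovery, only `flaky_task` would be retried; for an MPI-like workload whose peers are blocked on the failed rank, rerunning one task in isolation would hang the rest.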
>>
>>
>> On Tue, May 8, 2018 at 8:53 AM Shivaram Venkataraman <
>> shiva...@eecs.berkeley.edu> wrote:
>>
>>>

> - Fault tolerance and execution model: Spark assumes fine-grained
> task recovery, i.e. if something fails, only that task is rerun. This
> doesn't match the execution model of distributed ML/DL frameworks, which
> are typically MPI-based, where rerunning a single task would lead to the
> entire system hanging. A whole stage needs to be re-run.
>
> This is not only useful for integrating with 3rd-party frameworks,
 but also useful for scaling MLlib algorithms. One of my earliest attempts
 in Spark MLlib was to implement an All-Reduce primitive (SPARK-1485).
 But we ended up with compromised solutions. With the new execution model,
 we can set up a hybrid cluster and do all-reduce properly.
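As a reference point, the all-reduce primitive mentioned above can be sketched in a few lines of plain Python. This is the naive reduce-then-broadcast form, not the bandwidth-optimal ring variant a real implementation would want, and the names are illustrative rather than Spark's:

```python
def all_reduce(worker_values, op=lambda a, b: a + b):
    """Naive all-reduce: combine every worker's contribution, then hand the
    combined result back to every worker (reduce + broadcast)."""
    vals = list(worker_values)
    total = vals[0]
    for v in vals[1:]:
        total = op(total, v)           # reduce step
    return [total] * len(vals)         # broadcast step

# Four workers each contribute a partial gradient (here, a scalar).
print(all_reduce([1, 2, 3, 4]))  # [10, 10, 10, 10]
```

The scheduling requirement follows directly: every worker must be running before any of them can complete the reduce, which is exactly the gang/barrier semantics discussed in this thread.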


>>> Is there a particular new execution model you are referring to, or do
>>> we plan to investigate a new execution model? For the MPI-like model, we
>>> also need gang scheduling (i.e. schedule all tasks at once or none of
>>> them), and I don't think we have support for that in the scheduler right
>>> now.
>>>

> --

 Xiangrui Meng

 Software Engineer

 Databricks Inc. (http://databricks.com)

Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread Marcelo Vanzin
Sure. Also, I'd appreciate it if these bugs were properly triaged and
targeted, so that we can avoid creating RCs when we know there are
blocking bugs that will prevent the RC vote from succeeding.

On Wed, May 23, 2018 at 9:02 AM, Xiao Li  wrote:
> -1
>
> Yeah, we should fix it in Spark 2.3.1.
> https://issues.apache.org/jira/browse/SPARK-24257 is a correctness bug. The
> PR can be merged soon. Thus, let us have another RC?
>
> Thanks,
>
> Xiao
>
>



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread Xiao Li
-1

Yeah, we should fix it in Spark 2.3.1.
https://issues.apache.org/jira/browse/SPARK-24257 is a correctness bug, and the
PR can be merged soon. Thus, shall we have another RC?

Thanks,

Xiao


2018-05-23 8:04 GMT-07:00 chenliang613 :

> Hi
>
> Agree with Wenchen, it is better to fix this issue.
>
> Regards
> Liang
>
>


Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread chenliang613
Hi

I agree with Wenchen; it is better to fix this issue.

Regards
Liang


cloud0fan wrote
> We found a critical bug in tungsten that can lead to silent data
> corruption: https://github.com/apache/spark/pull/21311
> 
> This is a long-standing bug that starts with Spark 2.0(not a regression),
> but since we are going to release 2.3.1, I think it's a good chance to
> include this fix.
> 
> We will also backport this fix to Spark 2.0, 2.1, 2.2, and then we can
> discuss if we should do a new release for 2.0, 2.1, 2.2 later.
> 
> Thanks,
> Wenchen
> 





--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread Wenchen Fan
We found a critical bug in Tungsten that can lead to silent data
corruption: https://github.com/apache/spark/pull/21311

This is a long-standing bug that has existed since Spark 2.0 (not a
regression), but since we are going to release 2.3.1, I think it's a good
chance to include this fix.

We will also backport this fix to Spark 2.0, 2.1, 2.2, and then we can
discuss if we should do a new release for 2.0, 2.1, 2.2 later.

Thanks,
Wenchen

On Wed, May 23, 2018 at 9:54 PM, Sean Owen  wrote:

> +1 Same result for me as with RC1.
>
>


Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread Sean Owen
+1 Same result for me as with RC1.

On Tue, May 22, 2018 at 2:45 PM Marcelo Vanzin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.3.1.
>
> The vote is open until Friday, May 25, at 20:00 UTC and passes if
> at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.1-rc2 (commit 93258d80):
> https://github.com/apache/spark/tree/v2.3.1-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1270/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc2-docs/
>
> The list of bug fixes going into 2.3.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342432
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env, install
> the current RC, and see if anything important breaks; in Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.1?
> ===
>
> The current list of open tickets targeted at 2.3.1 can be found at:
> https://s.apache.org/Q3Uo
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted, please ping me or a committer to
> help target the issue.
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>