Re: V2.3 Scala API to Github Links Incorrect

2018-04-15 Thread Sameer Agarwal
[+Hyukjin]

Thanks for flagging this, Jayesh.
https://github.com/apache/spark-website/pull/111 tracks a short-term fix to
the API docs, and https://issues.apache.org/jira/browse/SPARK-23732 tracks
the fix to the release scripts.

Regards,
Sameer


On 15 April 2018 at 18:50, Thakrar, Jayesh wrote:

> In browsing through the API docs, the links to Github source code seem to
> be pointing to a dev branch rather than the release branch.
>
>
>
> Here's one example
>
> Go to the API doc page below and click on the "ProcessingTime.scala" link
> which points to Sameer's dev branch.
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.streaming.ProcessingTime
>
>
>
> https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/sql/core/src/main/scala/org/apache/spark/sql/streaming/ProcessingTime.scala
>
>
>
> Any chance this can be corrected please?
>
>
>
> BTW, I know working and executing on a release is an arduous task, so
> thanks for all the effort, Sameer and the dev/release team and contributors!
>
>
>
> Thanks,
>
> Jayesh
>
>
>


Re: Welcome Zhenhua Wang as a Spark committer

2018-04-01 Thread Sameer Agarwal
Congratulations Zhenhua -- well deserved!

On 1 April 2018 at 22:36, sujith chacko <sujithchacko.2...@gmail.com> wrote:

> Congratulations zhenhua for this great achievement.
>
> On Mon, 2 Apr 2018 at 11:05 AM, Denny Lee <denny.g@gmail.com> wrote:
>
>> Awesome - congrats Zhenhua!
>>
>> On Sun, Apr 1, 2018 at 10:33 PM 叶先进 <advance...@gmail.com> wrote:
>>
>>> Big congs.
>>>
>>> > On Apr 2, 2018, at 1:28 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > The Spark PMC recently added Zhenhua Wang as a committer on the
>>> project. Zhenhua is the major contributor of the CBO project, and has been
>>> contributing across several areas of Spark for a while, focusing especially
>>> on analyzer, optimizer in Spark SQL. Please join me in welcoming Zhenhua!
>>> >
>>> > Wenchen
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


-- 
Sameer Agarwal
Computer Science | UC Berkeley
http://cs.berkeley.edu/~sameerag


Re: Welcoming some new committers

2018-03-03 Thread Sameer Agarwal
Congratulations!!

On 3 March 2018 at 13:12, Mridul Muralidharan wrote:

> Congratulations !
>
>
> Regards,
> Mridul
>
>
> On Fri, Mar 2, 2018 at 2:41 PM, Matei Zaharia wrote:
> > Hi everyone,
> >
> > The Spark PMC has recently voted to add several new committers to the
> project, based on their contributions to Spark 2.3 and other past work:
> >
> > - Anirudh Ramanathan (contributor to Kubernetes support)
> > - Bryan Cutler (contributor to PySpark and Arrow support)
> > - Cody Koeninger (contributor to streaming and Kafka support)
> > - Erik Erlandson (contributor to Kubernetes support)
> > - Matt Cheah (contributor to Kubernetes support and other parts of Spark)
> > - Seth Hendrickson (contributor to MLlib and PySpark)
> >
> > Please join me in welcoming Anirudh, Bryan, Cody, Erik, Matt and Seth as
> committers!
> >
> > Matei
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


[ANNOUNCE] Announcing Apache Spark 2.3.0

2018-02-28 Thread Sameer Agarwal
Hi all,

Apache Spark 2.3.0 is the fourth major release in the 2.x line. This
release adds support for continuous processing in structured streaming
along with a brand new Kubernetes scheduler backend. Other major updates
include the new data source and structured streaming v2 APIs, a standard
image schema and built-in support for reading images, better custom
Transformer support in Python and a number of PySpark performance
enhancements. In addition, this release continues to focus on usability,
stability, and polish while resolving around 1400 tickets.

We'd like to thank our contributors and users for their contributions and
early feedback to this release. This release would not have been possible
without you.

To download Spark 2.3.0, head over to the download page:
http://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-2-3-0.html

Regards,
Sameer

PS: If you see any issues with the release notes, webpage or published
artifacts, please contact me directly off-list


Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-27 Thread Sameer Agarwal
This vote passes! I'll follow up with a formal release announcement soon.

+1:
Wenchen Fan (binding)
Takuya Ueshin
Xingbo Jiang
Gengliang Wang
Weichen Xu
Sean Owen (binding)
Josh Goldsborough
Denny Lee
Nicholas Chammas
Marcelo Vanzin (binding)
Holden Karau (binding)
Cheng Lian (binding)
Bryan Cutler
Hyukjin Kwon
Ricardo Almeida
Xiao Li (binding)
Ryan Blue
Dongjoon Hyun
Michael Armbrust (binding)
Nan Zhu
Felix Cheung (binding)
Nick Pentreath (binding)

+0: None

-1: None

On 27 February 2018 at 00:21, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> +1 (binding)
>
> Built and ran Scala tests with "-Phadoop-2.6 -Pyarn -Phive", all passed.
>
> Python tests passed (also including pyspark-streaming w/kafka-0.8 and
> flume packages built)
>
>
> On Tue, 27 Feb 2018 at 10:09 Felix Cheung <felixcheun...@hotmail.com>
> wrote:
>
>> +1
>>
>> Tested R:
>>
>> install from package, CRAN tests, manual tests, help check, vignettes
>> check
>>
>> Filed this https://issues.apache.org/jira/browse/SPARK-23461
>> This is not a regression so not a blocker of the release.
>>
>> Tested this on win-builder and r-hub. On r-hub on multiple platforms
>> everything passed. For win-builder tests failed on x86 but passed x64 -
>> perhaps due to an intermittent download issue causing a gzip error,
>> re-testing now but won’t hold the release on this.
>>
>> --
>> *From:* Nan Zhu <zhunanmcg...@gmail.com>
>> *Sent:* Monday, February 26, 2018 4:03:22 PM
>> *To:* Michael Armbrust
>> *Cc:* dev
>> *Subject:* Re: [VOTE] Spark 2.3.0 (RC5)
>>
>> +1  (non-binding), tested with internal workloads and benchmarks
>>
>> On Mon, Feb 26, 2018 at 12:09 PM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>>> +1 all our pipelines have been running the RC for several days now.
>>>
>>> On Mon, Feb 26, 2018 at 10:33 AM, Dongjoon Hyun <dongjoon.h...@gmail.com
>>> > wrote:
>>>
>>>> +1 (non-binding).
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>>
>>>> On Mon, Feb 26, 2018 at 9:14 AM, Ryan Blue <rb...@netflix.com.invalid>
>>>> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> On Sat, Feb 24, 2018 at 4:17 PM, Xiao Li <gatorsm...@gmail.com> wrote:
>>>>>
>>>>>> +1 (binding) in Spark SQL, Core and PySpark.
>>>>>>
>>>>>> Xiao
>>>>>>
>>>>>> 2018-02-24 14:49 GMT-08:00 Ricardo Almeida <
>>>>>> ricardo.alme...@actnowib.com>:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> same as previous RC
>>>>>>>
>>>>>>> On 24 February 2018 at 11:10, Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> 2018-02-24 16:57 GMT+09:00 Bryan Cutler <cutl...@gmail.com>:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>> Tests passed and additionally ran Arrow related tests and did some
>>>>>>>>> perf checks with python 2.7.14
>>>>>>>>>
>>>>>>>>> On Fri, Feb 23, 2018 at 6:18 PM, Holden Karau <
>>>>>>>>> hol...@pigscanfly.ca> wrote:
>>>>>>>>>
>>>>>>>>>> Note: given the state of Jenkins I'd love to see Bryan Cutler or
>>>>>>>>>> someone with Arrow experience sign off on this release.
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 23, 2018 at 6:13 PM, Cheng Lian <
>>>>>>>>>> lian.cs@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 (binding)
>>>>>>>>>>>
>>>>>>>>>>> Passed all the tests, looks good.
>>>>>>>>>>>
>>>>>>>>>>> Cheng
>>>>>>>>>>>
>>>>>>>>>>> On 2/23/18 15:00, Holden Karau wrote:
>>>>>>>>>>>
>>>>>>>>>>> +1 (binding)
>>>>>>>>>>> PySpark artifacts install in a fresh Py3 virtual env

[VOTE] Spark 2.3.0 (RC5)

2018-02-22 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version
2.3.0. The vote is open until Tuesday February 27, 2018 at 8:00:00 am UTC
and passes if a majority of at least 3 PMC +1 votes are cast.


[ ] +1 Release this package as Apache Spark 2.3.0

[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.3.0-rc5:
https://github.com/apache/spark/tree/v2.3.0-rc5
(992447fb30ee9ebb3cf794f2d06f4d63a2d792db)

List of JIRA tickets resolved in this release can be found here:
https://issues.apache.org/jira/projects/SPARK/versions/12339551

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-bin/

Release artifacts are signed with the following key:
https://dist.apache.org/repos/dist/dev/spark/KEYS
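
For reviewers checking the release files, the checksum step can be scripted.
Below is a minimal Python sketch; the stand-in file and its contents are
illustrative, and a real check would compare the computed digest against the
published `.sha512` file for each artifact:

```python
import hashlib
import tempfile

def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading in chunks so a
    multi-hundred-MB release tarball never has to fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo against a small stand-in file; a real run would point at a
# downloaded artifact (e.g. a spark-2.3.0 binary tarball, name assumed).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"not a real release tarball")
    path = f.name

digest = sha512_of(path)
print(len(digest))  # SHA-512 hex digests are 128 characters
```

Signature verification itself would additionally use `gpg --verify` against
the KEYS file above.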

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1266/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/index.html


FAQ

===
What are the unresolved issues targeted for 2.3.0?
===

Please see https://s.apache.org/oXKi. At the time of writing, there are
currently no known release blockers.

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the
current RC, and see if anything important breaks. In Java/Scala, you can
add the staging repository to your project's resolvers and test with the RC
(make sure to clean up the artifact cache before/after so you don't end up
building with an out-of-date RC going forward).
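
To make the PySpark route above concrete, here is a minimal Python sketch of
the virtual-env step. It stays offline (`with_pip=False`); the commented
commands at the end, including the artifact location, are assumptions about
how the RC package would then be installed:

```python
import os
import tempfile
import venv

# Create a throwaway virtual env for isolating the RC install.
env_dir = tempfile.mkdtemp(prefix="spark-rc-test-")
venv.create(env_dir, with_pip=False)  # use with_pip=True in a real run

bin_dir = "Scripts" if os.name == "nt" else "bin"
env_python = os.path.join(env_dir, bin_dir,
                          "python.exe" if os.name == "nt" else "python")
print(os.path.exists(env_python))  # the env has its own interpreter

# In a real test run (network required; paths are placeholders):
#   <env>/bin/pip install <downloaded pyspark RC package>
#   <env>/bin/python -c "import pyspark; print(pyspark.__version__)"
```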

===
What should happen to JIRA tickets still targeting 2.3.0?
===

Committers should look at those and triage. Extremely important bug fixes,
documentation, and API tweaks that impact compatibility should be worked on
immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
appropriate.

===
Why is my bug not fixed?
===

In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from 2.2.0. That being said, if
there is something which is a regression from 2.2.0 and has not been
correctly targeted please ping me or a committer to help target the issue
(you can see the open issues listed as impacting Spark 2.3.0 at
https://s.apache.org/WmoI).


Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Sameer Agarwal
Sure, please feel free to backport.

On 20 February 2018 at 18:02, Marcelo Vanzin <van...@cloudera.com> wrote:

> Hey Sameer,
>
> Mind including https://github.com/apache/spark/pull/20643
> (SPARK-23468)  in the new RC? It's a minor bug since I've only hit it
> with older shuffle services, but it's pretty safe.
>
> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal <samee...@apache.org>
> wrote:
> > This RC has failed due to https://issues.apache.org/jira/browse/SPARK-23470.
> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up
> > with an RC5 soon.
> >
> > On 20 February 2018 at 16:49, Ryan Blue <rb...@netflix.com> wrote:
> >>
> >> +1
> >>
> >> Build & tests look fine, checked signature and checksums for src
> tarball.
> >>
> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
> >> <shixi...@databricks.com> wrote:
> >>>
> >>> I'm -1 because of the UI regression
> >>> https://issues.apache.org/jira/browse/SPARK-23470: the All Jobs page
> may be
> >>> too slow and cause "read timeout" when there are lots of jobs and
> stages.
> >>> This is one of the most important pages because when it's broken, it's
> >>> pretty hard to use Spark Web UI.
> >>>
> >>>
> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <marcogaid...@gmail.com>
> >>> wrote:
> >>>>
> >>>> +1
> >>>>
> >>>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon <gurwls...@gmail.com>:
> >>>>>
> >>>>> +1 too
> >>>>>
> >>>>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <ues...@happy-camper.st>:
> >>>>>>
> >>>>>> +1
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang <
> jiangxb1...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> +1
> >>>>>>>
> >>>>>>>
> >>>>>>> Wenchen Fan <cloud0...@gmail.com> wrote on Tue, Feb 20, 2018 at 1:09 PM:
> >>>>>>>>
> >>>>>>>> +1
> >>>>>>>>
> >>>>>>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin <
> r...@databricks.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> +1
> >>>>>>>>>
> >>>>>>>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
> >>>>>>>>> <sameer.a...@gmail.com>, wrote:
> >>>>>>>>>>
> >>>>>>>>>> this file shouldn't be included?
> >>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I've now deleted this file
> >>>>>>>>>
> >>>>>>>>>> From: Sameer Agarwal <sameer.a...@gmail.com>
> >>>>>>>>>> Sent: Saturday, February 17, 2018 1:43:39 PM
> >>>>>>>>>> To: Sameer Agarwal
> >>>>>>>>>> Cc: dev
> >>>>>>>>>> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
> >>>>>>>>>>
> >>>>>>>>>> I'll start with a +1 once again.
> >>>>>>>>>>
> >>>>>>>>>> All blockers reported against RC3 have been resolved and the
> >>>>>>>>>> builds are healthy.
> >>>>>>>>>>
> >>>>>>>>>> On 17 February 2018 at 13:41, Sameer Agarwal <
> samee...@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Please vote on releasing the following candidate as Apache
> Spark
> >>>>>>>>>>> version 2.3.0. The vote is open until Thursday February 22,
> 2018 at 8:00:00
> >>>>>>>>>>> am UTC and passes if a majority of at least 3 PMC +1 votes are
> cast.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> 

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Sameer Agarwal
This RC has failed due to https://issues.apache.org/jira/browse/SPARK-23470.
Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up
with an RC5 soon.

On 20 February 2018 at 16:49, Ryan Blue <rb...@netflix.com> wrote:

> +1
>
> Build & tests look fine, checked signature and checksums for src tarball.
>
> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> I'm -1 because of the UI regression
>> https://issues.apache.org/jira/browse/SPARK-23470: the All Jobs page may be
>> too slow and cause "read timeout" when there are lots of jobs and stages.
>> This is one of the most important pages because when it's broken, it's
>> pretty hard to use Spark Web UI.
>>
>>
>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <marcogaid...@gmail.com>
>> wrote:
>>
>>> +1
>>>
>>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon <gurwls...@gmail.com>:
>>>
>>>> +1 too
>>>>
>>>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <ues...@happy-camper.st>:
>>>>
>>>>> +1
>>>>>
>>>>>
>>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang <jiangxb1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>>
>>>>>> Wenchen Fan <cloud0...@gmail.com> wrote on Tue, Feb 20, 2018 at 1:09 PM:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin <r...@databricks.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal <
>>>>>>>> sameer.a...@gmail.com>, wrote:
>>>>>>>>
>>>>>>>> this file shouldn't be included?
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>>>>>>>>
>>>>>>>>
>>>>>>>> I've now deleted this file
>>>>>>>>
>>>>>>>> *From:* Sameer Agarwal <sameer.a...@gmail.com>
>>>>>>>>> *Sent:* Saturday, February 17, 2018 1:43:39 PM
>>>>>>>>> *To:* Sameer Agarwal
>>>>>>>>> *Cc:* dev
>>>>>>>>> *Subject:* Re: [VOTE] Spark 2.3.0 (RC4)
>>>>>>>>>
>>>>>>>>> I'll start with a +1 once again.
>>>>>>>>>
>>>>>>>>> All blockers reported against RC3 have been resolved and the
>>>>>>>>> builds are healthy.
>>>>>>>>>
>>>>>>>>> On 17 February 2018 at 13:41, Sameer Agarwal <samee...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>>> version 2.3.0. The vote is open until Thursday February 22, 2018 at 
>>>>>>>>>> 8:00:00
>>>>>>>>>> am UTC and passes if a majority of at least 3 PMC +1 votes are cast.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>>>>>>>>
>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>>> https://spark.apache.org/
>>>>>>>>>>
>>>>>>>>>> The tag to be voted on is v2.3.0-rc4:
>>>>>>>>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>>>>>>>>>> (44095cb65500739695b0324c177c19dfa1471472)
>>>>>>>>>>
>>>>>>>>>> List of JIRA tickets resolved in this release can be found here:
>>>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>>>>>>>>
>>>>>>>>>> The release files, including signatures, digests, etc. can be
>>>>>>>>

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Sameer Agarwal
>
> this file shouldn't be included?
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>

I've now deleted this file

*From:* Sameer Agarwal <sameer.a...@gmail.com>
> *Sent:* Saturday, February 17, 2018 1:43:39 PM
> *To:* Sameer Agarwal
> *Cc:* dev
> *Subject:* Re: [VOTE] Spark 2.3.0 (RC4)
>
> I'll start with a +1 once again.
>
> All blockers reported against RC3 have been resolved and the builds are
> healthy.
>
> On 17 February 2018 at 13:41, Sameer Agarwal <samee...@apache.org> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.3.0. The vote is open until Thursday February 22, 2018 at 8:00:00 am UTC
>> and passes if a majority of at least 3 PMC +1 votes are cast.
>>
>>
>> [ ] +1 Release this package as Apache Spark 2.3.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.3.0-rc4:
>> https://github.com/apache/spark/tree/v2.3.0-rc4 (44095cb65500739695b0324c177c19dfa1471472)
>>
>> List of JIRA tickets resolved in this release can be found here:
>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>
>> Release artifacts are signed with the following key:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>
>>
>> FAQ
>>
>> ===
>> What are the unresolved issues targeted for 2.3.0?
>> ===
>>
>> Please see https://s.apache.org/oXKi. At the time of writing, there are
>> currently no known release blockers.
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark, you can set up a virtual env, install the
>> current RC, and see if anything important breaks. In Java/Scala, you can
>> add the staging repository to your project's resolvers and test with the RC
>> (make sure to clean up the artifact cache before/after so you don't end up
>> building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.0?
>> ===
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
>> appropriate.
>>
>> ===
>> Why is my bug not fixed?
>> ===
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.2.0. That being said, if
>> there is something which is a regression from 2.2.0 and has not been
>> correctly targeted please ping me or a committer to help target the issue
>> (you can see the open issues listed as impacting Spark 2.3.0 at
>> https://s.apache.org/WmoI).
>>
>
>
>
> --
> Sameer Agarwal
> Computer Science | UC Berkeley
> http://cs.berkeley.edu/~sameerag
>



-- 
Sameer Agarwal
Computer Science | UC Berkeley
http://cs.berkeley.edu/~sameerag


Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-17 Thread Sameer Agarwal
I'll start with a +1 once again.

All blockers reported against RC3 have been resolved and the builds are
healthy.

On 17 February 2018 at 13:41, Sameer Agarwal <samee...@apache.org> wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.3.0. The vote is open until Thursday February 22, 2018 at 8:00:00 am UTC
> and passes if a majority of at least 3 PMC +1 votes are cast.
>
>
> [ ] +1 Release this package as Apache Spark 2.3.0
>
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v2.3.0-rc4:
> https://github.com/apache/spark/tree/v2.3.0-rc4 (44095cb65500739695b0324c177c19dfa1471472)
>
> List of JIRA tickets resolved in this release can be found here:
> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1265/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>
>
> FAQ
>
> ===
> What are the unresolved issues targeted for 2.3.0?
> ===
>
> Please see https://s.apache.org/oXKi. At the time of writing, there are
> currently no known release blockers.
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install the
> current RC, and see if anything important breaks. In Java/Scala, you can
> add the staging repository to your project's resolvers and test with the RC
> (make sure to clean up the artifact cache before/after so you don't end up
> building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.0?
> ===
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
> appropriate.
>
> ===
> Why is my bug not fixed?
> ===
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.2.0. That being said, if
> there is something which is a regression from 2.2.0 and has not been
> correctly targeted please ping me or a committer to help target the issue
> (you can see the open issues listed as impacting Spark 2.3.0 at
> https://s.apache.org/WmoI).
>



-- 
Sameer Agarwal
Computer Science | UC Berkeley
http://cs.berkeley.edu/~sameerag


[VOTE] Spark 2.3.0 (RC4)

2018-02-17 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version
2.3.0. The vote is open until Thursday February 22, 2018 at 8:00:00 am UTC
and passes if a majority of at least 3 PMC +1 votes are cast.


[ ] +1 Release this package as Apache Spark 2.3.0

[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.3.0-rc4:
https://github.com/apache/spark/tree/v2.3.0-rc4
(44095cb65500739695b0324c177c19dfa1471472)

List of JIRA tickets resolved in this release can be found here:
https://issues.apache.org/jira/projects/SPARK/versions/12339551

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/

Release artifacts are signed with the following key:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1265/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html


FAQ

===
What are the unresolved issues targeted for 2.3.0?
===

Please see https://s.apache.org/oXKi. At the time of writing, there are
currently no known release blockers.

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install the
current RC and see if anything important breaks, in the Java/Scala you can
add the staging repository to your projects resolvers and test with the RC
(make sure to clean up the artifact cache before/after so you don't end up
building with a out of date RC going forward).

===
What should happen to JIRA tickets still targeting 2.3.0?
===

Committers should look at those and triage. Extremely important bug fixes,
documentation, and API tweaks that impact compatibility should be worked on
immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
appropriate.

===
Why is my bug not fixed?
===

In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from 2.2.0. That being said, if
there is something which is a regression from 2.2.0 and has not been
correctly targeted please ping me or a committer to help target the issue
(you can see the open issues listed as impacting Spark 2.3.0 at
https://s.apache.org/WmoI).


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread Sameer Agarwal
In addition to the issues mentioned above, Wenchen and Xiao have flagged
two other regressions (https://issues.apache.org/jira/browse/SPARK-23316
and https://issues.apache.org/jira/browse/SPARK-23388) that were merged
after RC3 was cut.

Due to these, this vote fails. I'll follow up with an RC4 in a day (this
will probably also give us enough time to resolve
https://issues.apache.org/jira/browse/SPARK-23381 and
https://issues.apache.org/jira/browse/SPARK-23410).


On 15 February 2018 at 17:22, mrkm4ntr wrote:

> I agree that this is not a blocker against RC3; it was not appropriate to
> raise it as part of the RC3 vote. There is no problem as long as it makes
> it into the 2.3.0 release.
>
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-13 Thread Sameer Agarwal
The issue with SPARK-23292 is that we currently run the Python tests
related to pandas and pyarrow with Python 3 (which is already installed on
all AMPLab Jenkins machines). Since the code path is fully tested, we
decided not to mark it as a blocker; I've reworded the title to better
indicate that.

On 13 February 2018 at 08:16, Sean Owen <sro...@apache.org> wrote:

> +1 from me. Again, licenses and sigs look fine. I built the source
> distribution with "-Phive -Phadoop-2.7 -Pyarn -Pkubernetes" and all tests
> passed.
>
> Remaining issues for 2.3.0, none of which are a Blocker:
>
> SPARK-22797 Add multiple column support to PySpark Bucketizer
> SPARK-23083 Adding Kubernetes as an option to https://spark.apache.org/
> SPARK-23292 python tests related to pandas are skipped
> SPARK-23309 Spark 2.3 cached query performance 20-30% worse then spark 2.2
> SPARK-23316 AnalysisException after max iteration reached for IN query
>
> ... though the pandas tests issue is "Critical".
>
> (SPARK-23083 is an update to the main site that should happen as the
> artifacts are released, so it's OK.)
>
> On Tue, Feb 13, 2018 at 12:30 AM Sameer Agarwal <samee...@apache.org>
> wrote:
>
>> Now that all known blockers have once again been resolved, please vote on
>> releasing the following candidate as Apache Spark version 2.3.0. The vote
>> is open until Friday February 16, 2018 at 8:00:00 am UTC and passes if a
>> majority of at least 3 PMC +1 votes are cast.
>>
>>
>> [ ] +1 Release this package as Apache Spark 2.3.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.3.0-rc3:
>> https://github.com/apache/spark/tree/v2.3.0-rc3 (89f6fcbafcfb0a7aeb897fba6036cb085bd35121)
>>
>> List of JIRA tickets resolved in this release can be found here:
>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/
>>
>> Release artifacts are signed with the following key:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1264/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-docs/_site/index.html
>>
>>
>> FAQ
>>
>> ===
>> What are the unresolved issues targeted for 2.3.0?
>> ===
>>
>> Please see https://s.apache.org/oXKi. At the time of writing, there are
>> currently no known release blockers.
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark, you can set up a virtual env, install the
>> current RC, and see if anything important breaks. In Java/Scala, you can
>> add the staging repository to your project's resolvers and test with the RC
>> (make sure to clean up the artifact cache before/after so you don't end up
>> building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.0?
>> ===
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
>> appropriate.
>>
>> ===
>> Why is my bug not fixed?
>> ===
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.2.0. That being said, if
>> there is something which is a regression from 2.2.0 and has not been
>> correctly targeted please ping me or a committer to help target the issue
>> (you can see the open issues listed as impacting Spark 2.3.0 at
>> https://s.apache.org/WmoI).
>>
>>
>> Regards,
>> Sameer
>>
>


-- 
Sameer Agarwal
Computer Science | UC Berkeley
http://cs.berkeley.edu/~sameerag


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-12 Thread Sameer Agarwal
I'll start the vote with a +1.

As of today, all known release blockers and QA tasks have been resolved,
and the jenkins builds are healthy:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/

On 12 February 2018 at 22:30, Sameer Agarwal <samee...@apache.org> wrote:

> Now that all known blockers have once again been resolved, please vote on
> releasing the following candidate as Apache Spark version 2.3.0. The vote
> is open until Friday February 16, 2018 at 8:00:00 am UTC and passes if a
> majority of at least 3 PMC +1 votes are cast.
>
>
> [ ] +1 Release this package as Apache Spark 2.3.0
>
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v2.3.0-rc3:
> https://github.com/apache/spark/tree/v2.3.0-rc3
> (89f6fcbafcfb0a7aeb897fba6036cb085bd35121)
>
> List of JIRA tickets resolved in this release can be found here:
> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1264/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-docs/_site/index.html
>
>
> FAQ
>
> ===
> What are the unresolved issues targeted for 2.3.0?
> ===
>
> Please see https://s.apache.org/oXKi. At the time of writing, there are
> currently no known release blockers.
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env and install the
> current RC and see if anything important breaks; in Java/Scala, you can
> add the staging repository to your project's resolvers and test with the RC
> (make sure to clean up the artifact cache before/after so you don't end up
> building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.0?
> ===
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
> appropriate.
>
> ===
> Why is my bug not fixed?
> ===
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.2.0. That being said, if
> there is something which is a regression from 2.2.0 and has not been
> correctly targeted please ping me or a committer to help target the issue
> (you can see the open issues listed as impacting Spark 2.3.0 at
> https://s.apache.org/WmoI).
>
>
> Regards,
> Sameer
>


[VOTE] Spark 2.3.0 (RC3)

2018-02-12 Thread Sameer Agarwal
Now that all known blockers have once again been resolved, please vote on
releasing the following candidate as Apache Spark version 2.3.0. The vote
is open until Friday February 16, 2018 at 8:00:00 am UTC and passes if a
majority of at least 3 PMC +1 votes are cast.


[ ] +1 Release this package as Apache Spark 2.3.0

[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.3.0-rc3:
https://github.com/apache/spark/tree/v2.3.0-rc3
(89f6fcbafcfb0a7aeb897fba6036cb085bd35121)

List of JIRA tickets resolved in this release can be found here:
https://issues.apache.org/jira/projects/SPARK/versions/12339551

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/

Release artifacts are signed with the following key:
https://dist.apache.org/repos/dist/dev/spark/KEYS
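For testers who want to check a downloaded artifact against its published
.sha512 digest before running anything, a minimal stdlib-only sketch (the
file names in the usage comment are illustrative, not from this email):

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Return the hex SHA-512 digest of the given bytes."""
    return hashlib.sha512(data).hexdigest()


def verify_artifact(data: bytes, expected_hex: str) -> bool:
    """Compare a computed digest against the published one (case-insensitive)."""
    return sha512_hex(data) == expected_hex.strip().lower()


# Illustrative usage -- in practice, read the tarball and its .sha512 file:
#   data = open("spark-2.3.0-bin-hadoop2.7.tgz", "rb").read()
#   expected = open("spark-2.3.0-bin-hadoop2.7.tgz.sha512").read().split()[-1]
#   assert verify_artifact(data, expected)
```

Signatures should additionally be checked with gpg against the KEYS file above.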

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1264/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-docs/_site/index.html


FAQ

===
What are the unresolved issues targeted for 2.3.0?
===

Please see https://s.apache.org/oXKi. At the time of writing, there are
currently no known release blockers.

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env and install the
current RC and see if anything important breaks; in Java/Scala, you can
add the staging repository to your project's resolvers and test with the RC
(make sure to clean up the artifact cache before/after so you don't end up
building with an out-of-date RC going forward).
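For the Java/Scala path, pointing a build at the staging repository can look
like the following -- an illustrative sbt fragment, assuming the
orgapachespark-1264 staging URL given earlier in this email:

```scala
// build.sbt -- sketch for compiling/testing against the RC artifacts.
// The resolver URL is the staging repository named above; the RC is
// published under the final 2.3.0 version coordinate.
resolvers += "Apache Spark 2.3.0 RC staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1264/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
```

Clearing the cached artifacts (e.g. `~/.ivy2/cache/org.apache.spark`) before
and after testing keeps later builds from silently picking up the RC.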

===
What should happen to JIRA tickets still targeting 2.3.0?
===

Committers should look at those and triage. Extremely important bug fixes,
documentation, and API tweaks that impact compatibility should be worked on
immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
appropriate.

===
Why is my bug not fixed?
===

In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from 2.2.0. That being said, if
there is something which is a regression from 2.2.0 and has not been
correctly targeted please ping me or a committer to help target the issue
(you can see the open issues listed as impacting Spark 2.3.0 at
https://s.apache.org/WmoI).


Regards,
Sameer


Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-06 Thread Sameer Agarwal
FYI -- Thanks to a big community-wide effort over the last few days, we're
now down to just one last remaining code blocker again:
https://issues.apache.org/jira/browse/SPARK-23309

I'll cut an RC3 as soon as that's resolved.

On 4 February 2018 at 00:02, Xingbo Jiang <jiangxb1...@gmail.com> wrote:

> I filed another NPE problem in WebUI, I believe this is regression in 2.3:
> https://issues.apache.org/jira/browse/SPARK-23330
>
> 2018-02-01 10:38 GMT-08:00 Tom Graves <tgraves...@yahoo.com.invalid>:
>
>> I filed a jira [SPARK-23304] Spark SQL coalesce() against hive not
>> working - ASF JIRA <https://issues.apache.org/jira/browse/SPARK-23304> for
>> the coalesce issue.
>>
>>
>> Tom
>>
>> On Thursday, February 1, 2018, 12:36:02 PM CST, Sameer Agarwal <
>> samee...@apache.org> wrote:
>>
>>
>> [+ Xiao]
>>
>> SPARK-23290  does sound like a blocker. On the SQL side, I can confirm
>> that there were non-trivial changes around repartitioning/coalesce and
>> cache performance in 2.3 --  we're currently investigating these.
>>
>> On 1 February 2018 at 10:02, Andrew Ash <and...@andrewash.com> wrote:
>>
>> I'd like to nominate SPARK-23290
>> <https://issues.apache.org/jira/browse/SPARK-23290> as a potential
>> blocker for the 2.3.0 release.  It's a regression from 2.2.0 in that user
> pyspark code that works in 2.2.0 now fails in the 2.3.0 RCs: the return
> type of date columns changed from object to datetime64[ns]. My
>> understanding of the Spark Versioning Policy
>> <http://spark.apache.org/versioning-policy.html> is that user code
>> should continue to run in future versions of Spark with the same major
>> version number.
>>
>> Thanks!
>>
>> On Thu, Feb 1, 2018 at 9:50 AM, Tom Graves <tgraves...@yahoo.com.invalid>
>> wrote:
>>
>>
>> Testing with spark 2.3 and I see a difference in the sql coalesce talking
>> to hive vs spark 2.2. It seems spark 2.3 ignores the coalesce.
>>
>> Query:
>> spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >=
>> '20170301' AND dt <= '20170331' AND something IS NOT
>> NULL").coalesce(16).show()
>>
>> in spark 2.2 the coalesce works here, but in spark 2.3, it doesn't.
>>  Anyone know about this issue or are there some weird config changes,
>> otherwise I'll file a jira?
>>
>> Note I also see a performance difference when reading cached data in Spark
>> 2.3: a small query on 19GB of cached data is about 30% worse, 13 seconds on
>> spark 2.2 vs 17 seconds on spark 2.3. Straight-up reading
>> from hive (orc) seems better though.
>>
>> Tom
>>
>>
>>
>> On Thursday, February 1, 2018, 11:23:45 AM CST, Michael Heuer <
>> heue...@gmail.com> wrote:
>>
>>
>> We found two classes new to Spark 2.3.0 that must be registered in Kryo
>> for our tests to pass on RC2
>>
>> org.apache.spark.sql.execution.datasources.BasicWriteTaskStats
>> org.apache.spark.sql.execution.datasources.ExecutedWriteSummary
>>
>> https://github.com/bigdatagenomics/adam/pull/1897
>>
>> Perhaps a mention in release notes?
>>
>>michael
>>
>>
>> On Thu, Feb 1, 2018 at 3:29 AM, Nick Pentreath <nick.pentre...@gmail.com>
>> wrote:
>>
>> All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side
>> that should be everything outstanding.
>>
>>
>> On Thu, 1 Feb 2018 at 06:21 Yin Huai <yh...@databricks.com> wrote:
>>
>> seems we are not running tests related to pandas in pyspark tests (see my
>> email "python tests related to pandas are skipped in jenkins"). I think we
>> should fix this test issue and make sure all tests are good before cutting
>> RC3.
>>
>> On Wed, Jan 31, 2018 at 10:12 AM, Sameer Agarwal <samee...@apache.org>
>> wrote:
>>
>> Just a quick status update on RC3 -- SPARK-23274
>> <https://issues.apache.org/jira/browse/SPARK-23274> was resolved
>> yesterday and tests have been quite healthy throughout this week and the
>> last. I'll cut the new RC as soon as the remaining blocker (SPARK-23202
>> <https://issues.apache.org/jira/browse/SPARK-23202>) is resolved.
>>
>>
>> On 30 January 2018 at 10:12, Andrew Ash <and...@andrewas

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Sameer Agarwal
[+ Xiao]

SPARK-23290  does sound like a blocker. On the SQL side, I can confirm that
there were non-trivial changes around repartitioning/coalesce and cache
performance in 2.3 --  we're currently investigating these.

On 1 February 2018 at 10:02, Andrew Ash <and...@andrewash.com> wrote:

> I'd like to nominate SPARK-23290
> <https://issues.apache.org/jira/browse/SPARK-23290> as a potential
> blocker for the 2.3.0 release.  It's a regression from 2.2.0 in that user
> pyspark code that works in 2.2.0 now fails in the 2.3.0 RCs: the return
> type of date columns changed from object to datetime64[ns]. My
> understanding of the Spark Versioning Policy
> <http://spark.apache.org/versioning-policy.html> is that user code should
> continue to run in future versions of Spark with the same major version
> number.
>
> Thanks!
>
> On Thu, Feb 1, 2018 at 9:50 AM, Tom Graves <tgraves...@yahoo.com.invalid>
> wrote:
>
>>
>> Testing with spark 2.3 and I see a difference in the sql coalesce talking
>> to hive vs spark 2.2. It seems spark 2.3 ignores the coalesce.
>>
>> Query:
>> spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >=
>> '20170301' AND dt <= '20170331' AND something IS NOT
>> NULL").coalesce(16).show()
>>
>> in spark 2.2 the coalesce works here, but in spark 2.3, it doesn't.
>>  Anyone know about this issue or are there some weird config changes,
>> otherwise I'll file a jira?
>>
>> Note I also see a performance difference when reading cached data in Spark
>> 2.3: a small query on 19GB of cached data is about 30% worse, 13 seconds on
>> spark 2.2 vs 17 seconds on spark 2.3. Straight-up reading
>> from hive (orc) seems better though.
>>
>> Tom
>>
>>
>>
>> On Thursday, February 1, 2018, 11:23:45 AM CST, Michael Heuer <
>> heue...@gmail.com> wrote:
>>
>>
>> We found two classes new to Spark 2.3.0 that must be registered in Kryo
>> for our tests to pass on RC2
>>
>> org.apache.spark.sql.execution.datasources.BasicWriteTaskStats
>> org.apache.spark.sql.execution.datasources.ExecutedWriteSummary
>>
>> https://github.com/bigdatagenomics/adam/pull/1897
>>
>> Perhaps a mention in release notes?
>>
>>michael
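[As an aside on the Kryo registration above: one way to register extra
classes is the standard `spark.kryo.classesToRegister` setting. A hedged
sketch, assuming Kryo serialization is enabled explicitly:]

```scala
import org.apache.spark.SparkConf

// Sketch: register the two classes new in 2.3.0 so Kryo-strict test
// suites (spark.kryo.registrationRequired-style setups) keep passing.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.classesToRegister",
    "org.apache.spark.sql.execution.datasources.BasicWriteTaskStats," +
    "org.apache.spark.sql.execution.datasources.ExecutedWriteSummary")
```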
>>
>>
>> On Thu, Feb 1, 2018 at 3:29 AM, Nick Pentreath <nick.pentre...@gmail.com>
>> wrote:
>>
>> All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side
>> that should be everything outstanding.
>>
>>
>> On Thu, 1 Feb 2018 at 06:21 Yin Huai <yh...@databricks.com> wrote:
>>
>> seems we are not running tests related to pandas in pyspark tests (see my
>> email "python tests related to pandas are skipped in jenkins"). I think we
>> should fix this test issue and make sure all tests are good before cutting
>> RC3.
>>
>> On Wed, Jan 31, 2018 at 10:12 AM, Sameer Agarwal <samee...@apache.org>
>> wrote:
>>
>> Just a quick status update on RC3 -- SPARK-23274
>> <https://issues.apache.org/jira/browse/SPARK-23274> was resolved
>> yesterday and tests have been quite healthy throughout this week and the
>> last. I'll cut the new RC as soon as the remaining blocker (SPARK-23202
>> <https://issues.apache.org/jira/browse/SPARK-23202>) is resolved.
>>
>>
>> On 30 January 2018 at 10:12, Andrew Ash <and...@andrewash.com> wrote:
>>
>> I'd like to nominate SPARK-23274
>> <https://issues.apache.org/jira/browse/SPARK-23274> as a potential
>> blocker for the 2.3.0 release as well, due to being a regression from
>> 2.2.0.  The ticket has a simple repro included, showing a query that works
>> in prior releases but now fails with an exception in the catalyst optimizer.
>>
>> On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal <sameer.a...@gmail.com>
>> wrote:
>>
>> This vote has failed due to a number of aforementioned blockers. I'll
>> follow up with RC3 as soon as the 2 remaining (non-QA) blockers are
>> resolved: https://s.apache.org/oXKi
>>
>>
>> On 25 January 2018 at 12:59, Sameer Agarwal <sameer.a...@gmail.com>
>> wrote:
>>
>>
>> Most tests pass on RC2, except I'm still seeing the timeout caused by 
>> https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
>> finish. I followed the thread a bit further and wasn't clear whether it was
>> subsequently re-fixed for 2.3.0 or not. It says it's

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-31 Thread Sameer Agarwal
Just a quick status update on RC3 -- SPARK-23274
<https://issues.apache.org/jira/browse/SPARK-23274> was resolved yesterday
and tests have been quite healthy throughout this week and the last. I'll
cut the new RC as soon as the remaining blocker (SPARK-23202
<https://issues.apache.org/jira/browse/SPARK-23202>) is resolved.


On 30 January 2018 at 10:12, Andrew Ash <and...@andrewash.com> wrote:

> I'd like to nominate SPARK-23274
> <https://issues.apache.org/jira/browse/SPARK-23274> as a potential
> blocker for the 2.3.0 release as well, due to being a regression from
> 2.2.0.  The ticket has a simple repro included, showing a query that works
> in prior releases but now fails with an exception in the catalyst optimizer.
>
> On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal <sameer.a...@gmail.com>
> wrote:
>
>> This vote has failed due to a number of aforementioned blockers. I'll
>> follow up with RC3 as soon as the 2 remaining (non-QA) blockers are
>> resolved: https://s.apache.org/oXKi
>>
>>
>> On 25 January 2018 at 12:59, Sameer Agarwal <sameer.a...@gmail.com>
>> wrote:
>>
>>>
>>> Most tests pass on RC2, except I'm still seeing the timeout caused by
>>>> https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
>>>> finish. I followed the thread a bit further and wasn't clear whether it was
>>>> subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
>>>> https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0 though I
>>>> am still seeing these tests fail or hang:
>>>>
>>>> - subscribing topic by name from earliest offsets (failOnDataLoss:
>>>> false)
>>>> - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>>>>
>>>
>>> Sean, while some of these tests were timing out on RC1, we're not aware
>>> of any known issues in RC2. Both maven
>>> (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.spark.sql.kafka010/history/)
>>> and sbt
>>> (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.spark.sql.kafka010/history/)
>>> historical builds on jenkins for org.apache.spark.sql.kafka010 look
>>> fairly healthy. If you're still
>>> seeing timeouts in RC2, can you create a JIRA with any applicable build/env
>>> info?
>>>
>>>
>>>
>>>> On Tue, Jan 23, 2018 at 9:01 AM Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>>> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
>>>>> unpacking it with 'xvzf' and also unzipping it first, and it untarred
>>>>> without warnings in either case.
>>>>>
>>>>> I am encountering errors while running the tests, different ones each
>>>>> time, so am still figuring out whether there is a real problem or just
>>>>> flaky tests.
>>>>>
>>>>> These issues look like blockers, as they are inherently to be
>>>>> completed before the 2.3 release. They are mostly not done. I suppose I'd
>>>>> -1 on behalf of those who say this needs to be done first, though, we can
>>>>> keep testing.
>>>>>
>>>>> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>>>>> SPARK-23114 Spark R 2.3 QA umbrella
>>>>>
>>>>> Here are the remaining items targeted for 2.3:
>>>>>
>>>>> SPARK-15689 Data source API v2
>>>>> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
>>>>> SPARK-21646 Add new type coercion rules to compatible with Hive
>>>>> SPARK-22386 Data Source V2 improvements
>>>>> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
>>>>> SPARK-22735 Add VectorSizeHint to ML features documentation
>>>>> SPARK-22739 Additional Expression Support for Objects
>>>>> SPARK-22809 pyspark is sensitive to imports with dots
>>>>> SPARK-22820 Spark 2.3 SQL API audit
>>>>>
>>>>>
>>>>> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin <van...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> +0
>>>>>>
>>>>>> Signatures check out. Code compiles, although I see the errors in [1]
>>>>>> when untarr

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-26 Thread Sameer Agarwal
This vote has failed due to a number of aforementioned blockers. I'll
follow up with RC3 as soon as the 2 remaining (non-QA) blockers are
resolved: https://s.apache.org/oXKi


On 25 January 2018 at 12:59, Sameer Agarwal <sameer.a...@gmail.com> wrote:

>
> Most tests pass on RC2, except I'm still seeing the timeout caused by
>> https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
>> finish. I followed the thread a bit further and wasn't clear whether it was
>> subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
>> https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0 though I am
>> still seeing these tests fail or hang:
>>
>> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
>> - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>>
>
> Sean, while some of these tests were timing out on RC1, we're not aware of
> any known issues in RC2. Both maven
> (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.spark.sql.kafka010/history/)
> and sbt
> (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.spark.sql.kafka010/history/)
> historical builds on jenkins
> for org.apache.spark.sql.kafka010 look fairly healthy. If you're still
> seeing timeouts in RC2, can you create a JIRA with any applicable build/env
> info?
>
>
>
>> On Tue, Jan 23, 2018 at 9:01 AM Sean Owen <so...@cloudera.com> wrote:
>>
>>> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
>>> unpacking it with 'xvzf' and also unzipping it first, and it untarred
>>> without warnings in either case.
>>>
>>> I am encountering errors while running the tests, different ones each
>>> time, so am still figuring out whether there is a real problem or just
>>> flaky tests.
>>>
>>> These issues look like blockers, as they are inherently to be completed
>>> before the 2.3 release. They are mostly not done. I suppose I'd -1 on
>>> behalf of those who say this needs to be done first, though, we can keep
>>> testing.
>>>
>>> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>>> SPARK-23114 Spark R 2.3 QA umbrella
>>>
>>> Here are the remaining items targeted for 2.3:
>>>
>>> SPARK-15689 Data source API v2
>>> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
>>> SPARK-21646 Add new type coercion rules to compatible with Hive
>>> SPARK-22386 Data Source V2 improvements
>>> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
>>> SPARK-22735 Add VectorSizeHint to ML features documentation
>>> SPARK-22739 Additional Expression Support for Objects
>>> SPARK-22809 pyspark is sensitive to imports with dots
>>> SPARK-22820 Spark 2.3 SQL API audit
>>>
>>>
>>> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin <van...@cloudera.com>
>>> wrote:
>>>
>>>> +0
>>>>
>>>> Signatures check out. Code compiles, although I see the errors in [1]
>>>> when untarring the source archive; perhaps we should add "use GNU tar"
>>>> to the RM checklist?
>>>>
>>>> Also ran our internal tests and they seem happy.
>>>>
>>>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>>>> documentation ones). It is not long, but it seems some of those need
>>>> to be looked at. It would be nice for the committers who are involved
>>>> in those bugs to take a look.
>>>>
>>>> [1] https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>>>>
>>>>
>>>> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal <samee...@apache.org>
>>>> wrote:
>>>> > Please vote on releasing the following candidate as Apache Spark
>>>> version
>>>> > 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
>>>> UTC and
>>>> > passes if a majority of at least 3 PMC +1 votes are cast.
>>>> >
>>>> >
>>>> > [ ] +1 Release this package as Apache Spark 2.3.0
>>>> >
>>>> > [ ] -1 Do not release this package because ...
>>>> >
>>>> >
>>>> > To learn more about Apache Spark, please see
>>>

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
> Most tests pass on RC2, except I'm still seeing the timeout caused by
> https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
> finish. I followed the thread a bit further and wasn't clear whether it was
> subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
> https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0 though I am
> still seeing these tests fail or hang:
>
> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
> - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>

Sean, while some of these tests were timing out on RC1, we're not aware of
any known issues in RC2. Both maven (
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.spark.sql.kafka010/history/)
and sbt (
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.spark.sql.kafka010/history/)
historical builds on jenkins for org.apache.spark.sql.kafka010 look fairly
healthy. If you're still seeing timeouts in RC2, can you create a JIRA with
any applicable build/env info?



> On Tue, Jan 23, 2018 at 9:01 AM Sean Owen <so...@cloudera.com> wrote:
>
>> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
>> unpacking it with 'xvzf' and also unzipping it first, and it untarred
>> without warnings in either case.
>>
>> I am encountering errors while running the tests, different ones each
>> time, so am still figuring out whether there is a real problem or just
>> flaky tests.
>>
>> These issues look like blockers, as they are inherently to be completed
>> before the 2.3 release. They are mostly not done. I suppose I'd -1 on
>> behalf of those who say this needs to be done first, though, we can keep
>> testing.
>>
>> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>> SPARK-23114 Spark R 2.3 QA umbrella
>>
>> Here are the remaining items targeted for 2.3:
>>
>> SPARK-15689 Data source API v2
>> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
>> SPARK-21646 Add new type coercion rules to compatible with Hive
>> SPARK-22386 Data Source V2 improvements
>> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
>> SPARK-22735 Add VectorSizeHint to ML features documentation
>> SPARK-22739 Additional Expression Support for Objects
>> SPARK-22809 pyspark is sensitive to imports with dots
>> SPARK-22820 Spark 2.3 SQL API audit
>>
>>
>> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>>
>>> +0
>>>
>>> Signatures check out. Code compiles, although I see the errors in [1]
>>> when untarring the source archive; perhaps we should add "use GNU tar"
>>> to the RM checklist?
>>>
>>> Also ran our internal tests and they seem happy.
>>>
>>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>>> documentation ones). It is not long, but it seems some of those need
>>> to be looked at. It would be nice for the committers who are involved
>>> in those bugs to take a look.
>>>
>>> [1] https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>>>
>>>
>>> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal <samee...@apache.org>
>>> wrote:
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version
>>> > 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
>>> UTC and
>>> > passes if a majority of at least 3 PMC +1 votes are cast.
>>> >
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.3.0
>>> >
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> >
>>> > To learn more about Apache Spark, please see https://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.3.0-rc2:
>>> > https://github.com/apache/spark/tree/v2.3.0-rc2
>>> > (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>>> >
>>> > List of JIRA tickets resolved in this release can be found here:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
>>> >
>>> > Release arti

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
I'm a -1 too. In addition to SPARK-23207
<https://issues.apache.org/jira/browse/SPARK-23207>, we've recently merged
two codegen fixes (SPARK-23208
<https://issues.apache.org/jira/browse/SPARK-23208> and SPARK-21717
<https://issues.apache.org/jira/browse/SPARK-21717>) that address a major
code-splitting bug and performance regressions respectively.

Regarding QA tasks, I think it goes without saying that all QA
prerequisites are by definition "release blockers", and an RC will not pass
until all of them are resolved. Traditionally for every major Spark
release, we've seen that serious QA only starts once an RC is cut, but if
the community feels otherwise, I'm happy to hold off the next RC until all
these QA JIRAs are resolved. Otherwise, I'll follow up with an RC3 once
SPARK-23207 <https://issues.apache.org/jira/browse/SPARK-23207> and
SPARK-23209 <https://issues.apache.org/jira/browse/SPARK-23209> are
resolved.

On 25 January 2018 at 10:17, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> I think this has come up before (and Sean mentions it above), but the
> sub-items on:
>
> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>
> are actually marked as Blockers, but are not targeted to 2.3.0. I think
> they should be, and I'm not comfortable with those not being resolved
> before voting positively on the release.
>
> So I'm -1 too for that reason.
>
> I think most of those review items are close to done, and there is also
> https://issues.apache.org/jira/browse/SPARK-22799 that I think should be
> in for 2.3 (to avoid a behavior change later between 2.3.0 and 2.3.1,
> especially since we'll have another RC now it seems).
>
>
> On Thu, 25 Jan 2018 at 19:28 Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> Sorry, have to change my vote again. Hive guys ran into SPARK-23209
>> and that's a regression we need to fix. I'll post a patch soon. So -1
>> (although others have already -1'ed).
>>
>> On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>> > Given that the bugs I was worried about have been dealt with, I'm
>> > upgrading to +1.
>> >
>> > On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>> >> +0
>> >>
>> >> Signatures check out. Code compiles, although I see the errors in [1]
>> >> when untarring the source archive; perhaps we should add "use GNU tar"
>> >> to the RM checklist?
>> >>
>> >> Also ran our internal tests and they seem happy.
>> >>
>> >> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>> >> documentation ones). It is not long, but it seems some of those need
>> >> to be looked at. It would be nice for the committers who are involved
>> >> in those bugs to take a look.
>> >>
>> >> [1] https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>> >>
>> >>
>> >> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal <samee...@apache.org>
>> wrote:
>> >>> Please vote on releasing the following candidate as Apache Spark
>> version
>> >>> 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
>> UTC and
>> >>> passes if a majority of at least 3 PMC +1 votes are cast.
>> >>>
>> >>>
>> >>> [ ] +1 Release this package as Apache Spark 2.3.0
>> >>>
>> >>> [ ] -1 Do not release this package because ...
>> >>>
>> >>>
>> >>> To learn more about Apache Spark, please see
>> https://spark.apache.org/
>> >>>
>> >>> The tag to be voted on is v2.3.0-rc2:
>> >>> https://github.com/apache/spark/tree/v2.3.0-rc2
>> >>> (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>> >>>
>> >>> List of JIRA tickets resolved in this release can be found here:
>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>> >>>
>> >>> The release files, including signatures, digests, etc. can be found
>> at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
>> >>>
>> >>> Release artifacts are signed with the following key:
>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>>
>> >>> The staging repository for this release can be found at:
>> >>> https://repository.apache.org/content/repositories/
>> orgapa

[VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version
2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC and
passes if a majority of at least 3 PMC +1 votes are cast.


[ ] +1 Release this package as Apache Spark 2.3.0

[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.3.0-rc2:
https://github.com/apache/spark/tree/v2.3.0-rc2
(489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)

List of JIRA tickets resolved in this release can be found here:
https://issues.apache.org/jira/projects/SPARK/versions/12339551

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/

Release artifacts are signed with the following key:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1262/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html


FAQ

===
What are the unresolved issues targeted for 2.3.0?
===

Please see https://s.apache.org/oXKi. At the time of writing, there are
currently no known release blockers.

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the
current RC, and see if anything important breaks; in Java/Scala, you can
add the staging repository to your project's resolvers and test with the RC
(make sure to clean up the artifact cache before/after so you don't end up
building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 2.3.0?
===

Committers should look at those and triage. Extremely important bug fixes,
documentation, and API tweaks that impact compatibility should be worked on
immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
appropriate.

===
Why is my bug not fixed?
===

In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from 2.2.0. That being said, if
there is something which is a regression from 2.2.0 and has not been
correctly targeted please ping me or a committer to help target the issue
(you can see the open issues listed as impacting Spark 2.3.0 at
https://s.apache.org/WmoI).


Regards,
Sameer


Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-18 Thread Sameer Agarwal
This vote has failed in favor of a new RC. I'll follow up with a new RC2 as
soon as the 3 remaining test/UI blockers <https://s.apache.org/oXKi> are
resolved.

On 17 January 2018 at 16:38, Sameer Agarwal <sameer.a...@gmail.com> wrote:

> Thanks, will do!
>
> On 16 January 2018 at 22:09, Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> So looking at http://pgp.mit.edu/pks/lookup?op=vindex=0xA1CEDBA8
>> AD0C022A it seems like Sameer's key isn't in the Apache web of trust
>> yet. This shouldn't block RC process but before we publish it's important
>> to get the key in the Apache web of trust.
>>
>> On Tue, Jan 16, 2018 at 3:00 PM, Sameer Agarwal <sameer.a...@gmail.com>
>> wrote:
>>
>>> Yes, I'll cut an RC2 as soon as the remaining blockers are resolved. In
>>> the meantime, please continue to report any other issues here.
>>>
>>> Here's a quick update on progress towards the next RC:
>>>
>>> - SPARK-22908 <https://issues.apache.org/jira/browse/SPARK-22908>
>>> (KafkaContinuousSourceSuite) has been reverted
>>> - SPARK-23051 <https://issues.apache.org/jira/browse/SPARK-23051>
>>> (Spark UI), SPARK-23063
>>> <https://issues.apache.org/jira/browse/SPARK-23063> (k8s packaging) and
>>> SPARK-23065 <https://issues.apache.org/jira/browse/SPARK-23065> (R API
>>> docs) have all been resolved
>>> - A fix for SPARK-23020
>>> <https://issues.apache.org/jira/browse/SPARK-23020> (SparkLauncherSuite)
>>> has been merged. We're monitoring the builds to make sure that the
>>> flakiness has been resolved.
>>>
>>>
>>>
>>> On 16 January 2018 at 13:21, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> Is there going to be another RC ?
>>>>
>>>> With KafkaContinuousSourceSuite hanging, it is hard to get the rest of
>>>> the tests going.
>>>>
>>>> Cheers
>>>>
>>>> On Sat, Jan 13, 2018 at 7:29 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>>> The signatures and licenses look OK. Except for the missing k8s
>>>>> package, the contents look OK. Tests look pretty good with "-Phive
>>>>> -Phadoop-2.7 -Pyarn" on Ubuntu 17.10, except that 
>>>>> KafkaContinuousSourceSuite
>>>>> seems to hang forever. That was just fixed and needs to get into an RC?
>>>>>
>>>>> Aside from the Blockers just filed for R docs, etc., we have:
>>>>>
>>>>> Blocker:
>>>>> SPARK-23000 Flaky test suite DataSourceWithHiveMetastoreCatalogSuite
>>>>> in Spark 2.3
>>>>> SPARK-23020 Flaky Test: org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher
>>>>> SPARK-23051 job description in Spark UI is broken
>>>>>
>>>>> Critical:
>>>>> SPARK-22739 Additional Expression Support for Objects
>>>>>
>>>>> I actually don't think any of those Blockers should be Blockers; not
>>>>> sure if the last one is really critical either.
>>>>>
>>>>> I think this release will have to be re-rolled so I'd say -1 to RC1.
>>>>>
>>>>> On Fri, Jan 12, 2018 at 4:42 PM Sameer Agarwal <samee...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 2.3.0. The vote is open until Thursday January 18, 2018 at 
>>>>>> 8:00:00
>>>>>> am UTC and passes if a majority of at least 3 PMC +1 votes are cast.
>>>>>>
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>>>>
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>>
>>>>>> To learn more about Apache Spark, please see
>>>>>> https://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v2.3.0-rc1: https://github.com/apache/spar
>>>>>> k/tree/v2.3.0-rc1 (964cc2e31b2862bca0bd968b3e9e2cbf8d3ba5ea)
>>>>>>
>>>>>> List of JIRA tickets resolved in this release can be found here:
>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>> at:
>

Re: Build timed out for `branch-2.3 (hadoop-2.7)`

2018-01-17 Thread Sameer Agarwal
FYI, I ended up bumping the build timeouts from 255 to 275 minutes. All
successful
<https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/>
2.3
(hadoop-2.7) builds last week were already taking 245-250 mins and had
started timing out earlier today (towards the very end; while making
consistent progress throughout). Increasing the timeout resolves the issue.

NB: This might be either due to additional tests that were recently added
or due to the git delays that Shane reported; we haven't investigated the
root cause yet.

On 12 January 2018 at 16:37, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> For this issue, during SPARK-23028, Shane shared that the server limit is
> already higher.
>
> 1. Xiao Li increased the timeout of Spark test script for `master` branch
> first in the following commit.
>
> [SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT
> <https://github.com/apache/spark/commit/651f76153f5e9b185aaf593161d40cabe7994fea>
>
> 2. Marco Gaido reports a flaky test suite and it turns out that the test
> suite hangs in SPARK-23055
> <https://issues.apache.org/jira/browse/SPARK-23055>
>
> 3. Sameer Agarwal swiftly reverts it.
>
> Thank you all!
>
> Let's wait and see the dashboard
> <https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/>
> .
>
> Bests,
> Dongjoon.
>
>
>
> On Fri, Jan 12, 2018 at 3:22 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> FYI, we reverted a commit in https://github.com/apache/spar
>> k/commit/55dbfbca37ce4c05f83180777ba3d4fe2d96a02e to fix the issue.
>>
>> On Fri, Jan 12, 2018 at 11:45 AM, Xin Lu <x...@salesforce.com> wrote:
>>
>>> seems like someone should investigate what caused the build time to go
>>> up an hour and if it's expected or not.
>>>
>>> On Thu, Jan 11, 2018 at 7:37 PM, Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> Hi, All and Shane.
>>>>
>>>> Can we increase the build time for `branch-2.3` during 2.3 RC period?
>>>>
>>>> There are two known test issues, but the Jenkins on branch-2.3 with
>>>> hadoop-2.7 fails with build timeout. So, it's difficult to monitor whether
>>>> the branch is healthy or not.
>>>>
>>>> Build timed out (after 255 minutes). Marking the build as aborted.
>>>> Build was aborted
>>>> ...
>>>> Finished: ABORTED
>>>>
>>>> - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Tes
>>>> t%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/60/console
>>>> - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Tes
>>>> t%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/47/console
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>
>>>
>>
>


Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-17 Thread Sameer Agarwal
Thanks, will do!

On 16 January 2018 at 22:09, Holden Karau <hol...@pigscanfly.ca> wrote:

> So looking at http://pgp.mit.edu/pks/lookup?op=vindex=0xA1CEDBA8
> AD0C022A it seems like Sameer's key isn't in the Apache web of trust yet.
> This shouldn't block RC process but before we publish it's important to get
> the key in the Apache web of trust.
>
> On Tue, Jan 16, 2018 at 3:00 PM, Sameer Agarwal <sameer.a...@gmail.com>
> wrote:
>
>> Yes, I'll cut an RC2 as soon as the remaining blockers are resolved. In
>> the meantime, please continue to report any other issues here.
>>
>> Here's a quick update on progress towards the next RC:
>>
>> - SPARK-22908 <https://issues.apache.org/jira/browse/SPARK-22908>
>> (KafkaContinuousSourceSuite) has been reverted
>> - SPARK-23051 <https://issues.apache.org/jira/browse/SPARK-23051> (Spark
>> UI), SPARK-23063 <https://issues.apache.org/jira/browse/SPARK-23063>
>> (k8s packaging) and SPARK-23065
>> <https://issues.apache.org/jira/browse/SPARK-23065> (R API docs) have
>> all been resolved
>> - A fix for SPARK-23020
>> <https://issues.apache.org/jira/browse/SPARK-23020> (SparkLauncherSuite)
>> has been merged. We're monitoring the builds to make sure that the
>> flakiness has been resolved.
>>
>>
>>
>> On 16 January 2018 at 13:21, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Is there going to be another RC ?
>>>
>>> With KafkaContinuousSourceSuite hanging, it is hard to get the rest of
>>> the tests going.
>>>
>>> Cheers
>>>
>>> On Sat, Jan 13, 2018 at 7:29 AM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> The signatures and licenses look OK. Except for the missing k8s
>>>> package, the contents look OK. Tests look pretty good with "-Phive
>>>> -Phadoop-2.7 -Pyarn" on Ubuntu 17.10, except that 
>>>> KafkaContinuousSourceSuite
>>>> seems to hang forever. That was just fixed and needs to get into an RC?
>>>>
>>>> Aside from the Blockers just filed for R docs, etc., we have:
>>>>
>>>> Blocker:
>>>> SPARK-23000 Flaky test suite DataSourceWithHiveMetastoreCatalogSuite
>>>> in Spark 2.3
>>>> SPARK-23020 Flaky Test: org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher
>>>> SPARK-23051 job description in Spark UI is broken
>>>>
>>>> Critical:
>>>> SPARK-22739 Additional Expression Support for Objects
>>>>
>>>> I actually don't think any of those Blockers should be Blockers; not
>>>> sure if the last one is really critical either.
>>>>
>>>> I think this release will have to be re-rolled so I'd say -1 to RC1.
>>>>
>>>> On Fri, Jan 12, 2018 at 4:42 PM Sameer Agarwal <samee...@apache.org>
>>>> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 2.3.0. The vote is open until Thursday January 18, 2018 at 8:00:00
>>>>> am UTC and passes if a majority of at least 3 PMC +1 votes are cast.
>>>>>
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>>>
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>>
>>>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>>>
>>>>> The tag to be voted on is v2.3.0-rc1: https://github.com/apache/spar
>>>>> k/tree/v2.3.0-rc1 (964cc2e31b2862bca0bd968b3e9e2cbf8d3ba5ea)
>>>>>
>>>>> List of JIRA tickets resolved in this release can be found here:
>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc1-bin/
>>>>>
>>>>> Release artifacts are signed with the following key:
>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapache
>>>>> spark-1261/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc1-docs
>>>>> /_site/i

Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-16 Thread Sameer Agarwal
Yes, I'll cut an RC2 as soon as the remaining blockers are resolved. In the
meantime, please continue to report any other issues here.

Here's a quick update on progress towards the next RC:

- SPARK-22908 <https://issues.apache.org/jira/browse/SPARK-22908>
(KafkaContinuousSourceSuite) has been reverted
- SPARK-23051 <https://issues.apache.org/jira/browse/SPARK-23051> (Spark
UI), SPARK-23063 <https://issues.apache.org/jira/browse/SPARK-23063> (k8s
packaging) and SPARK-23065
<https://issues.apache.org/jira/browse/SPARK-23065> (R API docs) have all
been resolved
- A fix for SPARK-23020 <https://issues.apache.org/jira/browse/SPARK-23020>
(SparkLauncherSuite) has been merged. We're monitoring the builds to make
sure that the flakiness has been resolved.



On 16 January 2018 at 13:21, Ted Yu <yuzhih...@gmail.com> wrote:

> Is there going to be another RC ?
>
> With KafkaContinuousSourceSuite hanging, it is hard to get the rest of
> the tests going.
>
> Cheers
>
> On Sat, Jan 13, 2018 at 7:29 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> The signatures and licenses look OK. Except for the missing k8s package,
>> the contents look OK. Tests look pretty good with "-Phive -Phadoop-2.7
>> -Pyarn" on Ubuntu 17.10, except that KafkaContinuousSourceSuite seems to
>> hang forever. That was just fixed and needs to get into an RC?
>>
>> Aside from the Blockers just filed for R docs, etc., we have:
>>
>> Blocker:
>> SPARK-23000 Flaky test suite DataSourceWithHiveMetastoreCatalogSuite in
>> Spark 2.3
>> SPARK-23020 Flaky Test: org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher
>> SPARK-23051 job description in Spark UI is broken
>>
>> Critical:
>> SPARK-22739 Additional Expression Support for Objects
>>
>> I actually don't think any of those Blockers should be Blockers; not sure
>> if the last one is really critical either.
>>
>> I think this release will have to be re-rolled so I'd say -1 to RC1.
>>
>> On Fri, Jan 12, 2018 at 4:42 PM Sameer Agarwal <samee...@apache.org>
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.3.0. The vote is open until Thursday January 18, 2018 at 8:00:00 am UTC
>>> and passes if a majority of at least 3 PMC +1 votes are cast.
>>>
>>>
>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>
>>> The tag to be voted on is v2.3.0-rc1: https://github.com/apache/spar
>>> k/tree/v2.3.0-rc1 (964cc2e31b2862bca0bd968b3e9e2cbf8d3ba5ea)
>>>
>>> List of JIRA tickets resolved in this release can be found here:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1261/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc1-docs
>>> /_site/index.html
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark, you can set up a virtual env, install
>>> the current RC, and see if anything important breaks; in Java/Scala, you
>>> can add the staging repository to your project's resolvers and test with the
>>> RC (make sure to clean up the artifact cache before/after so you don't end
>>> up building with an out-of-date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.3.0?
>>> ===
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks tha

[VOTE] Spark 2.3.0 (RC1)

2018-01-12 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version
2.3.0. The vote is open until Thursday January 18, 2018 at 8:00:00 am UTC
and passes if a majority of at least 3 PMC +1 votes are cast.


[ ] +1 Release this package as Apache Spark 2.3.0

[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.3.0-rc1:
https://github.com/apache/spark/tree/v2.3.0-rc1
(964cc2e31b2862bca0bd968b3e9e2cbf8d3ba5ea)

List of JIRA tickets resolved in this release can be found here:
https://issues.apache.org/jira/projects/SPARK/versions/12339551

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc1-bin/

Release artifacts are signed with the following key:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1261/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc1-docs/_site/index.html


FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the
current RC, and see if anything important breaks; in Java/Scala, you can
add the staging repository to your project's resolvers and test with the RC
(make sure to clean up the artifact cache before/after so you don't end up
building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 2.3.0?
===

Committers should look at those and triage. Extremely important bug fixes,
documentation, and API tweaks that impact compatibility should be worked on
immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
appropriate.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from 2.2.0. That being said, if
there is something which is a regression from 2.2.0 that has not been
correctly targeted please ping me or a committer to help target the issue
(you can see the open issues listed as impacting Spark 2.3.0 at
https://s.apache.org/WmoI).

===
What are the unresolved issues targeted for 2.3.0?
===

Please see https://s.apache.org/oXKi. At the time of writing, there are
19 JIRA issues targeting 2.3.0 that track various QA/audit tasks, test
failures, and other features/bugs. In particular, we've currently marked 3
JIRAs as release blockers that are being actively worked on:

1. SPARK-23051, which tracks a regression in the Spark UI
2. SPARK-23020 and SPARK-23000, which track a couple of flaky tests
responsible for build failures. Additionally,
https://github.com/apache/spark/pull/20242 fixes a few Java linter errors
in RC1.

Given that these blockers are fairly isolated, in the spirit of starting
thorough QA early, this RC1 aims to serve as a good approximation of the
functionality of the final release.

Regards,
Sameer


Re: Branch 2.3 is cut

2018-01-11 Thread Sameer Agarwal
All major blockers have now been resolved with the exception of a couple of
known test issues (SPARK-23020
<https://issues.apache.org/jira/browse/SPARK-23020> and SPARK-23000
<https://issues.apache.org/jira/browse/SPARK-23000>) that are being
actively worked on. Unless there is an objection, I'll shortly follow up
with an RC to get the QA started in parallel.

Thanks,
Sameer

On Mon, Jan 8, 2018 at 5:03 PM, Sameer Agarwal <sam...@databricks.com>
wrote:

> Hello everyone,
>
> Just a quick update on the release. There are currently 2 correctness
> blockers (SPARK-22984 <https://issues.apache.org/jira/browse/SPARK-22984>
>  and SPARK-22982 <https://issues.apache.org/jira/browse/SPARK-22982>)
> that are targeted against 2.3.0. We'll go ahead and create an RC as soon as
> they're resolved. All relevant jenkins jobs for the release branch can be
> accessed at: https://amplab.cs.berkeley.edu/jenkins/
>
> Regards,
> Sameer
>
> On Mon, Jan 1, 2018 at 5:22 PM, Sameer Agarwal <sam...@databricks.com>
> wrote:
>
>> We've just cut the release branch for Spark 2.3. Committers, please
>> backport all important bug fixes and PRs as appropriate.
>>
>> Next, I'll go ahead and create the jenkins jobs for the release branch
>> and then follow up with an RC early next week.
>>
>
>
>
> --
> Sameer Agarwal
> Software Engineer | Databricks Inc.
> http://cs.berkeley.edu/~sameerag
>



-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: Branch 2.3 is cut

2018-01-08 Thread Sameer Agarwal
Hello everyone,

Just a quick update on the release. There are currently 2 correctness
blockers (SPARK-22984 <https://issues.apache.org/jira/browse/SPARK-22984>
 and SPARK-22982 <https://issues.apache.org/jira/browse/SPARK-22982>) that
are targeted against 2.3.0. We'll go ahead and create an RC as soon as
they're resolved. All relevant jenkins jobs for the release branch can be
accessed at: https://amplab.cs.berkeley.edu/jenkins/

Regards,
Sameer

On Mon, Jan 1, 2018 at 5:22 PM, Sameer Agarwal <sam...@databricks.com>
wrote:

> We've just cut the release branch for Spark 2.3. Committers, please
> backport all important bug fixes and PRs as appropriate.
>
> Next, I'll go ahead and create the jenkins jobs for the release branch and
> then follow up with an RC early next week.
>



-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Branch 2.3 is cut

2018-01-01 Thread Sameer Agarwal
We've just cut the release branch for Spark 2.3. Committers, please
backport all important bug fixes and PRs as appropriate.

Next, I'll go ahead and create the jenkins jobs for the release branch and
then follow up with an RC early next week.


Re: Timeline for Spark 2.3

2017-12-29 Thread Sameer Agarwal
Thanks everyone! I'll then go ahead and cut the branch early next week.
This will give us enough time to set up all the necessary jenkins builders
before the voting starts. We can then start the formal release process a
week after (8th January) once everybody gets back from vacation.

Regards,
Sameer

On Thu, Dec 21, 2017 at 8:48 PM, Kazuaki Ishizaki <ishiz...@jp.ibm.com>
wrote:

> +1 for cutting a branch earlier.
> In some Asian countries, 1st, 2nd, and 3rd January are off.
> https://www.timeanddate.com/holidays/
> How about 4th or 5th?
>
> Regards,
> Kazuaki Ishizaki
>
>
>
> From: Felix Cheung <felixcheun...@hotmail.com>
> To: Michael Armbrust <mich...@databricks.com>, Holden Karau <
> hol...@pigscanfly.ca>
> Cc: Sameer Agarwal <sam...@databricks.com>, Erik Erlandson <
> eerla...@redhat.com>, dev <dev@spark.apache.org>
> Date: 2017/12/21 04:48
> Subject: Re: Timeline for Spark 2.3
> --
>
>
>
> +1
> I think the earlier we cut a branch the better.
>
> --
>
> *From:* Michael Armbrust <mich...@databricks.com>
> *Sent:* Tuesday, December 19, 2017 4:41:44 PM
> *To:* Holden Karau
> *Cc:* Sameer Agarwal; Erik Erlandson; dev
> *Subject:* Re: Timeline for Spark 2.3
>
> Do people really need to be around for the branch cut (modulo the person
> cutting the branch)?
>
> 1st or 2nd doesn't really matter to me, but I am +1 kicking this off as
> soon as we enter the new year :)
>
> Michael
>
> On Tue, Dec 19, 2017 at 4:39 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
> Sounds reasonable, although I'd choose the 2nd perhaps just since lots of
> folks are off on the 1st?
>
> On Tue, Dec 19, 2017 at 4:36 PM, Sameer Agarwal <sam...@databricks.com> wrote:
> Let's aim for the 2.3 branch cut on 1st Jan and RC1 a week after that
> (i.e., week of 8th Jan)?
>
>
> On Fri, Dec 15, 2017 at 12:54 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
> So personally I’d be in favour of pushing to early January; doing a
> release over the holidays is a little rough with herding all of the people
> to vote.
>
> On Thu, Dec 14, 2017 at 11:49 PM Erik Erlandson <eerla...@redhat.com> wrote:
> I wanted to check in on the state of the 2.3 freeze schedule.  Original
> proposal was "late Dec", which is a bit open to interpretation.
>
> We are working to get some refactoring done on the integration testing for
> the Kubernetes back-end in preparation for testing upcoming release
> candidates, however holiday vacation time is about to begin taking its toll
> both on upstream reviewing and on the "downstream" spark-on-kube fork.
>
> If the freeze pushed into January, that would take some of the pressure
> off the kube back-end upstreaming. However, regardless, I was wondering if
> the dates could be clarified.
> Cheers,
> Erik
>
>
> On Mon, Nov 13, 2017 at 5:13 PM, dji...@dataxu.com <dji...@dataxu.com> wrote:
> Hi,
>
> What is the process to request an issue/fix to be included in the next
> release? Is there a place to vote for features?
> I am interested in https://issues.apache.org/jira/browse/SPARK-13127, to see
> if we can get Spark to upgrade Parquet to 1.9.0, which addresses
> https://issues.apache.org/jira/browse/PARQUET-686.
> Can we include the fix in Spark 2.3 release?
>
> Thanks,
>
> Dong
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> ---------
> To unsubscribe e-mail: *dev-unsubscr...@spark.apache.org*
> <dev-unsubscr...@spark.apache.org>
>
>

Re: Timeline for Spark 2.3

2017-12-19 Thread Sameer Agarwal
Let's aim for the 2.3 branch cut on 1st Jan and RC1 a week after that
(i.e., week of 8th Jan)?


On Fri, Dec 15, 2017 at 12:54 AM, Holden Karau <hol...@pigscanfly.ca> wrote:

> So personally I’d be in favour of pushing to early January; doing a
> release over the holidays is a little rough with herding all of the people
> to vote.
>
> On Thu, Dec 14, 2017 at 11:49 PM Erik Erlandson <eerla...@redhat.com>
> wrote:
>
>> I wanted to check in on the state of the 2.3 freeze schedule.  Original
>> proposal was "late Dec", which is a bit open to interpretation.
>>
>> We are working to get some refactoring done on the integration testing
>> for the Kubernetes back-end in preparation for testing upcoming release
>> candidates, however holiday vacation time is about to begin taking its toll
>> both on upstream reviewing and on the "downstream" spark-on-kube fork.
>>
>> If the freeze pushed into January, that would take some of the pressure
>> off the kube back-end upstreaming. However, regardless, I was wondering if
>> the dates could be clarified.
>> Cheers,
>> Erik
>>
>>
>> On Mon, Nov 13, 2017 at 5:13 PM, dji...@dataxu.com <dji...@dataxu.com>
>> wrote:
>>
>>> Hi,
>>>
>>> What is the process to request an issue/fix to be included in the next
>>> release? Is there a place to vote for features?
>>> I am interested in https://issues.apache.org/jira/browse/SPARK-13127,
>>> to see
>>> if we can get Spark to upgrade Parquet to 1.9.0, which addresses
>>> https://issues.apache.org/jira/browse/PARQUET-686.
>>> Can we include the fix in Spark 2.3 release?
>>>
>>> Thanks,
>>>
>>> Dong
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>> --
> Twitter: https://twitter.com/holdenkarau
>



-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: Timeline for Spark 2.3

2017-11-10 Thread Sameer Agarwal
Sounds good to me too. In addition to what has already been pointed out
about the Spark History Server and the Kubernetes support, this would also
give us enough time to further polish the new data source v2 API and the
vectorized UDF API to iron out any kinks.

I'd like to volunteer to serve as the release manager for Spark 2.3. In
terms of bandwidth, I will be available during this holiday season as I
have no vacation planned during the Dec-Jan timeframe. And I'm fairly
familiar with most of the major efforts targeted for the 2.3 release.

Thanks,
Sameer

On Fri, Nov 10, 2017 at 2:07 AM, Sean Owen <so...@cloudera.com> wrote:

> The original timeline was just +6 months from the last planned release, so
> there was nothing too magic about it. That was pushed from +4. The only
> risk here is that an extra month becomes 2 or 3, and so users aren't getting
> the other 1000 fixes. But no particular problem with moving it back.
>
> On Thu, Nov 9, 2017, 5:54 PM Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> According to the timeline posted on the website, we are nearing branch
>> cut for Spark 2.3.  I'd like to propose pushing this out towards mid to
>> late December for a couple of reasons and would like to hear what people
>> think.
>>
>> 1. I've done release management during the Thanksgiving / Christmas time
>> before and in my experience, we don't actually get a lot of testing during
>> this time due to vacations and other commitments. I think beginning the RC
>> process in early January would give us the best coverage in the shortest
>> amount of time.
>> 2. There are several large initiatives in progress that given a little
>> more time would leave us with a much more exciting 2.3 release.
>> Specifically, the work on the history server, Kubernetes and continuous
>> processing
>> 3. Given the actual release date of Spark 2.2, I think we'll still get
>> Spark 2.3 out roughly 6 months after.
>>
>> Thoughts?
>>
>> Michael
>>
>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: Welcoming Tejas Patil as a Spark committer

2017-09-30 Thread Sameer Agarwal
Congratulations!

On Sat, Sep 30, 2017 at 7:51 AM, Kazuaki Ishizaki <ishiz...@jp.ibm.com>
wrote:

> Congratulation Tejas!
>
> Kazuaki Ishizaki
>
>
>
> From: Matei Zaharia <matei.zaha...@gmail.com>
> To: "dev@spark.apache.org" <dev@spark.apache.org>
> Date: 2017/09/30 04:58
> Subject: Welcoming Tejas Patil as a Spark committer
> --
>
>
>
> Hi all,
>
> The Spark PMC recently added Tejas Patil as a committer on the
> project. Tejas has been contributing across several areas of Spark for
> a while, focusing especially on scalability issues and SQL. Please
> join me in welcoming Tejas!
>
> Matei
>
> ---------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
>
>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Sameer Agarwal
ut value for 0-parameter 
>>>>>>> UDFs.
>>>>>>> The return value should be pandas.Series of the specified type and
>>>>>>> the length of the returned value should be the same as input value.
>>>>>>>
>>>>>>> We can define vectorized UDFs as:
>>>>>>>
>>>>>>>   @pandas_udf(DoubleType())
>>>>>>>   def plus(v1, v2):
>>>>>>>   return v1 + v2
>>>>>>>
>>>>>>> or we can define as:
>>>>>>>
>>>>>>>   plus = pandas_udf(lambda v1, v2: v1 + v2, DoubleType())
>>>>>>>
>>>>>>> We can use it similar to row-by-row UDFs:
>>>>>>>
>>>>>>>   df.withColumn('sum', plus(df.v1, df.v2))
>>>>>>>
>>>>>>> As for 0-parameter UDFs, we can define and use as:
>>>>>>>
>>>>>>>   @pandas_udf(LongType())
>>>>>>>   def f0(size):
>>>>>>>   return pd.Series(1).repeat(size)
>>>>>>>
>>>>>>>   df.select(f0())
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The vote will be up for the next 72 hours. Please reply with your
>>>>>>> vote:
>>>>>>>
>>>>>>> +1: Yeah, let's go forward and implement the SPIP.
>>>>>>> +0: Don't really care.
>>>>>>> -1: I don't think this is a good idea because of the following technical
>>>>>>> reasons.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> --
>>>>>>> Takuya UESHIN
>>>>>>> Tokyo, Japan
>>>>>>>
>>>>>>> http://twitter.com/ueshin
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Takuya UESHIN
>>>>> Tokyo, Japan
>>>>>
>>>>> http://twitter.com/ueshin
>>>>>
>>>>
>>>
>>
>>
>> --
>> Takuya UESHIN
>> Tokyo, Japan
>>
>> http://twitter.com/ueshin
>>
>
>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag
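For archive readers: the Series-in/Series-out contract described in the SPIP above can be tried with plain pandas, no Spark required. This is only a sketch of the semantics — the function names mirror the SPIP's examples, and `pandas_udf` itself shipped in Spark 2.3 as `pyspark.sql.functions.pandas_udf`:

```python
import pandas as pd

# A vectorized "plus" in the style of the proposed pandas_udf: it receives
# whole pandas.Series batches and returns a Series of the same length,
# instead of being invoked once per row.
def plus(v1: pd.Series, v2: pd.Series) -> pd.Series:
    return v1 + v2

v1 = pd.Series([1.0, 2.0, 3.0])
v2 = pd.Series([10.0, 20.0, 30.0])
print(plus(v1, v2).tolist())  # [11.0, 22.0, 33.0]

# The 0-parameter case from the SPIP: the function is handed the batch
# size and must return a Series of exactly that length.
def f0(size: int) -> pd.Series:
    return pd.Series(1).repeat(size).reset_index(drop=True)

print(f0(3).tolist())  # [1, 1, 1]
```

Because the function is called once per batch rather than once per row, the per-call Python overhead is amortized across the whole Series — the motivation for the SPIP.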


Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2 read path

2017-09-06 Thread Sameer Agarwal
+1

On Wed, Sep 6, 2017 at 8:53 PM, Xiao Li <gatorsm...@gmail.com> wrote:

> +1
>
> Xiao
>
> 2017-09-06 19:37 GMT-07:00 Wenchen Fan <cloud0...@gmail.com>:
>
>> adding my own +1 (binding)
>>
>> On Thu, Sep 7, 2017 at 10:29 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> In the previous discussion, we decided to split the read and write path
>>> of data source v2 into 2 SPIPs, and I'm sending this email to call a vote
>>> for Data Source V2 read path only.
>>>
>>> The full document of the Data Source API V2 is:
>>> https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit
>>>
>>> The ready-for-review PR that implements the basic infrastructure for the
>>> read path is:
>>> https://github.com/apache/spark/pull/19136
>>>
>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>
>>> +1: Yeah, let's go forward and implement the SPIP.
>>> +0: Don't really care.
>>> -1: I don't think this is a good idea because of the following technical
>>> reasons.
>>>
>>> Thanks!
>>>
>>
>>
>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-08 Thread Sameer Agarwal
Thanks all. It's really humbling to be part of such an innovative community!

On Tue, Aug 8, 2017 at 7:49 AM, Dongjin Lee <dong...@apache.org> wrote:

> Congratulations!!
>
> +1. for Jacek's comment. I also hope Sean would feel a weight off. :)
>
> On Tue, Aug 8, 2017 at 10:24 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> Congrats!! Looks like Sean is gonna be less busy these days ;-)
>>
>> Jacek
>>
>> On 7 Aug 2017 5:53 p.m., "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as
>>> committers. Join me in congratulating both of them and thanking them for
>>> their contributions to the project!
>>>
>>> Matei
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github: github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>



-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-03 Thread Sameer Agarwal
+1

On Mon, Jul 3, 2017 at 6:08 AM, Wenchen Fan <cloud0...@gmail.com> wrote:

> +1
>
> On 3 Jul 2017, at 8:22 PM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
> +1 (binding)
>
> On Mon, 3 Jul 2017 at 11:53 Yanbo Liang <yblia...@gmail.com> wrote:
>
>> +1
>>
>> On Mon, Jul 3, 2017 at 5:35 AM, Herman van Hövell tot Westerflier <
>> hvanhov...@databricks.com> wrote:
>>
>>> +1
>>>
>>> On Sun, Jul 2, 2017 at 11:32 PM, Ricardo Almeida <
>>> ricardo.alme...@actnowib.com> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn
>>>> -Phive -Phive-thriftserver -Pscala-2.11 on
>>>>
>>>>- macOS 10.12.5 Java 8 (build 1.8.0_131)
>>>>- Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 1 Jul 2017 02:45, "Michael Armbrust" <mich...@databricks.com> wrote:
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 2.2.0. The vote is open until Friday, July 7th, 2017 at 18:00
>>>> PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>>
>>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>>
>>>> The tag to be voted on is v2.2.0-rc6
>>>> <https://github.com/apache/spark/tree/v2.2.0-rc6> (a2c7b2133cfee7fa9abfaa2bfbfb637155466783)
>>>>
>>>> List of JIRA tickets resolved can be found with this filter
>>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>
>>>> .
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc6-bin/
>>>>
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1245/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc6-docs/
>>>>
>>>>
>>>> *FAQ*
>>>>
>>>> *How can I help test this release?*
>>>>
>>>> If you are a Spark user, you can help us test this release by taking an
>>>> existing Spark workload and running on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should be
>>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>>
>>>> *But my bug isn't fixed!??!*
>>>>
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from 2.1.1.
>>>>
>>>>
>>>>
>>>
>>>
>>
>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: Will higher order functions in spark SQL be pushed upstream?

2017-06-09 Thread Sameer Agarwal
>
> * As a heavy user of complex data types I was wondering if there was
> any plan to push those changes upstream?
>

Yes, we intend to contribute this to open source.


> * In addition, I was wondering if as part of this change it also tries
> to solve the column pruning / filter pushdown issues with complex
> datatypes?


For parquet, this effort is primarily tracked via SPARK-4502 (see
https://github.com/apache/spark/pull/16578) and is currently targeted for
2.3.


Re: Uploading PySpark 2.1.1 to PyPi

2017-05-12 Thread Sameer Agarwal
Holden,

Thanks again for pushing this forward! Out of curiosity, did we get an
approval from the PyPi folks?

Regards,
Sameer

On Mon, May 8, 2017 at 11:44 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> So I have a PR to add this to the release process documentation - I'm
> waiting on the necessary approvals from PyPi folks before I merge that
> incase anything changes as a result of the discussion (like uploading to
> the legacy host or something). As for conda-forge, it's not something we
> need to do, but I'll add a note about pinging them when we make a new
> release so their users can keep up to date easily. The parent JIRA for PyPi
> related tasks is SPARK-18267 :)
>
>
> On Mon, May 8, 2017 at 6:22 PM cloud0fan <cloud0...@gmail.com> wrote:
>
>> Hi Holden,
>>
>> Thanks for working on it! Do we have a JIRA ticket to track this? We
>> should
>> make it part of the release process in all the following Spark releases,
>> and
>> it will be great if we have a JIRA ticket to record the detailed steps of
>> doing this and even automate it.
>>
>> Thanks,
>> Wenchen
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Uploading-PySpark-2-1-1-to-PyPi-tp21531p21532.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Sameer Agarwal
+1

On Thu, Sep 29, 2016 at 12:04 PM, Sean Owen <so...@cloudera.com> wrote:

> +1 from me too, same result as my RC3 vote/testing.
>
> On Wed, Sep 28, 2016 at 10:14 PM, Reynold Xin <r...@databricks.com> wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 2.0.1. The vote is open until Sat, Oct 1, 2016 at 20:00 PDT and passes if a
> > majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 2.0.1
> > [ ] -1 Do not release this package because ...
> >
> >
> > The tag to be voted on is v2.0.1-rc4
> > (933d2c1ea4e5f5c4ec8d375b5ccaa4577ba4be38)
> >
> > This release candidate resolves 301 issues:
> > https://s.apache.org/spark-2.0.1-jira
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc4-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1203/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc4-docs/
> >
> >
> > Q: How can I help test this release?
> > A: If you are a Spark user, you can help us test this release by taking
> an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions from 2.0.0.
> >
> > Q: What justifies a -1 vote for this release?
> > A: This is a maintenance release in the 2.0.x series.  Bugs already
> present
> > in 2.0.0, missing features, or bugs related to new features will not
> > necessarily block this release.
> >
> > Q: What fix version should I use for patches merging into branch-2.0 from
> > now on?
> > A: Please mark the fix version as 2.0.2, rather than 2.0.1. If a new RC
> > (i.e. RC5) is cut, I will change the fix version of those patches to
> 2.0.1.
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Sameer Agarwal
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> The tag to be voted on is v2.0.1-rc3
> >>>>>>>>>>> (9d28cc10357a8afcfb2fa2e6eecb5c2cc2730d17)
> >>>>>>>>>>>
> >>>>>>>>>>> This release candidate resolves 290 issues:
> >>>>>>>>>>> https://s.apache.org/spark-2.0.1-jira
> >>>>>>>>>>>
> >>>>>>>>>>> The release files, including signatures, digests, etc. can be
> >>>>>>>>>>> found at:
> >>>>>>>>>>>
> >>>>>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc3-bin/
> >>>>>>>>>>>
> >>>>>>>>>>> Release artifacts are signed with the following key:
> >>>>>>>>>>> https://people.apache.org/keys/committer/pwendell.asc
> >>>>>>>>>>>
> >>>>>>>>>>> The staging repository for this release can be found at:
> >>>>>>>>>>>
> >>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1201/
> >>>>>>>>>>>
> >>>>>>>>>>> The documentation corresponding to this release can be found
> at:
> >>>>>>>>>>>
> >>>>>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc3-docs/
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Q: How can I help test this release?
> >>>>>>>>>>> A: If you are a Spark user, you can help us test this release
> by
> >>>>>>>>>>> taking an existing Spark workload and running on this release
> candidate,
> >>>>>>>>>>> then reporting any regressions from 2.0.0.
> >>>>>>>>>>>
> >>>>>>>>>>> Q: What justifies a -1 vote for this release?
> >>>>>>>>>>> A: This is a maintenance release in the 2.0.x series.  Bugs
> >>>>>>>>>>> already present in 2.0.0, missing features, or bugs related to
> new features
> >>>>>>>>>>> will not necessarily block this release.
> >>>>>>>>>>>
> >>>>>>>>>>> Q: What fix version should I use for patches merging into
> >>>>>>>>>>> branch-2.0 from now on?
> >>>>>>>>>>> A: Please mark the fix version as 2.0.2, rather than 2.0.1. If
> a
> >>>>>>>>>>> new RC (i.e. RC4) is cut, I will change the fix version of
> those patches to
> >>>>>>>>>>> 2.0.1.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards
> >>>
> >>> Jeff Zhang
> >
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sameer Agarwal
+1

On Wed, Jun 22, 2016 at 1:07 PM, Kousuke Saruta <saru...@oss.nttdata.co.jp>
wrote:

> +1 (non-binding)
>
> On 2016/06/23 4:53, Reynold Xin wrote:
>
> +1 myself
>
>
> On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara <
> sean.mcnam...@webtrends.com> wrote:
>
>> +1
>>
>> On Jun 22, 2016, at 1:14 PM, Michael Armbrust <mich...@databricks.com>
>> wrote:
>>
>> +1
>>
>> On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly <jonathaka...@gmail.com>
>> wrote:
>>
>>> +1
>>>
>>> On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter <timhun...@databricks.com>
>>> wrote:
>>>
>>>> +1 This release passes all tests on the graphframes and tensorframes
>>>> packages.
>>>>
>>>>> On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger <c...@koeninger.org>
>>>>> wrote:
>>>>
>>>>> If we're considering backporting changes for the 0.8 kafka
>>>>> integration, I am sure there are people who would like to get
>>>>>
>>>>> https://issues.apache.org/jira/browse/SPARK-10963
>>>>>
>>>>> into 1.6.x as well
>>>>>
>>>>> On Wed, Jun 22, 2016 at 7:41 AM, Sean Owen <so...@cloudera.com>
>>>>> wrote:
>>>>> > Good call, probably worth back-porting, I'll try to do that. I don't
>>>>> > think it blocks a release, but would be good to get into a next RC if
>>>>> > any.
>>>>> >
>>>>> > On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins <robbin...@gmail.com>
>>>>> > wrote:
>>>>> >> This has failed on our 1.6 stream builds regularly.
>>>>> >> (https://issues.apache.org/jira/browse/SPARK-6005) looks fixed in 2.0?
>>>>> >>
>>>>> >> On Wed, 22 Jun 2016 at 11:15 Sean Owen < <so...@cloudera.com>
>>>>> so...@cloudera.com> wrote:
>>>>> >>>
>>>>> >>> Oops, one more in the "does anybody else see this" department:
>>>>> >>>
>>>>> >>> - offset recovery *** FAILED ***
>>>>> >>>   recoveredOffsetRanges.forall(((or:
>>>>> (org.apache.spark.streaming.Time,
>>>>> >>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>>>>> >>>
>>>>> >>>
>>>>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>>>> >>>
>>>>> >>>
>>>>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>>>> >>>
>>>>> >>>
>>>>> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]
>>>>> >>> was false Recovered ranges are not the same as the ones generated
>>>>> >>> (DirectKafkaStreamSuite.scala:301)
>>>>> >>>
>>>>> >>> This actually fails consistently for me too in the Kafka
>>>>> integration
>>>>> >>> code. Not timezone related, I think.
>>>>> >
>>>>> > -
>>>>> > To unsubscribe, e-mail: <dev-unsubscr...@spark.apache.org>
>>>>> dev-unsubscr...@spark.apache.org
>>>>> > For additional commands, e-mail: <dev-h...@spark.apache.org>
>>>>> dev-h...@spark.apache.org
>>>>> >
>>>>>
>>>>> -
>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>>
>>>>>
>>>>
>>
>>
>
>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: tpcds q1 - java.lang.NegativeArraySizeException

2016-06-13 Thread Sameer Agarwal
$runMain(SparkSubmit.scala:729)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.NegativeArraySizeException
>> at
>> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:61)
>> at
>> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:214)
>> at
>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>> Source)
>> at
>> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>> at
>> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$3$$anon$2.hasNext(WholeStageCodegenExec.scala:386)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>> at
>> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>> at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>> at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>> at org.apache.spark.scheduler.Task.run(Task.scala:85)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>
>
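For archive readers: a `NegativeArraySizeException` out of `BufferHolder.grow` typically means the requested buffer size overflowed a 32-bit Java int — i.e. a single `UnsafeRow` tried to grow past roughly 2 GB. This is only a sketch of the failure mode, not Spark's actual `BufferHolder` code; `ctypes.c_int32` is used here to emulate Java's wrapping int arithmetic:

```python
import ctypes

def grow_requested_size(cursor: int, needed_size: int) -> int:
    """Sketch of the failure mode only -- not BufferHolder's real logic.

    Java ints are 32-bit, so `cursor + neededSize` silently wraps to a
    negative number once the sum passes 2**31 - 1; `new byte[negative]`
    on the JVM then throws NegativeArraySizeException.
    """
    return ctypes.c_int32(cursor + needed_size).value

INT_MAX = 2**31 - 1
print(grow_requested_size(INT_MAX - 16, 64))  # negative: the 32-bit sum wrapped
print(grow_requested_size(0, 8))              # 8: no overflow, allocation succeeds
```

In practice this means the workaround is usually to reduce the size of individual rows (e.g. avoid collecting huge arrays/strings into one row) rather than to tune executor memory.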


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag