Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-21 Thread Michael Armbrust
tructField("Avg", "double")) >>>> df4 <- gapply( >>>> cols = "Sepal_Length", >>>> irisDF, >>>> function(key, x) { >>>> y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-21 Thread Nick Pentreath
> structField("Avg", "double")) >>> df4 <- gapply( >>> cols = "Sepal_Length", >>> irisDF, >>> function(key, x) { >>> y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE) >>> }, >>

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-20 Thread Xiao Li
Found another bug in the case preservation of column names for persistent views. This regression was introduced in 2.2. https://issues.apache.org/jira/browse/SPARK-21150 Thanks, Xiao 2017-06-19 8:03 GMT-07:00 Liang-Chi Hsieh : > > I mean it is not a bug that had been fixed before

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-19 Thread Liang-Chi Hsieh
I mean it is not a bug that had been fixed before this feature was added. Of course, the kryo serializer with 2000+ partitions was working before this feature. Koert Kuipers wrote > If a feature added recently breaks using kryo serializer with 2000+ > partitions then how can it not be a regression? I mean I

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-19 Thread Koert Kuipers
If a feature added recently breaks using kryo serializer with 2000+ partitions then how can it not be a regression? I mean I use kryo with more than 2000 partitions all the time, and it worked before. Or was I simply not hitting this bug because there are other conditions that also need to be

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-19 Thread Liang-Chi Hsieh
I think it's not. This is a feature added recently. Hyukjin Kwon wrote > Is this a regression BTW? I am just curious. > > On 19 Jun 2017 1:18 pm, "Liang-Chi Hsieh" > viirya@ > wrote: > > -1. When using the kryo serializer and the partition number is greater than 2000, > there seems to be an NPE issue

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-18 Thread Hyukjin Kwon
Is this a regression BTW? I am just curious. On 19 Jun 2017 1:18 pm, "Liang-Chi Hsieh" wrote: -1. When using the kryo serializer and the partition number is greater than 2000, there seems to be an NPE issue that needs to be fixed. SPARK-21133

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-18 Thread Liang-Chi Hsieh
-1. When using the kryo serializer and the partition number is greater than 2000, there seems to be an NPE issue that needs to be fixed. SPARK-21133 - Liang-Chi Hsieh | @viirya Spark Technology Center http://www.spark.tc/

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-15 Thread Felix Cheung
y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE) }, schema) collect(df4) 2017-06-14 16:07 GMT+09:00 Felix Cheung <felixcheun...@hotmail.com>: Thanks! Will try to set up RHEL/CentOS to test it out _ From:

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Michael Armbrust
is)) >>> schema <- structType(structField("Sepal_Length", "double"), >>> structField("Avg", "double")) >>> df4 <- gapply( >>> cols = "Sepal_Length", >>> irisDF, >>> function(key, x) { >

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
df4 <- gapply( >> cols = "Sepal_Length", >> irisDF, >> function(key, x) { >> y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE) >> }, >> schema) >> collect(df4) >> >> >> >> 201

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
external package and unrelated >>> >>> >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning ( >>> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845) >>> >>> As for CentOS - would it be possible to test against R older than 3

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Hyukjin Kwon
> From: Nick Pentreath <nick.pentre...@gmail.com> > Sent: Tuesday, June 13, 2017 11:38 PM > Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) > To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon < > gurwls...@gmail.com>, dev <dev@spark.

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Felix Cheung
Thanks! Will try to set up RHEL/CentOS to test it out _ From: Nick Pentreath <nick.pentre...@gmail.com> Sent: Tuesday, June 13, 2017 11:38 PM Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) To: Felix Cheung <felixcheun

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Nick Pentreath
he same error reported by Nick below. > > _ > From: Hyukjin Kwon <gurwls...@gmail.com> > Sent: Tuesday, June 13, 2017 8:02 PM > > Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) > To: dev <dev@spark.apache.org> > Cc: Sean Owen <so...@cloudera.c

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Felix Cheung
ported by Nick below. _ From: Hyukjin Kwon <gurwls...@gmail.com> Sent: Tuesday, June 13, 2017 8:02 PM Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) To: dev <dev@spark.apache.org> Cc: Sean Owen

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Hyukjin Kwon
For the test failure on R, I checked: Per https://github.com/apache/spark/tree/v2.2.0-rc4, 1. Windows Server 2012 R2 / R 3.3.1 - passed ( https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4 ) 2. macOS Sierra 10.12.3 / R 3.4.0 - passed 3. macOS Sierra 10.12.3 / R 3.2.3 -

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Xiao Li
-1 Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or earlier. Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085 Will fix it soon. Thanks, Xiao Li 2017-06-13 9:39 GMT-07:00 Joseph Bradley : > Re: the QA JIRAs: > Thanks for

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Joseph Bradley
Re: the QA JIRAs: Thanks for discussing them. I still feel they are very helpful; I particularly notice not having to spend a solid 2-3 weeks of time QAing (unlike in earlier Spark releases). One other point not mentioned above: I think they serve as a very helpful reminder/training for the

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Sean Owen
Different errors as in https://issues.apache.org/jira/browse/SPARK-20520 but that's also reporting R test failures. I went back and tried to run the R tests and they passed, at least on Ubuntu 17 / R 3.3. On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath wrote: > All

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Dong Joon Hyun
Hi, Nick. Could you give us more information on your environment like R/JDK/OS? Bests, Dongjoon. From: Nick Pentreath <nick.pentre...@gmail.com> Date: Friday, June 9, 2017 at 1:12 AM To: dev <dev@spark.apache.org> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) All Scala, Python tests

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Felix Cheung
Hmm, that's odd. This test would be in Jenkins too - let me double check _ From: Nick Pentreath <nick.pentre...@gmail.com> Sent: Friday, June 9, 2017 1:12 AM Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) To: dev <dev@s

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Nick Pentreath
All Scala, Python tests pass. ML QA and doc issues are resolved (as well as R it seems). However, I'm seeing the following test failure on R consistently: https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72 On Thu, 8 Jun 2017 at 08:48 Denny Lee wrote: > +1

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-08 Thread Sean Owen
+1 from me. Felix et al indicated that the various "2.2" JIRAs had no further actions. I retargeted most of the other 2.2.0-targeted JIRAs that didn't seem like they're must-do. We have no Blockers and I'm not aware of any changes that must be in the 2.2 release that aren't. These are the only

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-08 Thread Denny Lee
+1 non-binding. Tested on macOS Sierra and Ubuntu 16.04; the test suite includes various test cases covering Spark SQL, ML, GraphFrames, Structured Streaming On Wed, Jun 7, 2017 at 9:40 PM vaquar khan wrote: > +1 non-binding > > Regards, > vaquar khan > > On Jun 7, 2017 4:32

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-07 Thread vaquar khan
+1 non-binding Regards, vaquar khan On Jun 7, 2017 4:32 PM, "Ricardo Almeida" wrote: +1 (non-binding) Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive -Phive-thriftserver -Pscala-2.11 on - Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111) -

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-07 Thread Ricardo Almeida
+1 (non-binding) Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive -Phive-thriftserver -Pscala-2.11 on - Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111) - macOS 10.12.5 Java 8 (build 1.8.0_131) On 5 June 2017 at 21:14, Michael Armbrust wrote: >

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Dong Joon Hyun
+1 (non-binding) I built and tested on CentOS 7.3.1611 / OpenJDK 1.8.131 / R 3.3.3 with “-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr”. Java/Scala/R tests passed as expected. There are two minor things. 1. For the deprecation documentation issue

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Holden Karau
> We can close this. > > > > _ > From: Sean Owen <so...@cloudera.com> > Sent: Tuesday, June 6, 2017 1:16 AM > Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) > To: Michael Armbrust <mich...@databricks.com> > Cc: <dev@spark.apache.org> > > > > On Tu

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Felix Cheung
All tasks on the R QA umbrella are completed SPARK-20512 We can close this. _ From: Sean Owen <so...@cloudera.com> Sent: Tuesday, June 6, 2017 1:16 AM Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) To: Michael Ar

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Nick Pentreath
Now, on the subject of (ML) QA JIRAs. From the ML side, I believe they are required (I think others such as Joseph will agree and in fact have already said as much). Most are marked as Blockers, though of those the Python API coverage is strictly not a Blocker as we will never hold the release

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Nick Pentreath
The website updates for ML QA (SPARK-20507) are not *actually* critical as the project website certainly can be updated separately from the source code guide and is not part of the release to be voted on. In future that particular work item for the QA process could be marked down in priority, and

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Sean Owen
On Tue, Jun 6, 2017 at 1:06 AM Michael Armbrust wrote: > Regarding the readiness of this and previous RCs. I did cut RC1 & RC2 > knowing that they were unlikely to pass. That said, I still think these > early RCs are valuable. I know several users that wanted to test

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Kazuaki Ishizaki
+1 (non-binding) I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for core have passed. $ java -version openjdk version "1.8.0_111" OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14) OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) $

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
Apologies for messing up the https urls. My mistake. I'll try to get it right next time. Regarding the readiness of this and previous RCs. I did cut RC1 & RC2 knowing that they were unlikely to pass. That said, I still think these early RCs are valuable. I know several users that wanted to

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Sean Owen
On the latest Ubuntu, Java 8, with -Phive -Phadoop-2.7 -Pyarn, this passes all tests. It's looking good, pending a double-check on the outstanding JIRA questions. All the hashes and sigs are correct. On Mon, Jun 5, 2017 at 8:15 PM Michael Armbrust wrote: > Please vote

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Sean Owen
(I apologize for going on about this, but I've asked ~4 times: could you make the URLs here in the form email HTTPS URLs? It sounds minor, but we're asking people to verify the integrity of software and hashes, and this is the one case where it is actually important.) The "2.2" JIRAs don't look

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Dong Joon Hyun
, 2017 at 12:51 PM To: Sean Owen <so...@cloudera.com> Cc: "dev@spark.apache.org" <dev@spark.apache.org> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) I commented on that JIRA, I don't think that should block the release. We can support both options long term if this vote

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Michael Armbrust
I commented on that JIRA, I don't think that should block the release. We can support both options long term if this vote passes. Looks like the remaining JIRAs are doc/website updates that can happen after the vote or QA that should be done on this RC. I think we are ready to start testing

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Sean Owen
Xiao opened a blocker on 2.2.0 this morning: SPARK-20980 Rename the option `wholeFile` to `multiLine` for JSON and CSV I don't see that this should block? We still have 7 Critical issues: SPARK-20520 R streaming tests failed on Windows SPARK-20512 SparkR 2.2 QA: Programming guide, migration