Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Sean Owen
By the way, the RC looks good. Sigs and license are OK, tests pass with -Phive -Pyarn -Phadoop-2.7. +1 from me. On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.2.0. The vote is open
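[For context, verifying an RC with the profiles Sean lists typically means building and running the tests via Spark's bundled Maven wrapper; only the profile flags are quoted above, the goals below are assumed as a typical invocation:]

    ./build/mvn -Phive -Pyarn -Phadoop-2.7 -DskipTests clean package
    ./build/mvn -Phive -Pyarn -Phadoop-2.7 test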

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-27 Thread Holden Karau
+1 (non-binding) The PySpark packaging issue from the earlier RC seems to have been fixed. On Thu, Apr 27, 2017 at 1:23 PM, Dong Joon Hyun wrote: > +1 > > I’ve got the same result (Scala/R test) on JDK 1.8.0_131 at this time. > > Bests, > Dongjoon. > > From: Reynold Xin

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Joseph Bradley
That's very fair. For my part, I should have been faster to make these JIRAs and get critical dev community QA started when the branch was cut last week. On Thu, Apr 27, 2017 at 2:59 PM, Sean Owen wrote: > That makes sense, but we have an RC, not just a branch. I think

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Sean Owen
That makes sense, but we have an RC, not just a branch. I think we've followed the pattern in http://spark.apache.org/versioning-policy.html in the past. This generally comes before an RC, right, because until everything that Must Happen before a release has happened, someone's saying the RC can't

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Joseph Bradley
This is the same thing as ever for MLlib: Once a branch has been cut, we stop merging features. Now that features are not being merged, we can begin QA. I strongly prefer to track QA work in JIRA and to have those items targeted for 2.2. I also believe that certain QA tasks should be blockers;

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Michael Armbrust
All of those look like QA or documentation, which I don't think needs to block testing on an RC (and in fact probably needs an RC to test?). Joseph, please correct me if I'm wrong. It is unlikely this first RC is going to pass, but I wanted to get the ball rolling on testing 2.2. On Thu, Apr 27,

Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Sean Owen
These are still blockers for 2.2:
SPARK-20501 ML, Graph 2.2 QA: API: New Scala APIs, docs
SPARK-20504 ML 2.2 QA: API: Java compatibility, docs
SPARK-20503 ML 2.2 QA: API: Python API coverage
SPARK-20502 ML, Graph 2.2 QA: API: Experimental, DeveloperApi, final, sealed audit
SPARK-20500 ML, Graph

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-27 Thread Dong Joon Hyun
+1. I’ve got the same result (Scala/R test) on JDK 1.8.0_131 at this time. Bests, Dongjoon. From: Reynold Xin Date: Thursday, April 27, 2017 at 1:06 PM To: Michael Armbrust,

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-27 Thread Reynold Xin
+1 On Thu, Apr 27, 2017 at 11:59 AM Michael Armbrust wrote: > I'll also +1 > > On Thu, Apr 27, 2017 at 4:20 AM, Sean Owen wrote: > >> +1, same result as with the last RC. All checks out for me. >> >> On Thu, Apr 27, 2017 at 1:29 AM Michael Armbrust

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-27 Thread Michael Armbrust
I'll also +1 On Thu, Apr 27, 2017 at 4:20 AM, Sean Owen wrote: > +1, same result as with the last RC. All checks out for me. > > On Thu, Apr 27, 2017 at 1:29 AM Michael Armbrust > wrote: > >> Please vote on releasing the following candidate as

[VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.2.0 [ ] -1 Do not release this package because ...

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-27 Thread Sean Owen
+1, same result as with the last RC. All checks out for me. On Thu, Apr 27, 2017 at 1:29 AM Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.1. The vote is open until Sat, April 29th, 2017 at 18:00 PST and > passes

Spark reading parquet files behaved differently with number of paths

2017-04-27 Thread Yash Sharma
Hi Fellow Devs, I have noticed the Spark parquet reader behaves very differently in two scenarios over the same data set: 1. passing a single parent path to the data, vs 2. passing all the files individually to parquet(paths: String*). There are about ~50K files. The first option
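[For reference, a minimal Scala sketch of the two call patterns being compared; the paths and app name here are hypothetical, not taken from the original report:]

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("parquet-path-comparison")
      .getOrCreate()

    // 1. Single parent path: Spark discovers the files itself during listing.
    val byParentPath = spark.read.parquet("/data/events/")

    // 2. Every file handed explicitly to parquet(paths: String*); with ~50K
    //    entries this resolves each path individually up front.
    val files: Seq[String] = Seq(
      "/data/events/part-00000.parquet",
      "/data/events/part-00001.parquet"
      // ... ~50K paths in the reported case
    )
    val byFileList = spark.read.parquet(files: _*)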