Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Wenchen Fan
See https://issues.apache.org/jira/browse/SPARK-19611 On Mon, Apr 24, 2017 at 2:22 PM, Holden Karau wrote: > What's the regression this fixed in 2.1 from 2.0? > > On Fri, Apr 21, 2017 at 7:45 PM, Wenchen Fan > wrote: > >> IIRC, the new

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Michael Armbrust
Yeah, I agree. -1 (binding) This vote fails, and I'll cut a new RC after #17749 is merged. On Mon, Apr 24, 2017 at 12:18 PM, Eric Liang wrote: > -1 (non-binding) > > I also agree with using NEVER_INFER for 2.1.1. The migration

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Holden Karau
Whoops, sorry, my finger slipped on that last message. It sounds like whatever we do is going to break some existing users (either with the tables by case sensitivity or with the unexpected scan). Personally I agree with Michael Allman on this; I believe we should use NEVER_INFER for 2.1.1. On Mon,

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Holden Karau
It On Mon, Apr 24, 2017 at 10:33 AM, Michael Allman wrote: > The trouble we ran into is that this upgrade was blocking access to our > tables, and we didn't know why. This sounds like a kind of migration > operation, but it was not apparent that this was the case. It took

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Michael Allman
The trouble we ran into is that this upgrade was blocking access to our tables, and we didn't know why. This sounds like a kind of migration operation, but it was not apparent that this was the case. It took an expert examining a stack trace and source code to figure this out. Would a more

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Holden Karau
What's the regression this fixed in 2.1 from 2.0? On Fri, Apr 21, 2017 at 7:45 PM, Wenchen Fan wrote: > IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will > scan all table files only once, and write back the inferred schema to > the metastore so that we

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-21 Thread Wenchen Fan
IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will scan all table files only once, and write back the inferred schema to the metastore so that we don't need to do the schema inference again. So technically this will introduce a performance regression for the first query, but

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-21 Thread Michael Armbrust
Thanks for pointing this out, Michael. Based on the conversation on the PR this seems like a risky change to include in a release branch with a default other than NEVER_INFER. +Wenchen? What do you think? On Thu, Apr 20, 2017

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Michael Allman
We've identified the cause of the change in behavior. It is related to the SQL conf key "spark.sql.hive.caseSensitiveInferenceMode". This key and its related functionality were absent from our previous build. The default setting in the current build was causing Spark to attempt to scan all table

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Michael Allman
I want to caution that in testing a build from this morning's branch-2.1 we found that Hive partition pruning was not working. We found that Spark SQL was fetching all Hive table partitions for a very simple query whereas in a build from several weeks ago it was fetching only the required

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Nicholas Chammas
Steve, I think you're a good person to ask about this. Is the below any cause for concern? Or did I perhaps test this incorrectly? Nick On Tue, Apr 18, 2017 at 11:50 PM Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > I had trouble starting up a shell with the AWS package loaded >

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Denny Lee
> +1 > > On Wed, Apr 19, 2017 at 3:31 PM, Marcelo Vanzin <van...@cloudera.com> > wrote: > >> +1 (non-bindi

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Dong Joon Hyun
+1 On Wed, Apr 19, 2017 at 3:31 PM, Marcelo Vanzin <van...@cloudera.com> wrote: +1 (non-binding). Ran the hadoop

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Reynold Xin
+1 On Wed, Apr 19, 2017 at 3:31 PM, Marcelo Vanzin wrote: > +1 (non-binding). > > Ran the hadoop-2.6 binary against our internal tests and things look good. > > On Tue, Apr 18, 2017 at 11:59 AM, Michael Armbrust > wrote: > > Please vote on releasing

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Marcelo Vanzin
+1 (non-binding). Ran the hadoop-2.6 binary against our internal tests and things look good. On Tue, Apr 18, 2017 at 11:59 AM, Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.1. The vote is open until Fri, April

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Kazuaki Ishizaki
+1 (non-binding) I tested it on Ubuntu 16.04 and openjdk8 on ppc64le. All of the tests for core have passed. $ java -version openjdk version "1.8.0_111" OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14) OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) $

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Sean Owen
+1 from me -- this worked unusually smoothly on the first try. Sigs and license and so forth look OK. Tests pass with Java 8, Ubuntu 17, -Phive -Phadoop-2.7 -Pyarn. I had to run the build with -Xss2m to get this test to pass, but it might be somewhat specific to my env somehow: - SPARK-16845:

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-18 Thread Nicholas Chammas
I had trouble starting up a shell with the AWS package loaded (specifically, org.apache.hadoop:hadoop-aws:2.7.3): [NOT FOUND ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms) local-m2-cache: tried