Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Michael Allman
We've identified the cause of the change in behavior. It is related to the SQL conf key "spark.sql.hive.caseSensitiveInferenceMode". This key and its related functionality were absent from our previous build. The default setting in the current build was causing Spark to attempt to scan all table

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Michael Allman
I want to caution that in testing a build from this morning's branch-2.1 we found that Hive partition pruning was not working. We found that Spark SQL was fetching all Hive table partitions for a very simple query whereas in a build from several weeks ago it was fetching only the required partit

Re: New Optimizer Hint

2017-04-20 Thread Reynold Xin
Doesn't common sub expression elimination address this issue as well? On Thu, Apr 20, 2017 at 6:40 AM Herman van Hövell tot Westerflier < hvanhov...@databricks.com> wrote: > Hi Michael, > > This sounds like a good idea. Can you open a JIRA to track this? > > My initial feedback on your proposal w

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Nicholas Chammas
Steve, I think you're a good person to ask about this. Is the below any cause for concern? Or did I perhaps test this incorrectly? Nick On Tue, Apr 18, 2017 at 11:50 PM Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > I had trouble starting up a shell with the AWS package loaded > (spec

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Adam Roberts
+1 (non-binding), looks good. Tested on RHEL 7.2, 7.3, CentOS 7.2, Ubuntu 14.04 and 16.04, SUSE 12, x86, IBM Linux on Power and IBM Linux on Z (big-endian). No problems with latest IBM Java, Hadoop 2.7.3 and Scala 2.11.8, no performance concerns to report either (spark-sql-perf and HiBench). Buil

Re: New Optimizer Hint

2017-04-20 Thread Herman van Hövell tot Westerflier
Hi Michael, This sounds like a good idea. Can you open a JIRA to track this? My initial feedback on your proposal would be that you might want to express the no_collapse at the expression level and not at the plan level. HTH On Thu, Apr 20, 2017 at 3:31 PM, Michael Styles wrote: > Hello, > >

New Optimizer Hint

2017-04-20 Thread Michael Styles
Hello, I am in the process of putting together a PR that introduces a new hint called NO_COLLAPSE. This hint is essentially identical to Oracle's NO_MERGE hint. Let me first give an example of why I am proposing this. df1 = sc.sql.createDataFrame([(1, "abc")], ["id", "user_agent"]) df2 = df1.wit
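The preview above is cut off before the example finishes, so the following is only a toy illustration in plain Python, not the author's code and not Spark's optimizer. It assumes the concern is the one the replies point at (collapsing adjacent projections, common subexpression elimination): when a second projection references a column computed by an expensive function, merging the two projections can inline that function at every reference. The names `CountingFn`, `run_uncollapsed`, and `run_collapsed` are hypothetical.

```python
class CountingFn:
    """Wraps a function and counts calls (stand-in for a costly UDF)."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, value):
        self.calls += 1
        return self.fn(value)


def run_uncollapsed(udf, row):
    # Inner projection: evaluate the expensive expression once.
    parsed = udf(row["user_agent"])
    # Outer projection: two output columns reuse the stored result.
    return {"first": parsed[0], "last": parsed[-1]}


def run_collapsed(udf, row):
    # Merged projection: each reference to the inner column is replaced
    # by its defining expression, so the function runs once per reference.
    return {
        "first": udf(row["user_agent"])[0],
        "last": udf(row["user_agent"])[-1],
    }


if __name__ == "__main__":
    row = {"id": 1, "user_agent": "abc"}
    for runner in (run_uncollapsed, run_collapsed):
        udf = CountingFn(str.upper)
        result = runner(udf, row)
        print(runner.__name__, result, "calls:", udf.calls)
```

Both variants return the same row; only the number of evaluations differs, which is the kind of cost a hint that keeps the projections separate would be meant to avoid under this assumption.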

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Felix Cheung
Tested on both Linux and Windows, as package. Found StackOverflowError with ALS on Windows: https://issues.apache.org/jira/browse/SPARK-20402 This is part of the R CRAN check to build the vignettes. Very simple, quick and consistent repro on Windows. The exact same code works fine on Linux. Rep