spark git commit: [SPARK-9141] [SQL] Remove project collapsing from DataFrame API

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 34dcf1010 -> 23d982204 [SPARK-9141] [SQL] Remove project collapsing from DataFrame API Currently we collapse successive projections that are added by `withColumn`. However, this optimization violates the constraint that adding nodes to a
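
A minimal sketch of the `withColumn` chaining pattern this commit is about (hypothetical DataFrame `df`; each call layers another Project node over the plan):

```scala
import org.apache.spark.sql.functions.lit

// Each withColumn call wraps the previous plan in another Project node.
// Before this change the DataFrame API collapsed these Projects eagerly;
// after it, the collapsing is left to the optimizer.
val df2 = df
  .withColumn("a", lit(1))
  .withColumn("b", lit(2))
  .withColumn("c", lit(3))
```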

spark git commit: [SPARK-9141] [SQL] Remove project collapsing from DataFrame API

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 eedb996dd -> 125827a4f [SPARK-9141] [SQL] Remove project collapsing from DataFrame API Currently we collapse successive projections that are added by `withColumn`. However, this optimization violates the constraint that adding nodes

spark git commit: [SPARK-9141] [SQL] [MINOR] Fix comments of PR #7920

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7a969a696 -> 1f8c364b9 [SPARK-9141] [SQL] [MINOR] Fix comments of PR #7920 This is a follow-up of https://github.com/apache/spark/pull/7920 to fix comments. Author: Yin Huai yh...@databricks.com Closes #7964 from yhuai/SPARK-9141-follow

spark git commit: [SPARK-9141] [SQL] [MINOR] Fix comments of PR #7920

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 03bcf627d -> 19018d542 [SPARK-9141] [SQL] [MINOR] Fix comments of PR #7920 This is a follow-up of https://github.com/apache/spark/pull/7920 to fix comments. Author: Yin Huai yh...@databricks.com Closes #7964 from yhuai/SPARK-9141

spark git commit: [SPARK-9649] Fix flaky test MasterSuite - randomize ports

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master eb5b8f4a6 -> 5f0fb6466 [SPARK-9649] Fix flaky test MasterSuite - randomize ports ``` Error Message Failed to bind to: /127.0.0.1:7093: Service 'sparkMaster' failed after 16 retries! Stacktrace java.net.BindException: Failed to bind
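
A sketch of the port-randomization technique using only the JDK: bind to port 0 and let the OS pick a free ephemeral port, instead of hard-coding something like 7093.

```scala
import java.net.ServerSocket

// Binding to port 0 asks the OS for any free ephemeral port; using it in
// the test avoids collisions with other suites bound to a fixed port.
// (There is a tiny race between close() and reuse, acceptable in tests.)
val socket = new ServerSocket(0)
val port = socket.getLocalPort
socket.close()
println(s"starting sparkMaster on randomized port $port")
```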

spark git commit: [SPARK-9649] Fix flaky test MasterSuite - randomize ports

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 b8136d7e0 -> 05cbf133d [SPARK-9649] Fix flaky test MasterSuite - randomize ports ``` Error Message Failed to bind to: /127.0.0.1:7093: Service 'sparkMaster' failed after 16 retries! Stacktrace java.net.BindException: Failed to

spark git commit: [SPARK-9361] [SQL] Refactor new aggregation code to reduce the times of checking compatibility

2015-07-30 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7bbf02f0b -> 5363ed715 [SPARK-9361] [SQL] Refactor new aggregation code to reduce the times of checking compatibility JIRA: https://issues.apache.org/jira/browse/SPARK-9361 Currently, we call `aggregate.Utils.tryConvert` in many places to

spark git commit: [SPARK-9785] [SQL] HashPartitioning compatibility should consider expression ordering

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d378396f8 -> dfe347d2c [SPARK-9785] [SQL] HashPartitioning compatibility should consider expression ordering HashPartitioning compatibility is currently defined w.r.t the _set_ of expressions, but the ordering of those expressions matters
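
A small illustration (plain Scala, not Spark internals) of why the set view is wrong: hashing the same values in a different order generally lands the row in a different partition.

```scala
val numPartitions = 8
def partitionOf(key: Product): Int =
  java.lang.Math.floorMod(key.hashCode, numPartitions)

// Same column values, different expression order:
val keyedByAB = ("x", 42)  // partitioned by (a, b)
val keyedByBA = (42, "x")  // partitioned by (b, a)

// These usually differ, so the two partitionings must not be treated as
// compatible even though they cover the same *set* of expressions.
println(s"${partitionOf(keyedByAB)} vs ${partitionOf(keyedByBA)}")
```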

[2/2] spark git commit: [SPARK-9646] [SQL] Add metrics for all join and aggregate operators

2015-08-11 Thread yhuai
[SPARK-9646] [SQL] Add metrics for all join and aggregate operators This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two cases: 1. The iterator is not totally consumed and the metric values will be less. 2. Recreating the
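
For the first caveat, a minimal sketch (hypothetical wrapper, not the actual SQLMetrics code) of why a partially consumed iterator under-counts:

```scala
// The metric is bumped only as rows are pulled, so rows the consumer
// never requests are never counted.
class CountingIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  var numRows = 0L
  def hasNext: Boolean = underlying.hasNext
  def next(): T = { numRows += 1; underlying.next() }
}

val rows = new CountingIterator(Iterator(1, 2, 3, 4))
rows.take(2).foreach(_ => ())  // e.g. a LIMIT stops early
println(rows.numRows)          // prints 2, not 4
```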

[2/2] spark git commit: [SPARK-9646] [SQL] Add metrics for all join and aggregate operators

2015-08-11 Thread yhuai
[SPARK-9646] [SQL] Add metrics for all join and aggregate operators This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two cases: 1. The iterator is not totally consumed and the metric values will be less. 2. Recreating the

[1/2] spark git commit: [SPARK-9646] [SQL] Add metrics for all join and aggregate operators

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 71460b889 -> 767ee1884 http://git-wip-us.apache.org/repos/asf/spark/blob/767ee188/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --

spark git commit: [SPARK-9611] [SQL] Fixes a few corner cases when we spill a UnsafeFixedWidthAggregationMap

2015-08-05 Thread yhuai
from yhuai/unsafeEmptyMap and squashes the following commits: 9727abe [Yin Huai] Address Josh's comments. 34b6f76 [Yin Huai] 1. UnsafeKVExternalSorter does not use 0 as the initialSize to create an UnsafeInMemorySorter if its BytesToBytesMap is empty. 2. Do not spill an InMemorySorter

spark git commit: [SPARK-9611] [SQL] Fixes a few corner cases when we spill a UnsafeFixedWidthAggregationMap

2015-08-05 Thread yhuai
yhuai/unsafeEmptyMap and squashes the following commits: 9727abe [Yin Huai] Address Josh's comments. 34b6f76 [Yin Huai] 1. UnsafeKVExternalSorter does not use 0 as the initialSize to create an UnsafeInMemorySorter if its BytesToBytesMap is empty. 2. Do not spill an InMemorySorter if it is empty

spark git commit: [SPARK-9593] [SQL] [HOTFIX] Makes the Hadoop shims loading fix more robust

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 93085c992 -> 9f94c85ff [SPARK-9593] [SQL] [HOTFIX] Makes the Hadoop shims loading fix more robust This is a follow-up of #7929. We found that Jenkins SBT master build still fails because of the Hadoop shims loading issue. But the failure

spark git commit: [SPARK-9593] [SQL] Fixes Hadoop shims loading

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 c39d5d144 -> 11c28a568 [SPARK-9593] [SQL] Fixes Hadoop shims loading This PR is used to workaround CDH Hadoop versions like 2.0.0-mr1-cdh4.1.1. Internally, Hive `ShimLoader` tries to load different versions of Hadoop shims by checking

spark git commit: [SPARK-9632] [SQL] [HOT-FIX] Fix build.

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 2382b483a -> b51159def [SPARK-9632] [SQL] [HOT-FIX] Fix build. It seems https://github.com/apache/spark/pull/7955 breaks the build. Author: Yin Huai yh...@databricks.com Closes #8001 from yhuai/SPARK-9632-fixBuild and squashes

spark git commit: [SPARK-9632] [SQL] [HOT-FIX] Fix build.

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2eca46a17 -> cdd53b762 [SPARK-9632] [SQL] [HOT-FIX] Fix build. It seems https://github.com/apache/spark/pull/7955 breaks the build. Author: Yin Huai yh...@databricks.com Closes #8001 from yhuai/SPARK-9632-fixBuild and squashes the following

spark git commit: [SPARK-9593] [SQL] [HOTFIX] Makes the Hadoop shims loading fix more robust

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 11c28a568 -> cc4c569a8 [SPARK-9593] [SQL] [HOTFIX] Makes the Hadoop shims loading fix more robust This is a follow-up of #7929. We found that Jenkins SBT master build still fails because of the Hadoop shims loading issue. But the

spark git commit: [SPARK-9674] Re-enable ignored test in SQLQuerySuite

2015-08-07 Thread yhuai
before that so we never caught it. This patch re-enables the test and adds the code necessary to make it pass. JoshRosen yhuai Author: Andrew Or and...@databricks.com Closes #8015 from andrewor14/SPARK-9674 and squashes the following commits: 225eac2 [Andrew Or] Merge branch 'master

spark git commit: [SPARK-9674] Re-enable ignored test in SQLQuerySuite

2015-08-07 Thread yhuai
before that so we never caught it. This patch re-enables the test and adds the code necessary to make it pass. JoshRosen yhuai Author: Andrew Or and...@databricks.com Closes #8015 from andrewor14/SPARK-9674 and squashes the following commits: 225eac2 [Andrew Or] Merge branch 'master

spark git commit: [SPARK-6212] [SQL] The EXPLAIN output of CTAS only shows the analyzed plan

2015-08-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 874b9d855 -> 251d1eef4 [SPARK-6212] [SQL] The EXPLAIN output of CTAS only shows the analyzed plan JIRA: https://issues.apache.org/jira/browse/SPARK-6212 Author: Yijie Shen henry.yijies...@gmail.com Closes #7986 from

spark git commit: [SPARK-6212] [SQL] The EXPLAIN output of CTAS only shows the analyzed plan

2015-08-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 25c363e93 -> 3ca995b78 [SPARK-6212] [SQL] The EXPLAIN output of CTAS only shows the analyzed plan JIRA: https://issues.apache.org/jira/browse/SPARK-6212 Author: Yijie Shen henry.yijies...@gmail.com Closes #7986 from yjshen/ctas_explain

spark git commit: [SPARK-8930] [SQL] Throw an AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions

2015-08-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 b12f0737f -> 1ce5061bb [SPARK-8930] [SQL] Throw an AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions Author: Yijie Shen henry.yijies...@gmail.com Closes #8057 from yjshen/explode_star and

spark git commit: [SPARK-8930] [SQL] Throw an AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions

2015-08-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e9c36938b -> 68ccc6e18 [SPARK-8930] [SQL] Throw an AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions Author: Yijie Shen henry.yijies...@gmail.com Closes #8057 from yjshen/explode_star and squashes

spark git commit: [SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles

2015-08-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master a863348fd -> 23cf5af08 [SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles This pull request refactors the `EnsureRequirements` planning rule in order to avoid the addition of certain unnecessary shuffles.

spark git commit: [SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles

2015-08-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 1ce5061bb -> 323d68606 [SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles This pull request refactors the `EnsureRequirements` planning rule in order to avoid the addition of certain unnecessary

spark git commit: [SPARK-9743] [SQL] Fixes JSONRelation refreshing

2015-08-10 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 f75c64b0c -> 94b2f5b32 [SPARK-9743] [SQL] Fixes JSONRelation refreshing PR #7696 added two `HadoopFsRelation.refresh()` calls ([this] [1], and [this] [2]) in `DataSourceStrategy` to make test case `InsertSuite.save directly to the

spark git commit: [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5a5bbc299 -> afa757c98 [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible DirectParquetOutputCommitter was moved in SPARK-9763. However, users can explicitly set the class as a config option, so

spark git commit: [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 b7497e3a2 -> ec7a4b9b0 [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible DirectParquetOutputCommitter was moved in SPARK-9763. However, users can explicitly set the class as a config option,

spark git commit: [SPARK-9385] [HOT-FIX] [PYSPARK] Comment out Python style check

2015-07-27 Thread yhuai
/Spark-Master-SBT/3088/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/console Author: Yin Huai yh...@databricks.com Closes #7702 from yhuai/SPARK-9385 and squashes the following commits: 146e6ef [Yin Huai] Comment out Python style check because of error shown in https://amplab.cs.berkeley.edu

spark git commit: [SPARK-9385] [PYSPARK] Enable PEP8 but disable installing pylint.

2015-07-27 Thread yhuai
...@databricks.com Closes #7704 from yhuai/SPARK-9385 and squashes the following commits: 0056359 [Yin Huai] Enable PEP8 but disable installing pylint. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dafe8d85 Tree: http://git-wip

spark git commit: [SPARK-9386] [SQL] Feature flag for metastore partition pruning

2015-07-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 8ddfa52c2 -> ce89ff477 [SPARK-9386] [SQL] Feature flag for metastore partition pruning Since we have been seeing a lot of failures related to this new feature, let's put it behind a flag and turn it off by default. Author: Michael Armbrust

spark git commit: [SPARK-9082] [SQL] Filter using non-deterministic expressions should not be pushed down

2015-07-22 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b55a36bc3 -> 76520955f [SPARK-9082] [SQL] Filter using non-deterministic expressions should not be pushed down Author: Wenchen Fan cloud0...@outlook.com Closes #7446 from cloud-fan/filter and squashes the following commits: 330021e

spark git commit: [SPARK-9254] [BUILD] [HOTFIX] sbt-launch-lib.bash should support HTTP/HTTPS redirection

2015-07-25 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 ff5e5f228 -> 712e13bba [SPARK-9254] [BUILD] [HOTFIX] sbt-launch-lib.bash should support HTTP/HTTPS redirection Target file(s) can be hosted on CDN nodes. HTTP/HTTPS redirection must be supported to download these files. Author: Cheng

spark git commit: [SPARK-9082] [SQL] [FOLLOW-UP] use `partition` in `PushPredicateThroughProject`

2015-07-23 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 26ed22aec -> 52ef76de2 [SPARK-9082] [SQL] [FOLLOW-UP] use `partition` in `PushPredicateThroughProject` a follow up of https://github.com/apache/spark/pull/7446 Author: Wenchen Fan cloud0...@outlook.com Closes #7607 from cloud-fan/tmp and

spark git commit: [SPARK-6941] [SQL] Provide a better error message when inserting into RDD based table

2015-07-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b536d5dc6 -> 43dac2c88 [SPARK-6941] [SQL] Provide a better error message when inserting into RDD based table JIRA: https://issues.apache.org/jira/browse/SPARK-6941 Author: Yijie Shen henry.yijies...@gmail.com Closes #7342 from

spark git commit: [SPARK-8800] [SQL] Fix inaccurate precision/scale of Decimal division operation

2015-07-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master fb1d06fc2 -> 4b5cfc988 [SPARK-8800] [SQL] Fix inaccurate precision/scale of Decimal division operation JIRA: https://issues.apache.org/jira/browse/SPARK-8800 Previously, we turn to Java BigDecimal's divide with specified ROUNDING_MODE to

spark git commit: [SPARK-9060] [SQL] Revert SPARK-8359, SPARK-8800, and SPARK-8677

2015-07-15 Thread yhuai
/31bd30687bc29c0e457c37308d489ae2b6e5b72a (SPARK-8359) * https://github.com/apache/spark/commit/24fda7381171738cbbbacb5965393b660763e562 (SPARK-8677) * https://github.com/apache/spark/commit/4b5cfc988f23988c2334882a255d494fc93d252e (SPARK-8800) Author: Yin Huai yh...@databricks.com Closes #7426 from yhuai/SPARK-9060 and squashes

spark git commit: [SPARK-8972] [SQL] Incorrect result for rollup

2015-07-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ba3309684 -> e27212317 [SPARK-8972] [SQL] Incorrect result for rollup We don't support complex expression keys in rollup/cube, and we don't even report it when the group-by keys are complex, which will cause very

spark git commit: [SPARK-9102] [SQL] Improve project collapse with nondeterministic expressions

2015-07-17 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 111c05538 -> 3f6d28a5c [SPARK-9102] [SQL] Improve project collapse with nondeterministic expressions Currently we will stop project collapse when the lower projection has nondeterministic expressions. However it's overkill sometimes, we

spark git commit: [SPARK-8638] [SQL] Window Function Performance Improvements

2015-07-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 04c1b49f5 -> a9a0d0ceb [SPARK-8638] [SQL] Window Function Performance Improvements ## Description Performance improvements for Spark Window functions. This PR will also serve as the basis for moving away from Hive UDAFs to Spark UDAFs. See

spark git commit: [SPARK-8638] [SQL] Window Function Performance Improvements - Cleanup

2015-07-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master a803ac3e0 -> 7a8124534 [SPARK-8638] [SQL] Window Function Performance Improvements - Cleanup This PR contains a few clean-ups that are a part of SPARK-8638: a few style issues got fixed, and a few tests were moved. Git commit message is

spark git commit: [SPARK-10144] [UI] Actually show peak execution memory by default

2015-08-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 9ce0c7ad3 -> 662bb9667 [SPARK-10144] [UI] Actually show peak execution memory by default The peak execution memory metric was introduced in SPARK-8735. That was before Tungsten was enabled by default, so it assumed that

spark git commit: [SPARK-10144] [UI] Actually show peak execution memory by default

2015-08-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 43dcf95e4 -> 831f78ee5 [SPARK-10144] [UI] Actually show peak execution memory by default The peak execution memory metric was introduced in SPARK-8735. That was before Tungsten was enabled by default, so it assumed that

spark git commit: [SPARK-11253] [SQL] reset all accumulators in physical operators before executing an action

2015-10-25 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 87f82a5fb -> 07ced4342 [SPARK-11253] [SQL] reset all accumulators in physical operators before executing an action With this change, our query execution listener can get the metrics correctly. The UI still looks good after this change.

spark git commit: [SPARK-11325] [SQL] Alias 'alias' in Scala's DataFrame API

2015-10-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 4bb2b3698 -> d4c397a64 [SPARK-11325] [SQL] Alias 'alias' in Scala's DataFrame API Author: Nong Li Closes #9286 from nongli/spark-11325. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-10562] [SQL] support mixed case partitionBy column names for tables stored in metastore

2015-10-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master dc3220ce1 -> a150e6c1b [SPARK-10562] [SQL] support mixed case partitionBy column names for tables stored in metastore https://issues.apache.org/jira/browse/SPARK-10562 Author: Wenchen Fan Closes #9226 from

spark git commit: [SPARK-10947] [SQL] With schema inference from JSON into a Dataframe, add option to infer all primitive object types as strings

2015-10-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d4c397a64 -> 82464fb2e [SPARK-10947] [SQL] With schema inference from JSON into a Dataframe, add option to infer all primitive object types as strings Currently, when a schema is inferred from a JSON file using sqlContext.read.json, the
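
A sketch of the new option (the option name `primitivesAsString` follows this PR; a SQLContext named `sqlContext` and a JSON file path are assumed):

```scala
// Infer every primitive JSON field as string instead of long/double/boolean.
val df = sqlContext.read
  .option("primitivesAsString", "true")
  .json("people.json")
df.printSchema()  // numeric and boolean fields come back as string
```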

spark git commit: [SPARK-11246] [SQL] Table cache for Parquet broken in 1.5

2015-10-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 9e3197aaa -> 76d742386 [SPARK-11246] [SQL] Table cache for Parquet broken in 1.5 The root cause is that when spark.sql.hive.convertMetastoreParquet=true by default, the cached InMemoryRelation of the ParquetRelation can not be looked

spark git commit: [SPARK-11032] [SQL] correctly handle having

2015-10-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 76d742386 -> bb3b3627a [SPARK-11032] [SQL] correctly handle having We should not stop resolving having when the having condition is resolved, or something like `count(1)` will crash. Author: Wenchen Fan Closes

spark git commit: [SPARK-11125] [SQL] Uninformative exception when running spark-sql without…

2015-10-23 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5e4581250 -> ffed00493 [SPARK-11125] [SQL] Uninformative exception when running spark-sql without building with -Phive-thriftserver and SPARK_PREPEND_CLASSES is set This is the exception after this patch. Please help review.

[2/2] spark git commit: [SPARK-11347] [SQL] Support for joinWith in Datasets

2015-10-27 Thread yhuai
[SPARK-11347] [SQL] Support for joinWith in Datasets This PR adds a new operation `joinWith` to a `Dataset`, which returns a `Tuple` for each pair where a given `condition` evaluates to true. ```scala case class ClassData(a: String, b: Int) val ds1 = Seq(ClassData("a", 1), ClassData("b",
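
A hedged sketch of the new operation, continuing the (truncated) example above and assuming a SQLContext whose implicits are imported:

```scala
import sqlContext.implicits._

case class ClassData(a: String, b: Int)
val ds1 = Seq(ClassData("a", 1), ClassData("b", 2)).toDS()
val ds2 = Seq(("a", 10), ("b", 20)).toDS()

// Unlike a plain join, joinWith keeps both sides whole and yields
// a Dataset of pairs.
val joined = ds1.joinWith(ds2, $"a" === $"_1")
// joined: Dataset[(ClassData, (String, Int))]
```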

[1/2] spark git commit: [SPARK-11347] [SQL] Support for joinWith in Datasets

2015-10-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 3bdbbc6c9 -> 5a5f65905 http://git-wip-us.apache.org/repos/asf/spark/blob/5a5f6590/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala -- diff --git

spark git commit: [SPARK-10484] [SQL] Optimize the cartesian join with broadcast join for some cases

2015-10-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b960a8905 -> d9c603989 [SPARK-10484] [SQL] Optimize the cartesian join with broadcast join for some cases In some cases, we can broadcast the smaller relation in cartesian join, which improve the performance significantly. Author: Cheng
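
A sketch of the pattern this optimizes (hypothetical DataFrames `large` and `small`): with no equi-join keys the planner would otherwise fall back to a cartesian product.

```scala
import org.apache.spark.sql.functions.broadcast

// Broadcasting the small side lets the planner use a broadcast nested
// loop join instead of a full cartesian product.
val joined = large.join(broadcast(small), large("a") < small("b"))
```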

spark git commit: [SPARK-11377] [SQL] withNewChildren should not convert StructType to Seq

2015-10-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f92b7b98e -> 032748bb9 [SPARK-11377] [SQL] withNewChildren should not convert StructType to Seq This is minor, but I ran into it while writing Datasets and while it wasn't needed for the final solution, it was super confusing so we should

spark git commit: [SPARK-11292] [SQL] Python API for text data source

2015-10-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 032748bb9 -> 5aa052191 [SPARK-11292] [SQL] Python API for text data source Adds DataFrameReader.text and DataFrameWriter.text. Author: Reynold Xin Closes #9259 from rxin/SPARK-11292. Project:

spark git commit: [SPARK-11363] [SQL] LeftSemiJoin should be LeftSemi in SparkStrategies

2015-10-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5aa052191 -> 20dfd4674 [SPARK-11363] [SQL] LeftSemiJoin should be LeftSemi in SparkStrategies JIRA: https://issues.apache.org/jira/browse/SPARK-11363 In SparkStrategies some places use LeftSemiJoin. It should be LeftSemi. cc

spark git commit: [SPARK-11274] [SQL] Text data source support for Spark SQL.

2015-10-23 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 4e38defae -> e1a897b65 [SPARK-11274] [SQL] Text data source support for Spark SQL. This adds API for reading and writing text files, similar to SparkContext.textFile and RDD.saveAsTextFile.
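
A sketch of the added reader/writer API (a SQLContext named `sqlContext` is assumed; paths are placeholders):

```scala
// read.text yields a DataFrame with a single string column named "value".
val lines = sqlContext.read.text("README.md")
lines.filter(lines("value").contains("Spark"))
  .write.text("/tmp/spark-lines")
```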

spark git commit: [SPARK-11194] [SQL] Use MutableURLClassLoader for the classLoader in IsolatedClientLoader.

2015-10-23 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e1a897b65 -> 4725cb988 [SPARK-11194] [SQL] Use MutableURLClassLoader for the classLoader in IsolatedClientLoader. https://issues.apache.org/jira/browse/SPARK-11194 Author: Yin Huai <yh...@databricks.com> Closes #9170 from yh

spark git commit: [SPARK-11590][SQL] use native json_tuple in lateral view

2015-11-10 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 6e2e84f3e -> 5ccc1eb08 [SPARK-11590][SQL] use native json_tuple in lateral view Author: Wenchen Fan Closes #9562 from cloud-fan/json-tuple. (cherry picked from commit 53600854c270d4c953fe95fbae528740b5cf6603)

spark git commit: [SPARK-11590][SQL] use native json_tuple in lateral view

2015-11-10 Thread yhuai
Repository: spark Updated Branches: refs/heads/master dfcfcbcc0 -> 53600854c [SPARK-11590][SQL] use native json_tuple in lateral view Author: Wenchen Fan Closes #9562 from cloud-fan/json-tuple. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:
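
A sketch of json_tuple used from SQL in a lateral view (hypothetical registered table `jsonTable` with a string column `jstring`):

```scala
// Extract two fields from a JSON string column without a Hive UDTF.
sqlContext.sql(
  """SELECT t.id, jt.f1, jt.f2
    |FROM jsonTable t
    |LATERAL VIEW json_tuple(t.jstring, 'f1', 'f2') jt AS f1, f2
  """.stripMargin)
```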

spark git commit: [SPARK-10371][SQL][FOLLOW-UP] fix code style

2015-11-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 7de8abd6f -> 0d637571d [SPARK-10371][SQL][FOLLOW-UP] fix code style Author: Wenchen Fan Closes #9627 from cloud-fan/follow. (cherry picked from commit 1510c527b4f5ee0953ae42313ef9e16d2f5864c4) Signed-off-by:

spark git commit: [SPARK-10371][SQL][FOLLOW-UP] fix code style

2015-11-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 1bc41125e -> 1510c527b [SPARK-10371][SQL][FOLLOW-UP] fix code style Author: Wenchen Fan Closes #9627 from cloud-fan/follow. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-9241][SQL] Supporting multiple DISTINCT columns - follow-up (3)

2015-11-10 Thread yhuai
nts: * Fix for a potential bug in distinct child expression and attribute alignment. * Improved handling of duplicate distinct child expressions. * Added test for distinct UDAF with multiple children. cc yhuai Author: Herman van Hovell <hvanhov...@questtec.nl> Closes #9566 from hvanhovell/S

spark git commit: [SPARK-11451][SQL] Support single distinct count on multiple columns.

2015-11-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 7b3736098 -> 41b2bb1c3 [SPARK-11451][SQL] Support single distinct count on multiple columns. This PR adds support for multiple columns in a single count distinct aggregate to the new aggregation path. cc yhuai Author: Herman

spark git commit: [SPARK-11451][SQL] Support single distinct count on multiple columns.

2015-11-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5c4e6d7ec -> 30c8ba71a [SPARK-11451][SQL] Support single distinct count on multiple columns. This PR adds support for multiple columns in a single count distinct aggregate to the new aggregation path. cc yhuai Author: Herman van Hov
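
The shape of query this enables, as a sketch (hypothetical table `t` with columns `a` and `b`; the pair is counted distinctly in one aggregate):

```scala
// Distinct count over a multi-column key in a single aggregate.
sqlContext.sql("SELECT count(DISTINCT a, b) FROM t").show()
```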

spark git commit: [SPARK-11453][SQL] append data to partitioned table will mess up the result

2015-11-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 fddf0c413 -> 7eaf48eeb [SPARK-11453][SQL] append data to partitioned table will mess up the result The reason is that: 1. For partitioned hive table, we will move the partitioned columns after data columns. (e.g. `` partition by

spark git commit: [SPARK-11453][SQL] append data to partitioned table will mess up the result

2015-11-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 97b7080cf -> d8b50f702 [SPARK-11453][SQL] append data to partitioned table will mess up the result The reason is that: 1. For partitioned hive table, we will move the partitioned columns after data columns. (e.g. `` partition by `a`

spark git commit: [SPARK-11690][PYSPARK] Add pivot to python api

2015-11-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 99693fef0 -> a24477996 [SPARK-11690][PYSPARK] Add pivot to python api This PR adds pivot to the python api of GroupedData with the same syntax as Scala/Java. Author: Andrew Ray Closes #9653 from

spark git commit: [SPARK-11690][PYSPARK] Add pivot to python api

2015-11-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 4a1bcb26d -> 6459a6747 [SPARK-11690][PYSPARK] Add pivot to python api This PR adds pivot to the python api of GroupedData with the same syntax as Scala/Java. Author: Andrew Ray Closes #9653 from

spark git commit: [SPARK-11522][SQL] input_file_name() returns "" for external tables

2015-11-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 a0f9cd77a -> c37ed52ec [SPARK-11522][SQL] input_file_name() returns "" for external tables When computing partition for non-parquet relation, `HadoopRDD.compute` is used. but it does not set the thread local variable `inputFileName`
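
A sketch of the affected function, using `input_file_name` from org.apache.spark.sql.functions (the name as in modern Spark; table `logs` is hypothetical):

```scala
import org.apache.spark.sql.functions.input_file_name

// Before the fix this expression evaluated to "" for non-Parquet
// relations whose partitions are computed through HadoopRDD.
sqlContext.table("logs")
  .select(input_file_name().as("source_file"))
  .show()
```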

spark git commit: [SPARK-10181][SQL] Do kerberos login for credentials during hive client initialization

2015-11-16 Thread yhuai
ors, and is not an appropriate solution. This new change does kerberos login during hive client initialization, which will make credentials ready for the particular hive client instance. yhuai Please take a look and let me know. If you are not the right person to talk to, could you point me to some

spark git commit: [SPARK-11191][SPARK-11311][SQL] Backports #9664 and #9277 to branch-1.5

2015-11-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 330961bbf -> b767ceeb2 [SPARK-11191][SPARK-11311][SQL] Backports #9664 and #9277 to branch-1.5 The main purpose of this PR is to backport #9664, which depends on #9277. Author: Cheng Lian Closes #9671 from

spark git commit: [SPARK-11672][ML] set active SQLContext in JavaDefaultReadWriteSuite

2015-11-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d22fc1088 -> 64e555110 [SPARK-11672][ML] set active SQLContext in JavaDefaultReadWriteSuite The same as #9694, but for Java test suite. yhuai Author: Xiangrui Meng <m...@databricks.com> Closes #9719 from mengxr/SPARK-11672.4.

spark git commit: [SPARK-11672][ML] set active SQLContext in JavaDefaultReadWriteSuite

2015-11-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 6f98d47f8 -> 07af78221 [SPARK-11672][ML] set active SQLContext in JavaDefaultReadWriteSuite The same as #9694, but for Java test suite. yhuai Author: Xiangrui Meng <m...@databricks.com> Closes #9719 from mengxr/SPAR

spark git commit: [SPARK-10181][SQL] Do kerberos login for credentials during hive client initialization

2015-11-15 Thread yhuai
ors, and is not an appropriate solution. This new change does kerberos login during hive client initialization, which will make credentials ready for the particular hive client instance. yhuai Please take a look and let me know. If you are not the right person to talk to, could you point me to some

spark git commit: [SPARK-10181][SQL] Do kerberos login for credentials during hive client initialization

2015-11-15 Thread yhuai
ors, and is not an appropriate solution. This new change does kerberos login during hive client initialization, which will make credentials ready for the particular hive client instance. yhuai Please take a look and let me know. If you are not the right person to talk to, could you point me to someone responsi

spark git commit: [SPARK-9928][SQL] Removal of LogicalLocalTable

2015-11-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 b56aaa9be -> eced2766b [SPARK-9928][SQL] Removal of LogicalLocalTable LogicalLocalTable in ExistingRDD.scala is replaced by localRelation in LocalRelation.scala? Do you know any reason why we still keep this class? Author:

spark git commit: [SPARK-9928][SQL] Removal of LogicalLocalTable

2015-11-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 835a79d78 -> b58765caa [SPARK-9928][SQL] Removal of LogicalLocalTable LogicalLocalTable in ExistingRDD.scala is replaced by localRelation in LocalRelation.scala? Do you know any reason why we still keep this class? Author: gatorsmile

spark git commit: [SPARK-8992][SQL] Add pivot to dataframe api

2015-11-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 1a21be15f -> b8ff6888e [SPARK-8992][SQL] Add pivot to dataframe api This adds a pivot method to the dataframe api. Following the lead of cube and rollup this adds a Pivot operator that is translated into an Aggregate by the analyzer.
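
A sketch of the added API (hypothetical DataFrame `courseSales` with columns year, course, and earnings):

```scala
// Pivot course values into columns, one sum(earnings) per course per year.
courseSales
  .groupBy("year")
  .pivot("course", Seq("dotNET", "Java"))  // explicit values avoid an extra scan
  .sum("earnings")
```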

spark git commit: [SPARK-8992][SQL] Add pivot to dataframe api

2015-11-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 4151afbf5 -> 5940fc71d [SPARK-8992][SQL] Add pivot to dataframe api This adds a pivot method to the dataframe api. Following the lead of cube and rollup this adds a Pivot operator that is translated into an Aggregate by the analyzer.

spark git commit: [SPARK-11595][SQL][BRANCH-1.5] Fixes ADD JAR when the input path contains URL scheme

2015-11-12 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 6e823b4d7 -> b478ee374 [SPARK-11595][SQL][BRANCH-1.5] Fixes ADD JAR when the input path contains URL scheme This PR backports #9569 to branch-1.5. Author: Cheng Lian Closes #9570 from

spark git commit: [SPARK-9241][SQL] Supporting multiple DISTINCT columns - follow-up (3)

2015-11-10 Thread yhuai
Fix for a potential bug in distinct child expression and attribute alignment. * Improved handling of duplicate distinct child expressions. * Added test for distinct UDAF with multiple children. cc yhuai Author: Herman van Hovell <hvanhov...@questtec.nl> Closes #9566 from hvanhovell/SPARK-9241

spark git commit: [SPARK-9034][SQL] Reflect field names defined in GenericUDTF

2015-11-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 9cf56c96b -> c34c27fe9 [SPARK-9034][SQL] Reflect field names defined in GenericUDTF Hive's GenericUDTF#initialize() defines field names in its returned schema, but the current HiveGenericUDTF drops these names. We might need to reflect

spark git commit: [SPARK-11469][SQL] Allow users to define nondeterministic udfs.

2015-11-02 Thread yhuai
<yh...@databricks.com> Closes #9393 from yhuai/udfNondeterministic. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9cf56c96 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9cf56c96 Diff: http://git-wip-us.apache.

spark git commit: Revert "[SPARK-11236][CORE] Update Tachyon dependency from 0.7.1 -> 0.8.0."

2015-10-30 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 45029bfde -> e8ec2a7b0 Revert "[SPARK-11236][CORE] Update Tachyon dependency from 0.7.1 -> 0.8.0." This reverts commit 4f5e60c647d7d6827438721b7fabbc3a57b81023. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-9298][SQL] Add pearson correlation aggregation function

2015-11-01 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f8d93edec -> 3e770a64a [SPARK-9298][SQL] Add pearson correlation aggregation function JIRA: https://issues.apache.org/jira/browse/SPARK-9298 This patch adds pearson correlation aggregation function based on `AggregateExpression2`.

spark git commit: [SPARK-11434][SPARK-11103][SQL] Fix test ": Filter applied on merged Parquet schema with new column fails"

2015-10-30 Thread yhuai
m> Closes #9387 from yhuai/SPARK-11434. (cherry picked from commit 3c471885dc4f86bea95ab542e0d48d22ae748404) Signed-off-by: Yin Huai <yh...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c9ac

spark git commit: [SPARK-11504][SQL] API audit for distributeBy and localSort

2015-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master de289bf27 -> abf5e4285 [SPARK-11504][SQL] API audit for distributeBy and localSort 1. Renamed localSort -> sortWithinPartitions to avoid ambiguity in "local" 2. distributeBy -> repartition to match the existing repartition. Author:
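
A sketch of the renamed APIs together (hypothetical DataFrame `events` with columns key and ts):

```scala
// Hash-partition by key, then sort rows inside each partition without
// paying for a global total ordering.
events
  .repartition(events("key"))
  .sortWithinPartitions(events("ts"))
```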

spark git commit: [SPARK-11510][SQL] Remove SQL aggregation tests for higher order statistics

2015-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 411ff6afb -> b6e0a5ae6 [SPARK-11510][SQL] Remove SQL aggregation tests for higher order statistics We have some aggregate function tests in both DataFrameAggregateSuite and SQLQuerySuite. The two have almost the same coverage and we

spark git commit: [SPARK-10978][SQL] Allow data sources to eliminate filters

2015-11-03 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b2e4b314d -> ebf8b0b48 [SPARK-10978][SQL] Allow data sources to eliminate filters This PR adds a new method `unhandledFilters` to `BaseRelation`. Data sources which implement this method properly may avoid the overhead of defensive
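
A sketch of the new hook (hypothetical relation class; only the override matters here). Filters a relation omits from the returned array are trusted as fully handled, so Spark stops re-applying them defensively:

```scala
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter}

// A relation that pushes equality predicates into its own scan and
// reports everything else back to Spark for evaluation.
abstract class KeyValueRelation extends BaseRelation {
  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(_.isInstanceOf[EqualTo])
}
```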

spark git commit: [SPARK-10648][SQL][BRANCH-1.4] Oracle dialect to handle nonspecific numeric types

2015-11-05 Thread yhuai
<yh...@databricks.com> Closes #9498 from yhuai/OracleDialect-1.4. (cherry picked from commit 6c5e9a3a056cc8ee660a2b22a0a5ff17d674b68d) Signed-off-by: Yin Huai <yh...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/a

spark git commit: [SPARK-9241][SQL] Supporting multiple DISTINCT columns - follow-up

2015-11-07 Thread yhuai
ant if expression in the non-distinct aggregation path and adds a multiple distinct test to the AggregationQuerySuite. cc yhuai marmbrus Author: Herman van Hovell <hvanhov...@questtec.nl> Closes #9541 from hvanhovell/SPARK-9241-followup. Project: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-9241][SQL] Supporting multiple DISTINCT columns - follow-up

2015-11-07 Thread yhuai
ant if expression in the non-distinct aggregation path and adds a multiple distinct test to the AggregationQuerySuite. cc yhuai marmbrus Author: Herman van Hovell <hvanhov...@questtec.nl> Closes #9541 from hvanhovell/SPARK-9241-followup. (cherry picked fr

spark git commit: [SPARK-11329] [SQL] Cleanup from spark-11329 fix.

2015-11-03 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d648a4ad5 -> e352de0db [SPARK-11329] [SQL] Cleanup from spark-11329 fix. Author: Nong Closes #9442 from nongli/spark-11483. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-11490][SQL] variance should alias var_samp instead of var_pop.

2015-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e0fc9c7e5 -> 3bd6f5d2a [SPARK-11490][SQL] variance should alias var_samp instead of var_pop. stddev is an alias for stddev_samp. variance should be consistent with stddev. Also took the chance to remove internal Stddev and Variance, and
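
The aliasing in a sketch (hypothetical table `t` with a numeric column `x`):

```scala
// variance now matches var_samp, just as stddev matches stddev_samp.
sqlContext.sql(
  "SELECT variance(x), var_samp(x), var_pop(x), stddev(x), stddev_samp(x) FROM t")
```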

spark git commit: [SPARK-11455][SQL] fix case sensitivity of partition by

2015-11-03 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e352de0db -> 2692bdb7d [SPARK-11455][SQL] fix case sensitivity of partition by depend on `caseSensitive` to do column name equality check, instead of just `==` Author: Wenchen Fan Closes #9410 from

spark git commit: [SPARK-10304][SQL] Following up checking valid dir structure for partition discovery

2015-11-04 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 987df4bfc -> de289bf27 [SPARK-10304][SQL] Following up checking valid dir structure for partition discovery This patch follows up #8840. Author: Liang-Chi Hsieh Closes #9459 from

spark git commit: [SPARK-11371] Make "mean" an alias for "avg" operator

2015-11-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 33ae7a35d -> db11ee5e5 [SPARK-11371] Make "mean" an alias for "avg" operator From Reynold in the thread 'Exception when using some aggregate operators' (http://search-hadoop.com/m/q3RTt0xFr22nXB4/): I don't think these are bugs. The
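
The alias in a sketch (hypothetical DataFrame `df` with a numeric column "value"):

```scala
import org.apache.spark.sql.functions.{avg, mean}

// Both aggregates now resolve to the same Average expression.
df.agg(mean("value"), avg("value")).show()
```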

spark git commit: [SPARK-11329][SQL] Support star expansion for structs.

2015-11-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2cef1bb0b -> 9cb5c731d [SPARK-11329][SQL] Support star expansion for structs. 1. Supporting expanding structs in Projections. i.e. "SELECT s.*" where s is a struct type. This is fixed by allowing the expand function to handle structs
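
A sketch of struct star expansion (hypothetical DataFrame `df` with a struct column `s` having fields f1 and f2; `structTable` assumed registered):

```scala
import sqlContext.implicits._

df.select($"s.*")                              // expands to columns f1, f2
sqlContext.sql("SELECT s.* FROM structTable")  // the same, via SQL
```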
