from:"\"lixiao\""

spark git commit: [SPARK-24940][SQL] Use IntegerLiteral in ResolveCoalesceHints

2018-08-06 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 64ad7b841 -> d063e3a47 [SPARK-24940][SQL] Use IntegerLiteral in ResolveCoalesceHints ## What changes were proposed in this pull request? Follow up to fix an unmerged review comment. ## How was this patch tested? Unit test ResolveHintsSui

spark git commit: [SPARK-24940][SQL] Coalesce and Repartition Hint for SQL Queries

2018-08-03 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 41c2227a2 -> 36ea55e97 [SPARK-24940][SQL] Coalesce and Repartition Hint for SQL Queries ## What changes were proposed in this pull request? Many Spark SQL users in my company have asked for a way to control the number of output files in S

spark git commit: [SPARK-24997][SQL] Enable support of MINUS ALL

2018-08-02 Thread lixiao

Repository: spark Updated Branches: refs/heads/master b0d6967d4 -> 19a453191 [SPARK-24997][SQL] Enable support of MINUS ALL ## What changes were proposed in this pull request? Enable support for MINUS ALL which was gated at AstBuilder. ## How was this patch tested? Added tests in SQLQueryTest

spark git commit: [SPARK-24788][SQL] RelationalGroupedDataset.toString with unresolved exprs should not fail

2018-08-02 Thread lixiao

Repository: spark Updated Branches: refs/heads/master f45d60a5a -> b0d6967d4 [SPARK-24788][SQL] RelationalGroupedDataset.toString with unresolved exprs should not fail ## What changes were proposed in this pull request? In the current master, `toString` throws an exception when `RelationalGr

spark git commit: [SPARK-24966][SQL] Implement precedence rules for set operations.

2018-08-02 Thread lixiao

Repository: spark Updated Branches: refs/heads/master b3f2911ee -> 73dd6cf9b [SPARK-24966][SQL] Implement precedence rules for set operations. ## What changes were proposed in this pull request? Currently the set operations INTERSECT, UNION and EXCEPT are assigned the same precedence. This P
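
To see why precedence matters here, the sketch below uses Python sets as stand-ins for the distinct rows of three queries. It assumes the SQL-standard rule that INTERSECT binds tighter than UNION and EXCEPT; the function names and sample data are hypothetical, and this is not Spark's parser logic.

```python
# Python sets stand in for the distinct rows returned by three queries.
a = {1, 2}
b = {2, 3}
c = {3, 4}


def equal_precedence(a, b, c):
    """a UNION b INTERSECT c, evaluated strictly left to right."""
    return (a | b) & c


def intersect_first(a, b, c):
    """a UNION b INTERSECT c, with INTERSECT binding tighter than UNION."""
    return a | (b & c)


left_to_right = equal_precedence(a, b, c)
standard = intersect_first(a, b, c)
print(left_to_right, standard)
```

The two readings return different row sets for the same query text, which is the kind of ambiguity a precedence rule removes.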

spark git commit: [SPARK-24705][SQL] ExchangeCoordinator broken when duplicate exchanges reused

2018-08-02 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 02f967795 -> efef55388 [SPARK-24705][SQL] ExchangeCoordinator broken when duplicate exchanges reused ## What changes were proposed in this pull request? In the current master, `EnsureRequirements` sets the number of exchanges in `ExchangeC

spark git commit: [SPARK-23908][SQL] Add transform function.

2018-08-02 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 0df6bf882 -> 02f967795 [SPARK-23908][SQL] Add transform function. ## What changes were proposed in this pull request? This PR adds a `transform` function, which transforms elements in an array using a given function. Optionally we can take the i
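
A plain-Python analogue of the element-wise behaviour described above might look like the following. The name `transform` mirrors the commit title, but the signature is an assumption made for illustration; it is not Spark's implementation and does not cover the optional variant that the truncated text begins to describe.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def transform(values: List[T], func: Callable[[T], U]) -> List[U]:
    """Apply func to every element and return a new list of the results."""
    return [func(v) for v in values]


incremented = transform([1, 2, 3], lambda x: x + 1)
print(incremented)
```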

spark git commit: [SPARK-24598][DOCS] State in the documentation the behavior when arithmetic operations cause overflow

2018-08-02 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 15fc23722 -> ad2e63662 [SPARK-24598][DOCS] State in the documentation the behavior when arithmetic operations cause overflow ## What changes were proposed in this pull request? According to the discussion in https://github.com/apache/spar

spark git commit: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread lixiao

Repository: spark Updated Branches: refs/heads/master c9914cf04 -> 166f34618 [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE ## What changes were proposed in this pull request? This PR is to refactor the code in AVERAGE by dsl. ## How was this patch tested? N/A Author: Xiao Li Clo

spark git commit: [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal followed by aggregation returns wrong result

2018-08-01 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.2 c4b37696f -> 22ce8051f [SPARK-24957][SQL][BACKPORT-2.2] Average with decimal followed by aggregation returns wrong result ## What changes were proposed in this pull request? When we do an average, the result is computed dividing the s

spark git commit: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWithSchema

2018-08-01 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 9f558601e -> ce084d3e0 [SPARK-24990][SQL] merge ReadSupport and ReadSupportWithSchema ## What changes were proposed in this pull request? Regarding user-specified schema, data sources may have 3 different behaviors: 1. must have a user-spe

spark git commit: [SPARK-24937][SQL] Datasource partition table should load empty static partitions

2018-08-01 Thread lixiao

Repository: spark Updated Branches: refs/heads/master f5113ea8d -> 9f558601e [SPARK-24937][SQL] Datasource partition table should load empty static partitions ## What changes were proposed in this pull request? How to reproduce: ```sql spark-sql> CREATE TABLE tbl AS SELECT 1; spark-sql> CREA

spark git commit: [SPARK-24982][SQL] UDAF resolution should not throw AssertionError

2018-08-01 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 1f7e22c72 -> 1efffb799 [SPARK-24982][SQL] UDAF resolution should not throw AssertionError ## What changes were proposed in this pull request? When a user calls a UDAF with the wrong number of arguments, Spark previously threw an AssertionEr

spark git commit: [SPARK-24951][SQL] Table valued functions should throw AnalysisException

2018-07-31 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 5f3441e54 -> 1f7e22c72 [SPARK-24951][SQL] Table valued functions should throw AnalysisException ## What changes were proposed in this pull request? Previously TVF resolution could throw IllegalArgumentException if the data type is null typ

spark git commit: [SPARK-24536] Validate that an evaluated limit clause cannot be null

2018-07-31 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 25ea27b09 -> fc3df4517 [SPARK-24536] Validate that an evaluated limit clause cannot be null It proposes a version in which nullable expressions are not valid in the limit clause. It was tested with unit and e2e tests. Please review ht

spark git commit: [SPARK-24536] Validate that an evaluated limit clause cannot be null

2018-07-31 Thread lixiao

Repository: spark Updated Branches: refs/heads/master b4fd75fb9 -> 4ac2126bc [SPARK-24536] Validate that an evaluated limit clause cannot be null ## What changes were proposed in this pull request? It proposes a version in which nullable expressions are not valid in the limit clause ## How

spark git commit: [SPARK-24972][SQL] PivotFirst could not handle pivot columns of complex types

2018-07-30 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 8141d5592 -> b4fd75fb9 [SPARK-24972][SQL] PivotFirst could not handle pivot columns of complex types ## What changes were proposed in this pull request? When the pivot column is of a complex type, the eval() result will be an UnsafeRow, w

spark git commit: [SPARK-24865][SQL] Remove AnalysisBarrier addendum

2018-07-30 Thread lixiao

Repository: spark Updated Branches: refs/heads/master d6b7545b5 -> abbb4ab4d [SPARK-24865][SQL] Remove AnalysisBarrier addendum ## What changes were proposed in this pull request? I didn't want to pollute the diff in the previous PR and left some TODOs. This is a follow-up to address those TO

spark git commit: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC partition column

2018-07-30 Thread lixiao

Repository: spark Updated Branches: refs/heads/master b90bfe3c4 -> 47d84e4d0 [SPARK-22814][SQL] Support Date/Timestamp in a JDBC partition column ## What changes were proposed in this pull request? This PR adds support for Date/Timestamp in a JDBC partition column (only a numeric column is supported

spark git commit: [SPARK-24771][BUILD] Upgrade Apache AVRO to 1.8.2

2018-07-30 Thread lixiao

Repository: spark Updated Branches: refs/heads/master fca0b8528 -> b90bfe3c4 [SPARK-24771][BUILD] Upgrade Apache AVRO to 1.8.2 ## What changes were proposed in this pull request? Upgrade Apache Avro from 1.7.7 to 1.8.2. The major new features: 1. More logical types. From the spec of 1.8.2 h

spark git commit: [SPARK-21274][SQL] Implement INTERSECT ALL clause

2018-07-29 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 6690924c4 -> 65a4bc143 [SPARK-21274][SQL] Implement INTERSECT ALL clause ## What changes were proposed in this pull request? Implements INTERSECT ALL clause through query rewrites using existing operators in Spark. Please refer to [Link]
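
The multiset semantics of INTERSECT ALL can be sketched with `collections.Counter`. This illustrates only the expected result (each row kept as many times as it appears in both inputs); it is not the query rewrite the commit describes, and the function name and sample rows are hypothetical.

```python
from collections import Counter


def intersect_all(left, right):
    """Keep each row min(count in left, count in right) times."""
    common = Counter(left) & Counter(right)
    return sorted(common.elements())


result = intersect_all([1, 1, 2, 3], [1, 1, 1, 3, 4])
print(result)
```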

spark git commit: [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error

2018-07-29 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.1 7d50fec3f -> a3eb07db3 [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error When the join key is long or int in a broadcast join, Spark will use `LongToUnsafeRowMap` to store key-values of the table which w

spark git commit: [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error

2018-07-29 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 d5f340f27 -> 71eb7d468 [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error When the join key is long or int in a broadcast join, Spark will use `LongToUnsafeRowMap` to store key-values of the table which w

spark git commit: [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error

2018-07-29 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.2 73764737d -> f52d0c451 [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error When the join key is long or int in a broadcast join, Spark will use `LongToUnsafeRowMap` to store key-values of the table which w

spark git commit: [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error

2018-07-29 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 8fe5d2c39 -> 2c54aae1b [SPARK-24809][SQL] Serializing LongToUnsafeRowMap in executor may result in data error When the join key is long or int in a broadcast join, Spark will use `LongToUnsafeRowMap` to store key-values of the table which will

spark git commit: [MINOR] Update docs for functions.scala to make it clear not all the built-in functions are defined there

2018-07-27 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 34ebcc6b5 -> 6424b146c [MINOR] Update docs for functions.scala to make it clear not all the built-in functions are defined there The title summarizes the change. Author: Reynold Xin Closes #21318 from rxin/functions. Project: http://g

spark git commit: [MINOR] Improve documentation for HiveStringType's

2018-07-27 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 10f1f1965 -> 34ebcc6b5 [MINOR] Improve documentation for HiveStringType's The diff should be self-explanatory. Author: Reynold Xin Closes #21897 from rxin/hivestringtypedoc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Co

spark git commit: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-27 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 5828f41a5 -> 10f1f1965 [SPARK-21274][SQL] Implement EXCEPT ALL clause. ## What changes were proposed in this pull request? Implements EXCEPT ALL clause through query rewrites using existing operators in Spark. In this PR, an internal UDTF
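
The expected result of EXCEPT ALL can be sketched as a multiset difference. This shows the semantics only, not the rewrite or the internal UDTF mentioned above; the function name and sample rows are hypothetical.

```python
from collections import Counter


def except_all(left, right):
    """Remove one copy of a left row for each matching copy in right."""
    remaining = Counter(left) - Counter(right)  # drops non-positive counts
    return sorted(remaining.elements())


result = except_all([1, 1, 2, 3], [1, 3, 4])
print(result)
```

Unlike plain EXCEPT, duplicates on the left survive when the right side has fewer copies of the same row.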

spark git commit: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided"

2018-07-27 Thread lixiao

Repository: spark Updated Branches: refs/heads/master ef6c8395c -> c9bec1d37 [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided" ## What changes were proposed in this pull request? Please see [SPARK-24927][1] for more details. [1]: https://issues.apache.org/jira/br

spark git commit: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided"

2018-07-27 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.2 f339e2fd7 -> 73764737d [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided" ## What changes were proposed in this pull request? Please see [SPARK-24927][1] for more details. [1]: https://issues.apache.org/jir

spark git commit: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided"

2018-07-27 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 fa552c3c1 -> d5f340f27 [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-java cannot be "provided" ## What changes were proposed in this pull request? Please see [SPARK-24927][1] for more details. [1]: https://issues.apache.org/jir

spark git commit: [SPARK-24288][SQL] Add a JDBC Option to enable preventing predicate pushdown

2018-07-26 Thread lixiao

Repository: spark Updated Branches: refs/heads/master e6e9031d7 -> 21fcac164 [SPARK-24288][SQL] Add a JDBC Option to enable preventing predicate pushdown ## What changes were proposed in this pull request? Add a JDBC Option "pushDownPredicate" (default `true`) to allow/disallow predicate pus

spark git commit: [SPARK-24919][BUILD] New linter rule for sparkContext.hadoopConfiguration

2018-07-26 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 2c8274568 -> fa09d9192 [SPARK-24919][BUILD] New linter rule for sparkContext.hadoopConfiguration ## What changes were proposed in this pull request? In most cases, we should use `spark.sessionState.newHadoopConf()` instead of `sparkContex

spark git commit: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-26 Thread lixiao

Repository: spark Updated Branches: refs/heads/master e3486e1b9 -> 2c8274568 [SPARK-24307][CORE] Add conf to revert to old code. In case there are any issues in converting FileSegmentManagedBuffer to ChunkedByteBuffer, add a conf to go back to old code path. Followup to 7e847646d1f377f46dc315

spark git commit: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-26 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 5ed7660d1 -> e3486e1b9 [SPARK-24795][CORE] Implement barrier execution mode ## What changes were proposed in this pull request? Propose new APIs and modify job/task scheduling to support barrier execution mode, which requires all tasks in

spark git commit: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optimization Rule Exclusion

2018-07-26 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 58353d7f4 -> 5ed7660d1 [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optimization Rule Exclusion ## What changes were proposed in this pull request? This is an extension to the original PR, in which rule exclusion did not work for cl

spark git commit: [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter

2018-07-25 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 740606eb8 -> fa552c3c1 [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter ```Scala val udf1 = udf({(x: Int, y: Int) => x + y}) val df = spark.range(0, 3).toDF("a") .withColumn("b", udf1($"a", udf1($"a", lit(10

spark git commit: [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter

2018-07-25 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 17f469bc8 -> d2e7deb59 [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter ## What changes were proposed in this pull request? ```Scala val udf1 = udf({(x: Int, y: Int) => x + y}) val df = spark.range(0, 3).toDF("a")

spark git commit: [SPARK-24860][SQL] Support setting of partitionOverWriteMode in output options for writing DataFrame

2018-07-25 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 0c83f718e -> 17f469bc8 [SPARK-24860][SQL] Support setting of partitionOverWriteMode in output options for writing DataFrame ## What changes were proposed in this pull request? Besides spark setting spark.sql.sources.partitionOverwriteMode

spark git commit: [SPARK-24849][SPARK-24911][SQL] Converting a value of StructType to a DDL string

2018-07-25 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 571a6f057 -> 2f77616e1 [SPARK-24849][SPARK-24911][SQL] Converting a value of StructType to a DDL string ## What changes were proposed in this pull request? In the PR, I propose to extend the `StructType`/`StructField` classes by new metho

[2/3] spark-website git commit: spark summit eu 2018

2018-07-25 Thread lixiao

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d86cffd1/site/news/spark-2-2-1-released.html -- diff --git a/site/news/spark-2-2-1-released.html b/site/news/spark-2-2-1-released.html index df7c2f0..b9d465f 100644 --- a/s

[1/3] spark-website git commit: spark summit eu 2018

2018-07-25 Thread lixiao

Repository: spark-website Updated Branches: refs/heads/asf-site f5d7dfafe -> d86cffd19 http://git-wip-us.apache.org/repos/asf/spark-website/blob/d86cffd1/site/releases/spark-release-1-1-1.html -- diff --git a/site/releases/spar

[3/3] spark-website git commit: spark summit eu 2018

2018-07-25 Thread lixiao

spark summit eu 2018 Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/d86cffd1 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/d86cffd1 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/

spark git commit: [SPARK-24768][FOLLOWUP][SQL] Avro migration followup: change artifactId to spark-avro

2018-07-25 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 7a5fd4a91 -> c44eb561e [SPARK-24768][FOLLOWUP][SQL] Avro migration followup: change artifactId to spark-avro ## What changes were proposed in this pull request? After rethinking on the artifactId, I think it should be `spark-avro` instead

spark git commit: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatched message

2018-07-24 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 78e0a725e -> 7a5fd4a91 [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatched message ## What changes were proposed in this pull request? Improvement `IN` predicate type mismatched message: ```sql Mismatched columns: [(, t, 4, ., `, t, 4

spark git commit: [SPARK-23957][SQL] Sorts in subqueries are redundant and can be removed

2018-07-24 Thread lixiao

Repository: spark Updated Branches: refs/heads/master d4c341589 -> afb062753 [SPARK-23957][SQL] Sorts in subqueries are redundant and can be removed ## What changes were proposed in this pull request? Thanks to henryr for the original idea at https://github.com/apache/spark/pull/21049 Descri

spark git commit: [SPARK-24890][SQL] Short circuiting the `if` condition when `trueValue` and `falseValue` are the same

2018-07-24 Thread lixiao

Repository: spark Updated Branches: refs/heads/master c26b09216 -> d4c341589 [SPARK-24890][SQL] Short circuiting the `if` condition when `trueValue` and `falseValue` are the same ## What changes were proposed in this pull request? When `trueValue` and `falseValue` are semantically equivalent, t
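
A minimal sketch of this kind of simplification on a toy expression tree is shown below. The classes `Literal`, `Column`, and `If` and the function `simplify_if` are hypothetical stand-ins, structural equality is used in place of semantic equivalence, and the sketch assumes the predicate has no side effects; it is not Catalyst code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class If:
    predicate: "Expr"
    true_value: "Expr"
    false_value: "Expr"


Expr = Union[Literal, Column, If]


def simplify_if(expr: Expr) -> Expr:
    """Collapse If(p, x, x) to x; leave every other expression unchanged."""
    if isinstance(expr, If) and expr.true_value == expr.false_value:
        return expr.true_value
    return expr


same = If(Column("flag"), Literal(1), Literal(1))
different = If(Column("flag"), Literal(1), Literal(2))
print(simplify_if(same), simplify_if(different))
```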

spark git commit: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-24 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 740a23d7d -> 6a5999286 [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule The HandleNullInputsForUDF would always add a new `If` node every time it is applied. That would cause a difference between the same plan being analyzed once an

spark git commit: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-24 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 15fff7903 -> c26b09216 [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule ## What changes were proposed in this pull request? The HandleNullInputsForUDF would always add a new `If` node every time it is applied. That would cause a differe

spark git commit: [SPARK-24812][SQL] Last Access Time in the table description is not valid

2018-07-24 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 9d27541a8 -> d4a277f0c [SPARK-24812][SQL] Last Access Time in the table description is not valid ## What changes were proposed in this pull request? Last Access Time always displays the wrong date, Thu Jan 01 05:30:00 IST 1970, when user

spark git commit: [SPARK-23325] Use InternalRow when reading with DataSourceV2.

2018-07-24 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 3d5c61e5f -> 9d27541a8 [SPARK-23325] Use InternalRow when reading with DataSourceV2. ## What changes were proposed in this pull request? This updates the DataSourceV2 API to use InternalRow instead of Row for the default case with no scan

spark git commit: [SPARK-24870][SQL] Cache can't work normally if there are case letters in SQL

2018-07-23 Thread lixiao

Repository: spark Updated Branches: refs/heads/master d2436a852 -> 13a67b070 [SPARK-24870][SQL] Cache can't work normally if there are case letters in SQL ## What changes were proposed in this pull request? Modified `canonicalized` so that it is no longer case-insensitive. Before the PR, cache can't work nor

spark git commit: [SPARK-24339][SQL] Prunes the unused columns from child of ScriptTransformation

2018-07-23 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 61f0ca4f1 -> cfc3e1aaa [SPARK-24339][SQL] Prunes the unused columns from child of ScriptTransformation ## What changes were proposed in this pull request? Modify the strategy in ColumnPruning to add a Project between ScriptTransformation

spark git commit: [SPARK-24850][SQL] fix str representation of CachedRDDBuilder

2018-07-23 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 08e315f63 -> 2edf17eff [SPARK-24850][SQL] fix str representation of CachedRDDBuilder ## What changes were proposed in this pull request? As of https://github.com/apache/spark/pull/21018, InMemoryRelation includes its cacheBuilder when logg

spark git commit: [SPARK-24887][SQL] Avro: use SerializableConfiguration in Spark utils to deduplicate code

2018-07-23 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 434319e73 -> 08e315f63 [SPARK-24887][SQL] Avro: use SerializableConfiguration in Spark utils to deduplicate code ## What changes were proposed in this pull request? To implement the method `buildReader` in `FileFormat`, it is required to

spark git commit: [SPARK-24802][SQL] Add a new config for Optimization Rule Exclusion

2018-07-23 Thread lixiao

Repository: spark Updated Branches: refs/heads/master ab18b02e6 -> 434319e73 [SPARK-24802][SQL] Add a new config for Optimization Rule Exclusion ## What changes were proposed in this pull request? Since Spark has provided fairly clear interfaces for adding user-defined optimization rules, it

spark git commit: [SPARK-24811][SQL] Avro: add new function from_avro and to_avro

2018-07-22 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 81af88687 -> 8817c68f5 [SPARK-24811][SQL] Avro: add new function from_avro and to_avro ## What changes were proposed in this pull request? 1. Add a new function from_avro for parsing a binary column of avro format and converting it into i

spark git commit: [SPARK-24836][SQL] New option for Avro datasource - ignoreExtension

2018-07-20 Thread lixiao

Repository: spark Updated Branches: refs/heads/master bbd6f0c25 -> 106880edc [SPARK-24836][SQL] New option for Avro datasource - ignoreExtension ## What changes were proposed in this pull request? I propose to add a new option for the AVRO datasource that controls the ignoring of files without

spark git commit: [SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown

2018-07-20 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 db1f3cc76 -> bd6bfacb2 [SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown ## What changes were proposed in this pull request? We get a NPE when we have a filter on a partition column of the form `col in (x, null)`. Th

spark git commit: [SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown

2018-07-20 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 96f312076 -> bbd6f0c25 [SPARK-24879][SQL] Fix NPE in Hive partition pruning filter pushdown ## What changes were proposed in this pull request? We get a NPE when we have a filter on a partition column of the form `col in (x, null)`. This i

spark git commit: [PYSPARK][TEST][MINOR] Fix UDFInitializationTests

2018-07-20 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 597bdeff2 -> 96f312076 [PYSPARK][TEST][MINOR] Fix UDFInitializationTests ## What changes were proposed in this pull request? Fix a typo in pyspark sql tests Author: William Sheu Closes #21833 from PenguinToast/fix-test-typo. Project:

spark git commit: [SPARK-24880][BUILD] Fix the group id for spark-kubernetes-integration-tests

2018-07-20 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 00b864aa7 -> f765bb782 [SPARK-24880][BUILD] Fix the group id for spark-kubernetes-integration-tests ## What changes were proposed in this pull request? The correct group id should be `org.apache.spark`. This is causing the nightly build f

spark git commit: [SPARK-24876][SQL] Avro: simplify schema serialization

2018-07-20 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 2333a34d3 -> 00b864aa7 [SPARK-24876][SQL] Avro: simplify schema serialization ## What changes were proposed in this pull request? Previously in the refactoring of Avro Serializer and Deserializer, a new class SerializableSchema is created

spark git commit: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC datasource

2018-07-20 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 9ad77b303 -> 2333a34d3 [SPARK-22880][SQL] Add cascadeTruncate option to JDBC datasource This commit adds the `cascadeTruncate` option to the JDBC datasource API, for databases that support this functionality (PostgreSQL and Oracle at the mo

spark git commit: Revert "[SPARK-24811][SQL] Avro: add new function from_avro and to_avro"

2018-07-20 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 3cb1b5780 -> 9ad77b303 Revert "[SPARK-24811][SQL] Avro: add new function from_avro and to_avro" This reverts commit 244bcff19463d82ec72baf15bc0a5209f21f2ef3. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wi

spark git commit: [SPARK-24811][SQL] Avro: add new function from_avro and to_avro

2018-07-20 Thread lixiao

Repository: spark Updated Branches: refs/heads/master cc4d64bb1 -> 244bcff19 [SPARK-24811][SQL] Avro: add new function from_avro and to_avro ## What changes were proposed in this pull request? Add a new function from_avro for parsing a binary column of avro format and converting it into its

spark git commit: [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for GROUPING SET

2018-07-19 Thread lixiao

Repository: spark Updated Branches: refs/heads/master a5925c163 -> 2b91d9918 [SPARK-24424][SQL] Support ANSI-SQL compliant syntax for GROUPING SET ## What changes were proposed in this pull request? Enhances the parser and analyzer to support ANSI compliant syntax for GROUPING SET. As part o

[2/2] spark git commit: [SPARK-24268][SQL] Use datatype.catalogString in error messages

2018-07-19 Thread lixiao

[SPARK-24268][SQL] Use datatype.catalogString in error messages ## What changes were proposed in this pull request? As stated in https://github.com/apache/spark/pull/21321, in the error messages we should use `catalogString`. This is not the case, as SPARK-22893 used `simpleString` in order to

[1/2] spark git commit: [SPARK-24268][SQL] Use datatype.catalogString in error messages

2018-07-19 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 1462b1766 -> a5925c163 http://git-wip-us.apache.org/repos/asf/spark/blob/a5925c16/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java -

spark git commit: [SPARK-24163][SPARK-24164][SQL] Support column list as the pivot column in Pivot

2018-07-18 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 1272b2034 -> cd203e0df [SPARK-24163][SPARK-24164][SQL] Support column list as the pivot column in Pivot ## What changes were proposed in this pull request? 1. Extend the Parser to enable parsing a column list as the pivot column. 2. Extend

spark git commit: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-17 Thread lixiao

Repository: spark Updated Branches: refs/heads/master fc2e18963 -> 3b59d326c [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2 ## What changes were proposed in this pull request? This issue aims to upgrade Apache ORC library from 1.4.4 to 1.5.2 in order to bring the following benefits into Ap

spark git commit: [SPARK-24681][SQL] Verify nested column names in Hive metastore

2018-07-17 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 912634b00 -> 2a4dd6f06 [SPARK-24681][SQL] Verify nested column names in Hive metastore ## What changes were proposed in this pull request? This pr added code to check if nested column names do not include ',', ':', and ';' because Hive met
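
The check described above can be sketched roughly as follows. The schema representation (a dict whose values are either a type name or another dict for a struct) and the function name are assumptions made for illustration, and only names below the top level are inspected; this is not Spark's implementation.

```python
FORBIDDEN_CHARS = {",", ":", ";"}


def invalid_nested_names(fields: dict, depth: int = 0) -> list:
    """Return nested field names (depth >= 1) containing a forbidden character."""
    bad = []
    for name, field_type in fields.items():
        if depth >= 1 and any(ch in FORBIDDEN_CHARS for ch in name):
            bad.append(name)
        if isinstance(field_type, dict):  # a struct: recurse into its fields
            bad.extend(invalid_nested_names(field_type, depth + 1))
    return bad


schema = {
    "id": "int",
    "address": {"street": "string", "zip:code": "string"},
}
problems = invalid_nested_names(schema)
print(problems)
```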

spark git commit: [SPARK-24402][SQL] Optimize `In` expression when only one element in the collection or collection is empty

2018-07-16 Thread lixiao

Repository: spark Updated Branches: refs/heads/master ba437fc5c -> 0f0d1865f [SPARK-24402][SQL] Optimize `In` expression when only one element in the collection or collection is empty ## What changes were proposed in this pull request? Two new rules in the logical plan optimizers are added.

spark git commit: [SPARK-24805][SQL] Do not ignore avro files without extensions by default

2018-07-16 Thread lixiao

Repository: spark Updated Branches: refs/heads/master b0c95a1d6 -> ba437fc5c [SPARK-24805][SQL] Do not ignore avro files without extensions by default ## What changes were proposed in this pull request? In the PR, I propose to change the default behaviour of the AVRO datasource, which currently ignor

spark git commit: [SPARK-23901][SQL] Removing masking functions

2018-07-16 Thread lixiao

Repository: spark Updated Branches: refs/heads/master b045315e5 -> b0c95a1d6 [SPARK-23901][SQL] Removing masking functions The PR reverts #21246. Author: Marek Novotny Closes #21786 from mn-mikke/SPARK-23901. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wi

spark git commit: [SPARK-24810][SQL] Fix paths to test files in AvroSuite

2018-07-15 Thread lixiao

Repository: spark Updated Branches: refs/heads/master d463533de -> 9f929458f [SPARK-24810][SQL] Fix paths to test files in AvroSuite ## What changes were proposed in this pull request? In the PR, I propose to move `testFile()` to the common trait `SQLTestUtilsBase` and wrap test files in `Av

spark git commit: [SPARK-24676][SQL] Project required data from CSV parsed data when column pruning disabled

2018-07-15 Thread lixiao

Repository: spark Updated Branches: refs/heads/master bcf7121ed -> d463533de [SPARK-24676][SQL] Project required data from CSV parsed data when column pruning disabled ## What changes were proposed in this pull request? This pr modified code to project required data from CSV parsed data when

spark git commit: [SPARK-24807][CORE] Adding files/jars twice: output a warning and add a note

2018-07-14 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 3e7dc8296 -> 69993217f [SPARK-24807][CORE] Adding files/jars twice: output a warning and add a note ## What changes were proposed in this pull request? In the PR, I propose to output a warning if the `addFile()` or `addJar()` methods are

spark git commit: [SPARK-24776][SQL] Avro unit test: deduplicate code and replace deprecated methods

2018-07-14 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 43e4e851b -> 3e7dc8296 [SPARK-24776][SQL] Avro unit test: deduplicate code and replace deprecated methods ## What changes were proposed in this pull request? Improve Avro unit test: 1. use QueryTest/SharedSQLContext/SQLTestUtils, instead

spark git commit: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader

2018-07-13 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 3b6005b8a -> a75571b46 [SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader ## What changes were proposed in this pull request? Add `org.apache.derby` to `IsolatedClientLoader`, otherwise it may throw an exception: ```scala ...

spark git commit: Revert "[SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace deprecated methods"

2018-07-13 Thread lixiao

Repository: spark Updated Branches: refs/heads/master c1b62e420 -> 3bcb1b481 Revert "[SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace deprecated methods" This reverts commit c1b62e420a43aa7da36733ccdbec057d87ac1b43. Project: http://git-wip-us.apache.org/repos/asf/spark/repo C

spark git commit: [SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace deprecated methods

2018-07-13 Thread lixiao

Repository: spark Updated Branches: refs/heads/master dfd7ac988 -> c1b62e420 [SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace deprecated methods ## What changes were proposed in this pull request? Improve Avro unit test: 1. use QueryTest/SharedSQLContext/SQLTestUtils, instead o

spark git commit: [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work

2018-07-13 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 32429256f -> 9cf375f5b [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work ## What changes were proposed in this pull request? When we use a reference from Dataset in filter or sort, which was not used in t

spark git commit: [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work

2018-07-13 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 0f24c6f8a -> dfd7ac988 [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work ## What changes were proposed in this pull request? When we use a reference from Dataset in filter or sort, which was not used in the p

spark git commit: [SPARK-23486] cache the function name from the external catalog for lookupFunctions

2018-07-12 Thread lixiao

Repository: spark Updated Branches: refs/heads/master e0f4f206b -> 0ce11d0e3 [SPARK-23486] cache the function name from the external catalog for lookupFunctions ## What changes were proposed in this pull request? This PR caches the function name from the external catalog; it is used by look

spark git commit: [SPARK-24790][SQL] Allow complex aggregate expressions in Pivot

2018-07-12 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 11384893b -> 75725057b [SPARK-24790][SQL] Allow complex aggregate expressions in Pivot ## What changes were proposed in this pull request? Relax the check to allow complex aggregate expressions, like `ceil(sum(col1))` or `sum(col1) + 1`,

spark git commit: [SPARK-24208][SQL][FOLLOWUP] Move test cases to proper locations

2018-07-12 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 07704c971 -> 11384893b [SPARK-24208][SQL][FOLLOWUP] Move test cases to proper locations ## What changes were proposed in this pull request? The PR is a followup to move the test cases introduced by the original PR in their proper location

spark git commit: [SPARK-23007][SQL][TEST] Add read schema suite for file-based data sources

2018-07-12 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 395860a98 -> 07704c971 [SPARK-23007][SQL][TEST] Add read schema suite for file-based data sources ## What changes were proposed in this pull request? The reader schema is said to be evolved (or projected) when it changed after the data is

[2/2] spark git commit: [SPARK-24768][SQL] Have a built-in AVRO data source implementation

2018-07-12 Thread lixiao

[SPARK-24768][SQL] Have a built-in AVRO data source implementation ## What changes were proposed in this pull request? Apache Avro (https://avro.apache.org) is a popular data serialization format. It is widely used in the Spark and Hadoop ecosystem, especially for Kafka-based data pipelines. U

[1/2] spark git commit: [SPARK-24768][SQL] Have a built-in AVRO data source implementation

2018-07-12 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 1055c94cd -> 395860a98 http://git-wip-us.apache.org/repos/asf/spark/blob/395860a9/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala -- diff --git a/ext

spark git commit: [SPARK-24761][SQL] Adding of isModifiable() to RuntimeConfig

2018-07-11 Thread lixiao

Repository: spark Updated Branches: refs/heads/master e008ad175 -> 3ab48f985 [SPARK-24761][SQL] Adding of isModifiable() to RuntimeConfig ## What changes were proposed in this pull request? In the PR, I propose to extend `RuntimeConfig` with a new method, `isModifiable()`, which returns `true` if

spark git commit: [SPARK-24782][SQL] Simplify conf retrieval in SQL expressions

2018-07-11 Thread lixiao

Repository: spark Updated Branches: refs/heads/master ff7f6ef75 -> e008ad175 [SPARK-24782][SQL] Simplify conf retrieval in SQL expressions ## What changes were proposed in this pull request? The PR simplifies the retrieval of config in `size`, as we can access them from tasks too thanks to S

spark git commit: [SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas

2018-07-11 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 86457a16d -> 32429256f [SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas A self-join on a dataset which contains a `FlatMapGroupsInPandas` fails because of duplicate attributes. This happens because we are not de

spark git commit: [SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas

2018-07-11 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 592cc8458 -> ebf4bfb96 [SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas ## What changes were proposed in this pull request? A self-join on a dataset which contains a `FlatMapGroupsInPandas` fails because of duplica

spark git commit: [SPARK-24759][SQL] No reordering keys for broadcast hash join

2018-07-09 Thread lixiao

Repository: spark Updated Branches: refs/heads/master aec966b05 -> eb6e98803 [SPARK-24759][SQL] No reordering keys for broadcast hash join ## What changes were proposed in this pull request? As the implementation of the broadcast hash join is independent of the input hash partitioning, reord

spark git commit: Revert "[SPARK-24268][SQL] Use datatype.simpleString in error messages"

2018-07-09 Thread lixiao

Repository: spark Updated Branches: refs/heads/master 1bd3d61f4 -> aec966b05 Revert "[SPARK-24268][SQL] Use datatype.simpleString in error messages" This reverts commit 1bd3d61f4191767a94b71b42f4d00706b703e84f. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip

spark git commit: [SPARK-24675][SQL] Rename table: validate existence of new location

2018-07-05 Thread lixiao

Repository: spark Updated Branches: refs/heads/master ac78bcce0 -> 33952cfa8 [SPARK-24675][SQL] Rename table: validate existence of new location ## What changes were proposed in this pull request? If a table is renamed to an existing new location, data won't show up. ``` scala> Seq("hello").toDF

spark git commit: [SPARK-22384][SQL][FOLLOWUP] Refine partition pruning when attribute is wrapped in Cast

2018-07-04 Thread lixiao

Repository: spark Updated Branches: refs/heads/master ca8243f30 -> bf764a33b [SPARK-22384][SQL][FOLLOWUP] Refine partition pruning when attribute is wrapped in Cast ## What changes were proposed in this pull request? As mentioned in https://github.com/apache/spark/pull/21586 , `Cast.mayTrunc

spark git commit: [SPARK-24696][SQL] ColumnPruning rule fails to remove extra Project

2018-06-29 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 8ff4b9727 -> 3c0af793f [SPARK-24696][SQL] ColumnPruning rule fails to remove extra Project The ColumnPruning rule tries adding an extra Project if an input node produces more fields than needed, but as a post-processing step, it needs

spark git commit: simplify rand in dsl/package.scala

2018-06-29 Thread lixiao

Repository: spark Updated Branches: refs/heads/branch-2.3 0f534d3da -> 8ff4b9727 simplify rand in dsl/package.scala (cherry picked from commit d54d8b86301581142293341af25fd78b3278a2e8) Signed-off-by: Xiao Li Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-u

