[spark] branch master updated (ddf4a50 -> abe370f)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ddf4a50  [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
     add abe370f  [SPARK-27322][SQL] DataSourceV2 table relation

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |  4 +-
 .../spark/sql/catalog/v2/utils/CatalogV2Util.scala | 13 -
 .../spark/sql/util/CaseInsensitiveStringMap.java   | 18 +++
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 48 +++--
 .../sql/catalyst/analysis/CheckAnalysis.scala      |  2 +-
 .../spark/sql/catalyst/analysis/ResolveHints.scala | 12 +++--
 .../spark/sql/catalyst/analysis/unresolved.scala   | 13 +++--
 .../apache/spark/sql/catalyst/dsl/package.scala    |  5 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  4 +-
 .../datasources/v2/DataSourceV2Implicits.scala     |  0
 .../datasources/v2/DataSourceV2Relation.scala      |  2 +
 .../sql/catalyst/parser/PlanParserSuite.scala      | 31 ---
 .../scala/org/apache/spark/sql/SparkSession.scala  |  6 ++-
 .../apache/spark/sql/execution/command/views.scala |  9 ++--
 .../spark/sql/execution/datasources/rules.scala    |  8 +--
 .../sql/internal/BaseSessionStateBuilder.scala     |  3 ++
 .../org/apache/spark/sql/DataFrameSuite.scala      |  2 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  2 +-
 .../sql/sources/v2/DataSourceV2SQLSuite.scala      | 62 ++
 .../spark/sql/util/DataFrameCallbackSuite.scala    |  2 +-
 .../spark/sql/hive/HiveSessionStateBuilder.scala   |  3 ++
 .../org/apache/spark/sql/hive/HiveStrategies.scala |  5 +-
 .../org/apache/spark/sql/hive/InsertSuite.scala    | 12 +
 .../org/apache/spark/sql/hive/test/TestHive.scala  |  2 +-
 24 files changed, 213 insertions(+), 55 deletions(-)
 rename sql/{core => catalyst}/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala (100%)
 rename sql/{core => catalyst}/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala (97%)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 29a39e8  [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
29a39e8 is described below

commit 29a39e8e58d99762594f3cf6854810cfb529a251
Author: Liang-Chi Hsieh
AuthorDate: Thu Jun 13 11:04:41 2019 +0900

    [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column

    ## What changes were proposed in this pull request?

    The doctest on the `over` function of `Column` is commented out, and its window spec does not match the window function used there. We should either remove the doctest or improve it. Because the other functions of `Column` generally have doctests, this PR improves it.

    ## How was this patch tested?

    Added doctest.

    Closes #24854 from viirya/column-test-minor.

    Authored-by: Liang-Chi Hsieh
    Signed-off-by: HyukjinKwon
    (cherry picked from commit ddf4a5031287c0c26ea462dd89ea99d769473213)
    Signed-off-by: HyukjinKwon
---
 python/pyspark/sql/column.py | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index e7dec11..7f12d23 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -668,9 +668,17 @@ class Column(object):
         :return: a Column

         >>> from pyspark.sql import Window
-        >>> window = Window.partitionBy("name").orderBy("age").rowsBetween(-1, 1)
+        >>> window = Window.partitionBy("name").orderBy("age") \
+                .rowsBetween(Window.unboundedPreceding, Window.currentRow)
         >>> from pyspark.sql.functions import rank, min
-        >>> # df.select(rank().over(window), min('age').over(window))
+        >>> df.withColumn("rank", rank().over(window)) \
+                .withColumn("min", min('age').over(window)).show()
+        +---+-----+----+---+
+        |age| name|rank|min|
+        +---+-----+----+---+
+        |  5|  Bob|   1|  5|
+        |  2|Alice|   1|  2|
+        +---+-----+----+---+
         """
         from pyspark.sql.window import WindowSpec
         if not isinstance(window, WindowSpec):
[spark] branch master updated: [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ddf4a50  [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
ddf4a50 is described below

commit ddf4a5031287c0c26ea462dd89ea99d769473213
Author: Liang-Chi Hsieh
AuthorDate: Thu Jun 13 11:04:41 2019 +0900

    [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column

    ## What changes were proposed in this pull request?

    The doctest on the `over` function of `Column` is commented out, and its window spec does not match the window function used there. We should either remove the doctest or improve it. Because the other functions of `Column` generally have doctests, this PR improves it.

    ## How was this patch tested?

    Added doctest.

    Closes #24854 from viirya/column-test-minor.

    Authored-by: Liang-Chi Hsieh
    Signed-off-by: HyukjinKwon
---
 python/pyspark/sql/column.py | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index e7dec11..7f12d23 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -668,9 +668,17 @@ class Column(object):
         :return: a Column

         >>> from pyspark.sql import Window
-        >>> window = Window.partitionBy("name").orderBy("age").rowsBetween(-1, 1)
+        >>> window = Window.partitionBy("name").orderBy("age") \
+                .rowsBetween(Window.unboundedPreceding, Window.currentRow)
         >>> from pyspark.sql.functions import rank, min
-        >>> # df.select(rank().over(window), min('age').over(window))
+        >>> df.withColumn("rank", rank().over(window)) \
+                .withColumn("min", min('age').over(window)).show()
+        +---+-----+----+---+
+        |age| name|rank|min|
+        +---+-----+----+---+
+        |  5|  Bob|   1|  5|
+        |  2|Alice|   1|  2|
+        +---+-----+----+---+
         """
         from pyspark.sql.window import WindowSpec
         if not isinstance(window, WindowSpec):
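The window spec in the new doctest partitions by `name`, orders by `age`, and frames each row from the partition start to the current row, then applies `rank()` and `min('age')`. As a rough plain-Python sketch of what that windowed computation does (no Spark required; the data mirrors the doctest's two-row `df`, and the function name is ours, not Spark's):

```python
from collections import defaultdict

def window_rank_min(rows):
    """Mimic rank().over(w) and min('age').over(w) for a window
    partitioned by name, ordered by age, framed from the partition
    start to the current row (unboundedPreceding..currentRow)."""
    by_name = defaultdict(list)
    for name, age in rows:
        by_name[name].append(age)
    out = []
    for name, ages in by_name.items():
        ages.sort()
        for i, age in enumerate(ages):
            # rank: 1 + number of strictly smaller ages in the partition
            rank = 1 + sum(1 for a in ages if a < age)
            # running min over the frame start..current row
            running_min = min(ages[: i + 1])
            out.append((age, name, rank, running_min))
    return out

rows = [("Bob", 5), ("Alice", 2)]
print(window_rank_min(rows))
# each partition holds a single row, so rank and min match the doctest:
# [(5, 'Bob', 1, 5), (2, 'Alice', 1, 2)]
```

With only one row per `name` partition, both rank and min trivially equal the row itself, which is exactly why the doctest's expected table shows rank 1 for both Bob and Alice.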
[spark] branch master updated: [SPARK-28030][SQL] convert filePath to URI in binary file data source
This is an automated email from the ASF dual-hosted git repository.

meng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4f4829b  [SPARK-28030][SQL] convert filePath to URI in binary file data source
4f4829b is described below

commit 4f4829b4ae261a9fd656fbf1928e6440d31f8d8c
Author: Xiangrui Meng
AuthorDate: Wed Jun 12 13:24:02 2019 -0700

    [SPARK-28030][SQL] convert filePath to URI in binary file data source

    ## What changes were proposed in this pull request?

    Convert `PartitionedFile.filePath` to a URI first in the binary file data source. Otherwise Spark throws a `FileNotFoundException`, because we create `Path` from the URL-encoded string instead of wrapping it in a URI.

    ## How was this patch tested?

    Unit test.

    Closes #24855 from mengxr/SPARK-28030.

    Authored-by: Xiangrui Meng
    Signed-off-by: Xiangrui Meng
---
 .../spark/sql/execution/datasources/FileScanRDD.scala          |  2 +-
 .../datasources/binaryfile/BinaryFileFormat.scala              |  3 ++-
 .../datasources/binaryfile/BinaryFileFormatSuite.scala         | 14 ++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
index d92ea2e..9e98b0b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
@@ -38,7 +38,7 @@ import org.apache.spark.util.NextIterator
  * that need to be prepended to each row.
  *
  * @param partitionValues value of partition columns to be prepended to each row.
- * @param filePath path of the file to read
+ * @param filePath URI of the file to read
  * @param start the beginning offset (in bytes) of the block.
  * @param length number of bytes to read.
  * @param locations locality information (list of nodes that have the data).
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala
index cdc7cd5..fda4e14 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala
@@ -17,6 +17,7 @@
 package org.apache.spark.sql.execution.datasources.binaryfile

+import java.net.URI
 import java.sql.Timestamp

 import com.google.common.io.{ByteStreams, Closeables}
@@ -100,7 +101,7 @@ class BinaryFileFormat extends FileFormat with DataSourceRegister {
     val maxLength = sparkSession.conf.get(SOURCES_BINARY_FILE_MAX_LENGTH)

     file: PartitionedFile => {
-      val path = new Path(file.filePath)
+      val path = new Path(new URI(file.filePath))
       val fs = path.getFileSystem(broadcastedHadoopConf.value.value)
       val status = fs.getFileStatus(path)
       if (filterFuncs.forall(_.apply(status))) {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormatSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormatSuite.scala
index 01dc96c..9e2969b 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormatSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormatSuite.scala
@@ -368,4 +368,18 @@ class BinaryFileFormatSuite extends QueryTest with SharedSQLContext with SQLTest
       assert(caught.getMessage.contains("exceeds the max length allowed"))
     }
   }
+
+  test("SPARK-28030: support chars in file names that require URL encoding") {
+    withTempDir { dir =>
+      val file = new File(dir, "test space.txt")
+      val content = "123".getBytes
+      Files.write(file.toPath, content, StandardOpenOption.CREATE, StandardOpenOption.WRITE)
+      val df = spark.read.format(BINARY_FILE).load(dir.getPath)
+      df.select(col(PATH), col(CONTENT)).first() match {
+        case Row(p: String, c: Array[Byte]) =>
+          assert(p.endsWith(file.getAbsolutePath), "should support space in file name")
+          assert(c === content, "should read file with space in file name")
+      }
+    }
+  }
 }
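The bug here is an encoding round-trip: `PartitionedFile.filePath` carries a URL-encoded URI string, so building a Hadoop `Path` from it verbatim makes Spark look for a file literally named `test%20space.txt`. Parsing the string as a URI first decodes the percent-escapes. A minimal Python illustration of the same pitfall (using `urllib.parse`; the Scala fix does the equivalent with `java.net.URI`):

```python
from urllib.parse import quote, unquote, urlparse

# A file name containing a space, as the data source sees it after
# the path has been URL-encoded into a URI string.
real_name = "test space.txt"
uri_string = "file:///tmp/" + quote(real_name)  # 'file:///tmp/test%20space.txt'

# Buggy behavior: treating the URI string as a literal path keeps the
# percent-escape, so the lookup targets a nonexistent file name.
literal_path = uri_string[len("file://"):]
assert literal_path == "/tmp/test%20space.txt"

# Fixed behavior: parse as a URI, which decodes %20 back to a space.
decoded_path = unquote(urlparse(uri_string).path)
assert decoded_path == "/tmp/test space.txt"
```

This is why the new unit test writes a file named `test space.txt`: only a name that requires URL encoding exposes the mismatch between the encoded and decoded forms.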
[spark] branch master updated: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 37ab433  [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
37ab433 is described below

commit 37ab43339d680d6ec6973938737b8a8cd13e6cb1
Author: Dongjoon Hyun
AuthorDate: Wed Jun 12 07:34:42 2019 -0700

    [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1

    ## What changes were proposed in this pull request?

    For the Apache Spark 3.0.0 release, this PR updates the Kafka dependency to 2.2.1 to bring in improvements and bug fixes such as [KAFKA-8134](https://issues.apache.org/jira/browse/KAFKA-8134) (`'linger.ms' must be a long`).

    https://issues.apache.org/jira/projects/KAFKA/versions/12345010

    ## How was this patch tested?

    Pass the Jenkins.

    Closes #24847 from dongjoon-hyun/SPARK-28013.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 929961a..62394b2 100644
--- a/pom.xml
+++ b/pom.xml
@@ -136,7 +136,7 @@
     1.2.1
-    2.2.0
+    2.2.1
     10.12.1.1
     1.10.1
     1.5.5