[spark] branch master updated (ddf4a50 -> abe370f)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ddf4a50  [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
     add abe370f  [SPARK-27322][SQL] DataSourceV2 table relation

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |  4 +-
 .../spark/sql/catalog/v2/utils/CatalogV2Util.scala | 13 -
 .../spark/sql/util/CaseInsensitiveStringMap.java   | 18 +++
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 48 +++--
 .../sql/catalyst/analysis/CheckAnalysis.scala      |  2 +-
 .../spark/sql/catalyst/analysis/ResolveHints.scala | 12 +++--
 .../spark/sql/catalyst/analysis/unresolved.scala   | 13 +++--
 .../apache/spark/sql/catalyst/dsl/package.scala    |  5 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  4 +-
 .../datasources/v2/DataSourceV2Implicits.scala     |  0
 .../datasources/v2/DataSourceV2Relation.scala      |  2 +
 .../sql/catalyst/parser/PlanParserSuite.scala      | 31 ---
 .../scala/org/apache/spark/sql/SparkSession.scala  |  6 ++-
 .../apache/spark/sql/execution/command/views.scala |  9 ++--
 .../spark/sql/execution/datasources/rules.scala    |  8 +--
 .../sql/internal/BaseSessionStateBuilder.scala     |  3 ++
 .../org/apache/spark/sql/DataFrameSuite.scala      |  2 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  2 +-
 .../sql/sources/v2/DataSourceV2SQLSuite.scala      | 62 ++
 .../spark/sql/util/DataFrameCallbackSuite.scala    |  2 +-
 .../spark/sql/hive/HiveSessionStateBuilder.scala   |  3 ++
 .../org/apache/spark/sql/hive/HiveStrategies.scala |  5 +-
 .../org/apache/spark/sql/hive/InsertSuite.scala    | 12 +
 .../org/apache/spark/sql/hive/test/TestHive.scala  |  2 +-
 24 files changed, 213 insertions(+), 55 deletions(-)
 rename sql/{core => catalyst}/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala (100%)
 rename sql/{core => catalyst}/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala (97%)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 29a39e8  [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
29a39e8 is described below

commit 29a39e8e58d99762594f3cf6854810cfb529a251
Author: Liang-Chi Hsieh
AuthorDate: Thu Jun 13 11:04:41 2019 +0900

    [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column

    ## What changes were proposed in this pull request?

    The doctest on the `over` function of `Column` is commented out, and its window spec does not match the window function used there. We should either remove the doctest or improve it. Because the other functions of `Column` generally have doctests, this PR improves it.

    ## How was this patch tested?

    Added doctest.

    Closes #24854 from viirya/column-test-minor.

    Authored-by: Liang-Chi Hsieh
    Signed-off-by: HyukjinKwon
    (cherry picked from commit ddf4a5031287c0c26ea462dd89ea99d769473213)
    Signed-off-by: HyukjinKwon
---
 python/pyspark/sql/column.py | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index e7dec11..7f12d23 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -668,9 +668,17 @@ class Column(object):
         :return: a Column

         >>> from pyspark.sql import Window
-        >>> window = Window.partitionBy("name").orderBy("age").rowsBetween(-1, 1)
+        >>> window = Window.partitionBy("name").orderBy("age") \
+                .rowsBetween(Window.unboundedPreceding, Window.currentRow)
         >>> from pyspark.sql.functions import rank, min
-        >>> # df.select(rank().over(window), min('age').over(window))
+        >>> df.withColumn("rank", rank().over(window)) \
+                .withColumn("min", min('age').over(window)).show()
+        +---+-----+----+---+
+        |age| name|rank|min|
+        +---+-----+----+---+
+        |  5|  Bob|   1|  5|
+        |  2|Alice|   1|  2|
+        +---+-----+----+---+
         """
         from pyspark.sql.window import WindowSpec
         if not isinstance(window, WindowSpec):
[spark] branch master updated: [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ddf4a50  [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column
ddf4a50 is described below

commit ddf4a5031287c0c26ea462dd89ea99d769473213
Author: Liang-Chi Hsieh
AuthorDate: Thu Jun 13 11:04:41 2019 +0900

    [SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column

    ## What changes were proposed in this pull request?

    The doctest on the `over` function of `Column` is commented out, and its window spec does not match the window function used there. We should either remove the doctest or improve it. Because the other functions of `Column` generally have doctests, this PR improves it.

    ## How was this patch tested?

    Added doctest.

    Closes #24854 from viirya/column-test-minor.

    Authored-by: Liang-Chi Hsieh
    Signed-off-by: HyukjinKwon
---
 python/pyspark/sql/column.py | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index e7dec11..7f12d23 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -668,9 +668,17 @@ class Column(object):
         :return: a Column

         >>> from pyspark.sql import Window
-        >>> window = Window.partitionBy("name").orderBy("age").rowsBetween(-1, 1)
+        >>> window = Window.partitionBy("name").orderBy("age") \
+                .rowsBetween(Window.unboundedPreceding, Window.currentRow)
         >>> from pyspark.sql.functions import rank, min
-        >>> # df.select(rank().over(window), min('age').over(window))
+        >>> df.withColumn("rank", rank().over(window)) \
+                .withColumn("min", min('age').over(window)).show()
+        +---+-----+----+---+
+        |age| name|rank|min|
+        +---+-----+----+---+
+        |  5|  Bob|   1|  5|
+        |  2|Alice|   1|  2|
+        +---+-----+----+---+
         """
         from pyspark.sql.window import WindowSpec
         if not isinstance(window, WindowSpec):
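The window spec in the new doctest partitions by `name`, orders by `age`, and frames each row from the partition start to the current row, then applies `rank()` and `min('age')`. As a rough plain-Python sketch of what that windowed computation does (no Spark required; the data mirrors the doctest's two-row `df`, and the function name is ours, not Spark's):

```python
from collections import defaultdict

def window_rank_min(rows):
    """Mimic rank().over(w) and min('age').over(w) for a window
    partitioned by name, ordered by age, framed from the partition
    start to the current row (unboundedPreceding..currentRow)."""
    by_name = defaultdict(list)
    for name, age in rows:
        by_name[name].append(age)
    out = []
    for name, ages in by_name.items():
        ages.sort()
        for i, age in enumerate(ages):
            # rank: 1 + number of strictly smaller ages in the partition
            rank = 1 + sum(1 for a in ages if a < age)
            # running min over the frame start..current row
            running_min = min(ages[: i + 1])
            out.append((age, name, rank, running_min))
    return out

rows = [("Bob", 5), ("Alice", 2)]
print(window_rank_min(rows))
# each partition holds a single row, so rank and min match the doctest:
# [(5, 'Bob', 1, 5), (2, 'Alice', 1, 2)]
```

With only one row per `name` partition, both rank and min trivially equal the row itself, which is exactly why the doctest's expected table shows rank 1 for both Bob and Alice.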
[spark] branch master updated: [SPARK-28030][SQL] convert filePath to URI in binary file data source
This is an automated email from the ASF dual-hosted git repository.

meng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4f4829b  [SPARK-28030][SQL] convert filePath to URI in binary file data source
4f4829b is described below

commit 4f4829b4ae261a9fd656fbf1928e6440d31f8d8c
Author: Xiangrui Meng
AuthorDate: Wed Jun 12 13:24:02 2019 -0700

    [SPARK-28030][SQL] convert filePath to URI in binary file data source

    ## What changes were proposed in this pull request?

    Convert `PartitionedFile.filePath` to a URI first in the binary file data source. Otherwise Spark throws a `FileNotFoundException`, because we create `Path` from the URL-encoded string instead of wrapping it in a URI.

    ## How was this patch tested?

    Unit test.

    Closes #24855 from mengxr/SPARK-28030.

    Authored-by: Xiangrui Meng
    Signed-off-by: Xiangrui Meng
---
 .../spark/sql/execution/datasources/FileScanRDD.scala          |  2 +-
 .../datasources/binaryfile/BinaryFileFormat.scala              |  3 ++-
 .../datasources/binaryfile/BinaryFileFormatSuite.scala         | 14 ++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
index d92ea2e..9e98b0b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
@@ -38,7 +38,7 @@ import org.apache.spark.util.NextIterator
  * that need to be prepended to each row.
  *
  * @param partitionValues value of partition columns to be prepended to each row.
- * @param filePath path of the file to read
+ * @param filePath URI of the file to read
  * @param start the beginning offset (in bytes) of the block.
  * @param length number of bytes to read.
  * @param locations locality information (list of nodes that have the data).
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala
index cdc7cd5..fda4e14 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormat.scala
@@ -17,6 +17,7 @@
 package org.apache.spark.sql.execution.datasources.binaryfile

+import java.net.URI
 import java.sql.Timestamp

 import com.google.common.io.{ByteStreams, Closeables}
@@ -100,7 +101,7 @@ class BinaryFileFormat extends FileFormat with DataSourceRegister {
     val maxLength = sparkSession.conf.get(SOURCES_BINARY_FILE_MAX_LENGTH)

     file: PartitionedFile => {
-      val path = new Path(file.filePath)
+      val path = new Path(new URI(file.filePath))
       val fs = path.getFileSystem(broadcastedHadoopConf.value.value)
       val status = fs.getFileStatus(path)
       if (filterFuncs.forall(_.apply(status))) {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormatSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormatSuite.scala
index 01dc96c..9e2969b 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormatSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/binaryfile/BinaryFileFormatSuite.scala
@@ -368,4 +368,18 @@ class BinaryFileFormatSuite extends QueryTest with SharedSQLContext with SQLTest
       assert(caught.getMessage.contains("exceeds the max length allowed"))
     }
   }
+
+  test("SPARK-28030: support chars in file names that require URL encoding") {
+    withTempDir { dir =>
+      val file = new File(dir, "test space.txt")
+      val content = "123".getBytes
+      Files.write(file.toPath, content, StandardOpenOption.CREATE, StandardOpenOption.WRITE)
+      val df = spark.read.format(BINARY_FILE).load(dir.getPath)
+      df.select(col(PATH), col(CONTENT)).first() match {
+        case Row(p: String, c: Array[Byte]) =>
+          assert(p.endsWith(file.getAbsolutePath), "should support space in file name")
+          assert(c === content, "should read file with space in file name")
+      }
+    }
+  }
 }
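The bug here is an encoding round-trip: `PartitionedFile.filePath` carries a URL-encoded URI string, so building a Hadoop `Path` from it verbatim makes Spark look for a file literally named `test%20space.txt`. Parsing the string as a URI first decodes the percent-escapes. A minimal Python illustration of the same pitfall (using `urllib.parse`; the Scala fix does the equivalent with `java.net.URI`):

```python
from urllib.parse import quote, unquote, urlparse

# A file name containing a space, as the data source sees it after
# the path has been URL-encoded into a URI string.
real_name = "test space.txt"
uri_string = "file:///tmp/" + quote(real_name)  # 'file:///tmp/test%20space.txt'

# Buggy behavior: treating the URI string as a literal path keeps the
# percent-escape, so the lookup targets a nonexistent file name.
literal_path = uri_string[len("file://"):]
assert literal_path == "/tmp/test%20space.txt"

# Fixed behavior: parse as a URI, which decodes %20 back to a space.
decoded_path = unquote(urlparse(uri_string).path)
assert decoded_path == "/tmp/test space.txt"
```

This is why the new unit test writes a file named `test space.txt`: only a name that requires URL encoding exposes the mismatch between the encoded and decoded forms.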
[spark] branch master updated: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 37ab433  [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
37ab433 is described below

commit 37ab43339d680d6ec6973938737b8a8cd13e6cb1
Author: Dongjoon Hyun
AuthorDate: Wed Jun 12 07:34:42 2019 -0700

    [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1

    ## What changes were proposed in this pull request?

    For the Apache Spark 3.0.0 release, this PR updates the Kafka dependency to 2.2.1 to bring in improvements and bug fixes such as [KAFKA-8134](https://issues.apache.org/jira/browse/KAFKA-8134) (`'linger.ms' must be a long`).

    https://issues.apache.org/jira/projects/KAFKA/versions/12345010

    ## How was this patch tested?

    Pass the Jenkins.

    Closes #24847 from dongjoon-hyun/SPARK-28013.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 929961a..62394b2 100644
--- a/pom.xml
+++ b/pom.xml
@@ -136,7 +136,7 @@
     1.2.1
-    2.2.0
+    2.2.1
     10.12.1.1
     1.10.1
     1.5.5