spark git commit: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS path for load table command.

2018-09-17 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.4 e368efcf5 -> 43c9b1085


[SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS path for load table command.

What changes were proposed in this pull request?
Updated the migration guide for the behavior changes made in JIRA issue SPARK-23425.

How was this patch tested?
Manually verified.

Closes #22396 from sujith71955/master_newtest.

Authored-by: s71955 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 619c949019feccd3fc2c9e58a841c655d05216f3)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43c9b108
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43c9b108
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/43c9b108

Branch: refs/heads/branch-2.4
Commit: 43c9b108545adcd0f99bd4408759fbee440c560f
Parents: e368efc
Author: s71955 
Authored: Mon Sep 17 19:22:27 2018 +0800
Committer: Wenchen Fan 
Committed: Mon Sep 17 19:23:08 2018 +0800

--
 docs/sql-programming-guide.md                     |  1 +
 .../spark/sql/hive/execution/SQLQuerySuite.scala  | 15 +++
 2 files changed, 16 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/43c9b108/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 9da7d64..e262987 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - Since Spark 2.4, file listing for statistics computation is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `false`.
   - Since Spark 2.4, metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during statistics computation.
   - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, an empty string was treated as equal to a `null` value and was not reflected as any characters in saved CSV files. For example, the row `"a", null, "", 1` was written as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to the empty (not quoted) string.
+  - Since Spark 2.4, the `LOAD DATA` command supports the wildcards `?` and `*`, which match any single character and zero or more characters, respectively. Example: `LOAD DATA INPATH '/tmp/folder*/'` or `LOAD DATA INPATH '/tmp/part-?'`. Special characters such as spaces now also work in paths. Example: `LOAD DATA INPATH '/tmp/folder name/'`.
 
 ## Upgrading From Spark SQL 2.3.0 to 2.3.1 and above
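The `?` and `*` semantics described in the migration note above can be sketched with JDK glob matching. This is only a rough analogy (Spark itself resolves these patterns through Hadoop's glob support, not `java.nio`), and the `GlobDemo` object and `globMatches` helper are illustrative names, not Spark APIs:

```scala
import java.nio.file.{FileSystems, Paths}

object GlobDemo {
  // Returns true when `name` matches the glob `pattern`:
  // `?` consumes exactly one character, `*` consumes zero or more.
  def globMatches(pattern: String, name: String): Boolean =
    FileSystems.getDefault.getPathMatcher(s"glob:$pattern").matches(Paths.get(name))
}
```

For instance, `globMatches("part-?", "part-1")` is true while `globMatches("part-?", "part-10")` is false, since `?` consumes exactly one character; `globMatches("folder*", "folder_2018")` is true.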
 

http://git-wip-us.apache.org/repos/asf/spark/blob/43c9b108/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
--
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index 20c4c36..e49aea2 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -1916,6 +1916,21 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
 
+  test("SPARK-23425 Test LOAD DATA LOCAL INPATH with space in file name") {
+    withTempDir { dir =>
+      val path = dir.toURI.toString.stripSuffix("/")
+      val dirPath = dir.getAbsoluteFile
+      for (i <- 1 to 3) {
+        Files.write(s"$i", new File(dirPath, s"part-r- $i"), StandardCharsets.UTF_8)
+      }
+      withTable("load_t") {
+        sql("CREATE TABLE load_t (a STRING)")
+        sql(s"LOAD DATA LOCAL INPATH '$path/part-r- 1' INTO TABLE load_t")
+        checkAnswer(sql("SELECT * FROM load_t"), Seq(Row("1")))
+      }
+    }
+  }
+
   test("Support wildcard character in folderlevel for LOAD DATA LOCAL INPATH") {
     withTempDir { dir =>
       val path = dir.toURI.toString.stripSuffix("/")


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS path for load table command.

2018-09-17 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master b66e14dc9 -> 619c94901


[SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS path for load table command.

What changes were proposed in this pull request?
Updated the migration guide for the behavior changes made in JIRA issue SPARK-23425.

How was this patch tested?
Manually verified.

Closes #22396 from sujith71955/master_newtest.

Authored-by: s71955 
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/619c9490
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/619c9490
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/619c9490

Branch: refs/heads/master
Commit: 619c949019feccd3fc2c9e58a841c655d05216f3
Parents: b66e14d
Author: s71955 
Authored: Mon Sep 17 19:22:27 2018 +0800
Committer: Wenchen Fan 
Committed: Mon Sep 17 19:22:27 2018 +0800

--
 docs/sql-programming-guide.md                     |  1 +
 .../spark/sql/hive/execution/SQLQuerySuite.scala  | 15 +++
 2 files changed, 16 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/619c9490/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 9da7d64..e262987 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - Since Spark 2.4, file listing for statistics computation is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `false`.
   - Since Spark 2.4, metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during statistics computation.
   - Since Spark 2.4, empty strings are saved as quoted empty strings `""`. In version 2.3 and earlier, an empty string was treated as equal to a `null` value and was not reflected as any characters in saved CSV files. For example, the row `"a", null, "", 1` was written as `a,,,1`. Since Spark 2.4, the same row is saved as `a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to the empty (not quoted) string.
+  - Since Spark 2.4, the `LOAD DATA` command supports the wildcards `?` and `*`, which match any single character and zero or more characters, respectively. Example: `LOAD DATA INPATH '/tmp/folder*/'` or `LOAD DATA INPATH '/tmp/part-?'`. Special characters such as spaces now also work in paths. Example: `LOAD DATA INPATH '/tmp/folder name/'`.
 
 ## Upgrading From Spark SQL 2.3.0 to 2.3.1 and above
 

http://git-wip-us.apache.org/repos/asf/spark/blob/619c9490/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
--
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index 20c4c36..e49aea2 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -1916,6 +1916,21 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
 
+  test("SPARK-23425 Test LOAD DATA LOCAL INPATH with space in file name") {
+    withTempDir { dir =>
+      val path = dir.toURI.toString.stripSuffix("/")
+      val dirPath = dir.getAbsoluteFile
+      for (i <- 1 to 3) {
+        Files.write(s"$i", new File(dirPath, s"part-r- $i"), StandardCharsets.UTF_8)
+      }
+      withTable("load_t") {
+        sql("CREATE TABLE load_t (a STRING)")
+        sql(s"LOAD DATA LOCAL INPATH '$path/part-r- 1' INTO TABLE load_t")
+        checkAnswer(sql("SELECT * FROM load_t"), Seq(Row("1")))
+      }
+    }
+  }
+
   test("Support wildcard character in folderlevel for LOAD DATA LOCAL INPATH") {
     withTempDir { dir =>
       val path = dir.toURI.toString.stripSuffix("/")
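The space-in-file-name behavior the test above exercises can be sketched standalone with just the JDK; the `SpaceInPathDemo` object and `writeAndReadBack` helper are hypothetical names for illustration, not Spark code:

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

object SpaceInPathDemo {
  // Writes `payload` to a file whose name may contain spaces (like the
  // "part-r- 1" names in the test above), then reads it back.
  def writeAndReadBack(dir: File, name: String, payload: String): String = {
    val f = new File(dir, name)
    Files.write(f.toPath, payload.getBytes(StandardCharsets.UTF_8))
    new String(Files.readAllBytes(f.toPath), StandardCharsets.UTF_8)
  }
}
```

For example, `writeAndReadBack(tempDir, "part-r- 1", "1")` returns `"1"`, mirroring the files the test creates before loading them into `load_t`.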

