[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19823


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-30 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r154187840
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -2392,5 +2392,14 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 }
+
+test("load command for non local invalid path validation") {
--- End diff --

Right gatorsmile, this is why test case was failing, one more change i have 
done,Since i am dealing with hive table i moved this test case to 
HiveDDLSuite.scala, hope its fine. Thanks for the review


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153922124
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -2392,5 +2392,14 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 }
+
+test("load command for non local invalid path validation") {
--- End diff --

Please move this test case out of 
```
Seq(true, false).foreach { caseSensitive =>
  ...
}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-28 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153693386
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -2392,5 +2392,13 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 }
+test("load command invalid path validation ") {
--- End diff --

fixed. thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-28 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153693359
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -2392,5 +2392,13 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 }
+test("load command invalid path validation ") {
--- End diff --

updated it. In my local build it was working fine. Thanks for the feedback


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153678875
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -2392,5 +2392,13 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 }
+test("load command invalid path validation ") {
--- End diff --

nit: insert an empty line before the test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153678491
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -2392,5 +2392,13 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 }
+test("load command invalid path validation ") {
--- End diff --

@sujith71955 according to Jenkins, there's a whitespace at end of line


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153135417
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -341,6 +341,12 @@ case class LoadDataCommand(
   } else {
 val uri = new URI(path)
 if (uri.getScheme() != null && uri.getAuthority() != null) {
+  val hadoopConf = sparkSession.sessionState.newHadoopConf()
+  val srcPath = new Path(path)
--- End diff --

I think here `path` is an URI here otherwise `val uri = new URI(path)` 
should fail, and we return `uri`. So, I think we should check if the `uri` is 
valid.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-27 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153131202
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -341,6 +341,12 @@ case class LoadDataCommand(
   } else {
 val uri = new URI(path)
 if (uri.getScheme() != null && uri.getAuthority() != null) {
+  val hadoopConf = sparkSession.sessionState.newHadoopConf()
+  val srcPath = new Path(path)
--- End diff --

But will this lose the normalization in `new Path(path)` ? Or normalization 
in `URI` covers it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-27 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153127637
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2624,7 +2624,13 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
 assert(e.message.contains("Invalid number of arguments"))
   }
-
+  test("load command invalid path validation ") {
--- End diff --

nit: also blank lines


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153091293
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -341,6 +341,12 @@ case class LoadDataCommand(
   } else {
 val uri = new URI(path)
 if (uri.getScheme() != null && uri.getAuthority() != null) {
+  val hadoopConf = sparkSession.sessionState.newHadoopConf()
+  val srcPath = new Path(path)
--- End diff --

nit: Let's use `new Path(uri)`. I think we better validate `uri` in this 
scope.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153081211
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -341,6 +341,12 @@ case class LoadDataCommand(
   } else {
 val uri = new URI(path)
 if (uri.getScheme() != null && uri.getAuthority() != null) {
+  val hadoopConf = sparkSession.sessionState.newHadoopConf()
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153081151
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -341,6 +341,12 @@ case class LoadDataCommand(
   } else {
 val uri = new URI(path)
 if (uri.getScheme() != null && uri.getAuthority() != null) {
+  val hadoopConf = sparkSession.sessionState.newHadoopConf()
+  val srcPath = new Path(path)
+  val fs = srcPath.getFileSystem(hadoopConf)
+  if(!fs.exists(srcPath)) {
+ throw new AnalysisException(s"LOAD DATA input path does not 
exist: $path")
--- End diff --

two space indents. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153080935
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2624,7 +2624,13 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
 assert(e.message.contains("Invalid number of arguments"))
   }
-
+  test("load command invalid path validation ") {
+withTable("tbl") {
+sql("CREATE TABLE tbl(i INT, j STRING) USING parquet")
+val e = intercept[AnalysisException](sql("load data inpath 
'hdfs://localhost/doesnotexist.csv' into table tbl"))
+assert(e.message.contains("LOAD DATA input path does not exist"))
--- End diff --

indents between 2629 and 2631


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153080918
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2624,7 +2624,13 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
 assert(e.message.contains("Invalid number of arguments"))
   }
-
+  test("load command invalid path validation ") {
--- End diff --

Move this to `DDLSuite`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153080506
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -341,6 +341,12 @@ case class LoadDataCommand(
   } else {
 val uri = new URI(path)
 if (uri.getScheme() != null && uri.getAuthority() != null) {
+  val hadoopConf = sparkSession.sessionState.newHadoopConf()
--- End diff --

You can probably lift this out of the if-else and use it in the other 
branch too


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19823#discussion_r153080500
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -341,6 +341,12 @@ case class LoadDataCommand(
   } else {
 val uri = new URI(path)
 if (uri.getScheme() != null && uri.getAuthority() != null) {
+  val hadoopConf = sparkSession.sessionState.newHadoopConf()
+  val srcPath = new Path(path)
+  val fs = srcPath.getFileSystem(hadoopConf)
+  if(!fs.exists(srcPath)) {
--- End diff --

Nit: space after if


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...

2017-11-26 Thread sujith71955
GitHub user sujith71955 opened a pull request:

https://github.com/apache/spark/pull/19823

[SPARK-22601][SQL] Data load is getting displayed successful on providing 
non existing hdfs file path

## What changes were proposed in this pull request?
When user tries to load data with a non existing hdfs file path system is 
not validating it and the load command operation is getting successful.
This is misleading to the user. already there is a validation in the 
scenario of none existing local file path. This PR has added validation in the 
scenario of nonexisting hdfs file path
## How was this patch tested?
UT has been added for verifying the issue, also snapshots has been added 
after the verification in a spark yarn cluster

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujith71955/spark master_LoadComand_Issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19823.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19823


commit 5b247a8b3065a6f7f3a78380af898d9c98eeec8b
Author: sujith71955 
Date:   2017-11-24T13:31:22Z

[SPARK-22601][SQL] Data load is getting displayed successful on providing 
non existing hdfs file path
## What changes were proposed in this pull request?
When user tries to load data with a non existing hdfs file path system is 
not validating it and the load command operation is getting successful.
This is misleading to the user. already there is a validation in the 
scenario of local file path. This PR has added validation in the scenario of 
hdfs file path
## How was this patch tested?
existing tests are present to verify the impact and manually the scenario 
is been verified in hdfs cluster




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org