[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19823 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r154187840 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2392,5 +2392,14 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } } + +test("load command for non local invalid path validation") { --- End diff -- Right gatorsmile, this is why test case was failing, one more change i have done,Since i am dealing with hive table i moved this test case to HiveDDLSuite.scala, hope its fine. Thanks for the review --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153922124 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2392,5 +2392,14 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } } + +test("load command for non local invalid path validation") { --- End diff -- Please move this test case out of ``` Seq(true, false).foreach { caseSensitive => ... } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153693386 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2392,5 +2392,13 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } } +test("load command invalid path validation ") { --- End diff -- fixed. thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153693359 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2392,5 +2392,13 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } } +test("load command invalid path validation ") { --- End diff -- updated it. In my local build it was working fine. Thanks for the feedback --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153678875 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2392,5 +2392,13 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } } +test("load command invalid path validation ") { --- End diff -- nit: insert an empty line before the test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153678491 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2392,5 +2392,13 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } } +test("load command invalid path validation ") { --- End diff -- @sujith71955 according to Jenkins, there's a whitespace at end of line --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153135417 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -341,6 +341,12 @@ case class LoadDataCommand( } else { val uri = new URI(path) if (uri.getScheme() != null && uri.getAuthority() != null) { + val hadoopConf = sparkSession.sessionState.newHadoopConf() + val srcPath = new Path(path) --- End diff -- I think here `path` is an URI here otherwise `val uri = new URI(path)` should fail, and we return `uri`. So, I think we should check if the `uri` is valid. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153131202 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -341,6 +341,12 @@ case class LoadDataCommand( } else { val uri = new URI(path) if (uri.getScheme() != null && uri.getAuthority() != null) { + val hadoopConf = sparkSession.sessionState.newHadoopConf() + val srcPath = new Path(path) --- End diff -- But will this lose the normalization in `new Path(path)` ? Or normalization in `URI` covers it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153127637 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2624,7 +2624,13 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)")) assert(e.message.contains("Invalid number of arguments")) } - + test("load command invalid path validation ") { --- End diff -- nit: also blank lines --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153091293 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -341,6 +341,12 @@ case class LoadDataCommand( } else { val uri = new URI(path) if (uri.getScheme() != null && uri.getAuthority() != null) { + val hadoopConf = sparkSession.sessionState.newHadoopConf() + val srcPath = new Path(path) --- End diff -- nit: Let's use `new Path(uri)`. I think we better validate `uri` in this scope. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153081211 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -341,6 +341,12 @@ case class LoadDataCommand( } else { val uri = new URI(path) if (uri.getScheme() != null && uri.getAuthority() != null) { + val hadoopConf = sparkSession.sessionState.newHadoopConf() --- End diff -- +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153081151 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -341,6 +341,12 @@ case class LoadDataCommand( } else { val uri = new URI(path) if (uri.getScheme() != null && uri.getAuthority() != null) { + val hadoopConf = sparkSession.sessionState.newHadoopConf() + val srcPath = new Path(path) + val fs = srcPath.getFileSystem(hadoopConf) + if(!fs.exists(srcPath)) { + throw new AnalysisException(s"LOAD DATA input path does not exist: $path") --- End diff -- two space indents. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153080935 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2624,7 +2624,13 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)")) assert(e.message.contains("Invalid number of arguments")) } - + test("load command invalid path validation ") { +withTable("tbl") { +sql("CREATE TABLE tbl(i INT, j STRING) USING parquet") +val e = intercept[AnalysisException](sql("load data inpath 'hdfs://localhost/doesnotexist.csv' into table tbl")) +assert(e.message.contains("LOAD DATA input path does not exist")) --- End diff -- indents between 2629 and 2631 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153080918 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2624,7 +2624,13 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)")) assert(e.message.contains("Invalid number of arguments")) } - + test("load command invalid path validation ") { --- End diff -- Move this to `DDLSuite`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153080506 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -341,6 +341,12 @@ case class LoadDataCommand( } else { val uri = new URI(path) if (uri.getScheme() != null && uri.getAuthority() != null) { + val hadoopConf = sparkSession.sessionState.newHadoopConf() --- End diff -- You can probably lift this out of the if-else and use it in the other branch too --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19823#discussion_r153080500 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -341,6 +341,12 @@ case class LoadDataCommand( } else { val uri = new URI(path) if (uri.getScheme() != null && uri.getAuthority() != null) { + val hadoopConf = sparkSession.sessionState.newHadoopConf() + val srcPath = new Path(path) + val fs = srcPath.getFileSystem(hadoopConf) + if(!fs.exists(srcPath)) { --- End diff -- Nit: space after if --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19823: [SPARK-22601][SQL] Data load is getting displayed...
GitHub user sujith71955 opened a pull request: https://github.com/apache/spark/pull/19823 [SPARK-22601][SQL] Data load is getting displayed successful on providing non existing hdfs file path ## What changes were proposed in this pull request? When user tries to load data with a non existing hdfs file path system is not validating it and the load command operation is getting successful. This is misleading to the user. already there is a validation in the scenario of none existing local file path. This PR has added validation in the scenario of nonexisting hdfs file path ## How was this patch tested? UT has been added for verifying the issue, also snapshots has been added after the verification in a spark yarn cluster You can merge this pull request into a Git repository by running: $ git pull https://github.com/sujith71955/spark master_LoadComand_Issue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19823.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19823 commit 5b247a8b3065a6f7f3a78380af898d9c98eeec8b Author: sujith71955Date: 2017-11-24T13:31:22Z [SPARK-22601][SQL] Data load is getting displayed successful on providing non existing hdfs file path ## What changes were proposed in this pull request? When user tries to load data with a non existing hdfs file path system is not validating it and the load command operation is getting successful. This is misleading to the user. already there is a validation in the scenario of local file path. This PR has added validation in the scenario of hdfs file path ## How was this patch tested? existing tests are present to verify the impact and manually the scenario is been verified in hdfs cluster --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org