beliefer opened a new pull request #23950: [MINOR][SQL] Add a UT to test insert overwrite into a nonexistent local path. URL: https://github.com/apache/spark/pull/23950

## What changes were proposed in this pull request?

In PR 23841, maropu and I discussed the behavior of insert overwrite into a nonexistent local path. In local[*] mode, maropu gave the following test case:

```
$ ls /tmp/noexistdir
ls: /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")

scala> spark.table("t").explain
== Physical Plan ==
Scan hive default.t [c0#5, c1#6], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")

$ ls /tmp/noexistdir/t/
_SUCCESS  part-00000-bbea4213-071a-49b4-aac8-8510e7263d45-c000
```

This test case shows that Spark creates the nonexistent path and moves the intermediate result from the local temporary path to the newly created one. This test is based on the newest master.

I followed the test case provided by maropu, but found a different behavior. I ran the SQL statements maropu provided in local[*] deploy mode, based on Spark 2.3.0.
The inconsistent behavior appears as follows:

```
$ ls /tmp/noexistdir
ls: cannot access /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.table("t").explain
== Physical Plan ==
HiveTableScan [c0#5, c1#6], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
res1: org.apache.spark.sql.DataFrame = []

$ ls /tmp/noexistdir/t/
/tmp/noexistdir/t

$ vi /tmp/noexistdir/t
1
```

Then I pulled the master branch, compiled it, and deployed it on my Hadoop cluster. I got the inconsistent behavior again. The Spark version under test is 3.0.0-SNAPSHOT.

```
$ ls /tmp/noexistdir
ls: cannot access /tmp/noexistdir: No such file or directory

Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
Spark context Web UI available at http://10.198.66.204:55326
Spark context available as 'sc' (master = local[*], app id = local-1551259036573).
Spark session available as 'spark'.
Welcome to spark version 3.0.0-SNAPSHOT
Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
res1: org.apache.spark.sql.DataFrame = []

$ ll /tmp/noexistdir/t
-rw-r--r-- 1 xitong xitong 0 Feb 27 17:19 /tmp/noexistdir/t

$ vi /tmp/noexistdir/t
1
```

Here too, /tmp/noexistdir/t is a plain file rather than a directory. I want to add a UT to master and have Jenkins run it, so that it either confirms the behavior or surfaces more information. Afterwards, I will close this PR.
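The UT could be sketched along the following lines. This is a hypothetical sketch, not the actual patch: it assumes Spark's Hive test utilities (`QueryTest`, `SQLTestUtils`, `TestHiveSingleton`), and the suite name and expected output layout are my assumptions. It is not runnable outside a Spark source build.

```scala
// Hypothetical sketch of the proposed UT. Suite name and the exact
// assertions are assumptions; it must be compiled inside the Spark
// source tree against the sql/hive test harness.
import java.io.File

import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.hive.test.TestHiveSingleton
import org.apache.spark.sql.test.SQLTestUtils

class InsertOverwriteNonexistentDirSuite extends QueryTest
  with SQLTestUtils with TestHiveSingleton {

  test("insert overwrite a nonexistent local directory") {
    withTempDir { dir =>
      // Point at a path whose parent directory does not exist yet,
      // mirroring the /tmp/noexistdir/t repro above.
      val nonexistent = new File(dir, "noexistdir/t").getCanonicalPath
      withTable("t") {
        sql("CREATE TABLE t(c0 INT, c1 INT)")
        sql("INSERT INTO t VALUES (1, 1)")
        sql(s"INSERT OVERWRITE LOCAL DIRECTORY '$nonexistent' SELECT * FROM t")

        // Expected behavior (per the master repro): the missing path is
        // created as a directory holding part files, not as a plain file.
        val out = new File(nonexistent)
        assert(out.isDirectory)
        assert(out.listFiles().exists(_.getName.startsWith("part-")))
      }
    }
  }
}
```

The key assertion is `out.isDirectory`: on the 2.3.0 behavior shown above it would fail, because the path ends up as a single flat file.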
## How was this patch tested?

UT.
