beliefer opened a new pull request #23950: [MINOR][SQL] Add a UT to test insert overwrite into a nonexistent local path. URL: https://github.com/apache/spark/pull/23950

## What changes were proposed in this pull request?

In PR 23841, maropu and I discussed the behavior of insert overwrite into a nonexistent local path. In local[*] mode, maropu gave the following test case:

```
$ ls /tmp/noexistdir
ls: /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")

scala> spark.table("t").explain
== Physical Plan ==
Scan hive default.t [c0#5, c1#6], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")

$ ls /tmp/noexistdir/t/
_SUCCESS  part-00000-bbea4213-071a-49b4-aac8-8510e7263d45-c000
```

This test case shows that Spark creates the nonexistent path and moves the intermediate result from the local temporary path to the newly created one. This test is based on the newest master.

I followed the test case provided by maropu, but found a different behavior. I ran the SQL statements maropu provided in local[*] deploy mode, based on Spark 2.3.0.
The inconsistent behavior appears as follows:

```
$ ls /tmp/noexistdir
ls: cannot access /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.table("t").explain
== Physical Plan ==
HiveTableScan [c0#5, c1#6], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
res1: org.apache.spark.sql.DataFrame = []

$ ls /tmp/noexistdir/t/
/tmp/noexistdir/t

$ vi /tmp/noexistdir/t
1
```

Then I pulled the master branch, compiled it, and deployed it on my Hadoop cluster. I got the inconsistent behavior again. The Spark version under test is 3.0.0-SNAPSHOT.

```
$ ls /tmp/noexistdir
ls: cannot access /tmp/noexistdir: No such file or directory

Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
Spark context Web UI available at http://10.198.66.204:55326
Spark context available as 'sc' (master = local[*], app id = local-1551259036573).
Spark session available as 'spark'.
Welcome to spark version 3.0.0-SNAPSHOT
Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
res1: org.apache.spark.sql.DataFrame = []

$ ll /tmp/noexistdir/t
-rw-r--r-- 1 xitong xitong 0 Feb 27 17:19 /tmp/noexistdir/t

$ vi /tmp/noexistdir/t
1
```

Here too, /tmp/noexistdir/t is a plain file rather than a directory. I want to add a UT to master and have Jenkins run it, so that it either confirms the behavior or surfaces more information. Afterwards, I will close this PR.
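The UT could be sketched along the following lines. This is a hypothetical sketch, not the actual patch: it assumes Spark's Hive test utilities (`QueryTest`, `SQLTestUtils`, `TestHiveSingleton`), and the suite name and expected output layout are my assumptions. It is not runnable outside a Spark source build.

```scala
// Hypothetical sketch of the proposed UT. Suite name and the exact
// assertions are assumptions; it must be compiled inside the Spark
// source tree against the sql/hive test harness.
import java.io.File

import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.hive.test.TestHiveSingleton
import org.apache.spark.sql.test.SQLTestUtils

class InsertOverwriteNonexistentDirSuite extends QueryTest
  with SQLTestUtils with TestHiveSingleton {

  test("insert overwrite a nonexistent local directory") {
    withTempDir { dir =>
      // Point at a path whose parent directory does not exist yet,
      // mirroring the /tmp/noexistdir/t repro above.
      val nonexistent = new File(dir, "noexistdir/t").getCanonicalPath
      withTable("t") {
        sql("CREATE TABLE t(c0 INT, c1 INT)")
        sql("INSERT INTO t VALUES (1, 1)")
        sql(s"INSERT OVERWRITE LOCAL DIRECTORY '$nonexistent' SELECT * FROM t")

        // Expected behavior (per the master repro): the missing path is
        // created as a directory holding part files, not as a plain file.
        val out = new File(nonexistent)
        assert(out.isDirectory)
        assert(out.listFiles().exists(_.getName.startsWith("part-")))
      }
    }
  }
}
```

The key assertion is `out.isDirectory`: on the 2.3.0 behavior shown above it would fail, because the path ends up as a single flat file.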
## How was this patch tested?

UT.
