[jira] [Commented] (SPARK-13699) Spark SQL drops the table in "overwrite" mode while writing into table
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449684#comment-16449684 ]

Manish Kumar commented on SPARK-13699:
--

I am not sure whether this issue has been resolved. As a workaround, I used JDBC to drop the table first and then saved the data with a save mode.

> Spark SQL drops the table in "overwrite" mode while writing into table
> --
>
>                 Key: SPARK-13699
>                 URL: https://issues.apache.org/jira/browse/SPARK-13699
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Dhaval Modi
>            Priority: Major
>         Attachments: stackTrace.txt
>
> Hi,
> While writing the dataframe to a Hive table with the "SaveMode.Overwrite" option, e.g.
> tgtFinal.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table")
> sqlContext drops the table instead of truncating it.
> This causes an error while overwriting.
> Adding the stacktrace & commands to reproduce the issue.
> Thanks & Regards,
> Dhaval

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
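The drop-then-save workaround described above can be sketched as follows. This is a minimal sketch, not the commenter's actual code: the JDBC URL, credentials, table name, and the DataFrame `df` are all placeholder assumptions, and it assumes the target table is safe to drop.

{code}
import java.sql.DriverManager
import java.util.Properties
import org.apache.spark.sql.SaveMode

// Hypothetical connection details -- replace with real values.
val url = "jdbc:oracle:thin:@//dbhost:1521/service"
val props = new Properties()
props.setProperty("user", "app_user")
props.setProperty("password", "app_password")

// Drop the table over a plain JDBC connection, so Spark's Overwrite
// drop-and-recreate path is never exercised.
val conn = DriverManager.getConnection(url, props)
try {
  conn.createStatement().executeUpdate("DROP TABLE tgt_table")
} finally {
  conn.close()
}

// A non-Overwrite save then creates the table fresh and writes the data.
df.write.mode(SaveMode.Append).jdbc(url, "tgt_table", props)
{code}

The point of the split is that the application, not Spark, decides when and how the old table disappears.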
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449678#comment-16449678 ]

Hyukjin Kwon commented on SPARK-13699:
--

Mind opening a separate JIRA with details and a reproducer, if possible?
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449661#comment-16449661 ]

Ashish commented on SPARK-13699:
--

Is this issue going to be resolved? I am facing the same issue: while writing to a table in overwrite mode, it is truncating the table.
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557996#comment-15557996 ]

Hyukjin Kwon commented on SPARK-13699:
--

Thank you. I will try to follow it.
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272123#comment-15272123 ]

Manish Kumar commented on SPARK-13699:
--

Hi,

I am also facing a similar issue with overwrite mode. I am trying the following to overwrite a data frame into an Oracle table:

dataframe.write.mode("overwrite").jdbc(URL, table_name, properties)

In this case, table_name already exists in Oracle, but with overwrite Spark first drops the table, so Oracle then throws ORA-00902 (invalid datatype).
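For the JDBC path specifically, later Spark versions (2.1+, via SPARK-16410) added a {{truncate}} writer option: with it, Overwrite mode issues TRUNCATE TABLE instead of DROP/CREATE, so the Oracle-side column types survive. This does not help on the 1.6.0 line this issue was filed against; a sketch, with placeholder URL, table name, and properties:

{code}
import java.util.Properties

val props = new Properties()
props.setProperty("user", "app_user")
props.setProperty("password", "app_password")

// Spark 2.1+: truncate=true tells Overwrite to TRUNCATE the existing
// table rather than dropping and recreating it, preserving the schema
// that was hand-tuned on the Oracle side.
dataframe.write
  .mode("overwrite")
  .option("truncate", "true")
  .jdbc(URL, table_name, props)
{code}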
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192940#comment-15192940 ]

Dhaval Modi commented on SPARK-13699:
--

Hi Suresh,

Thanks for your input. But when the DAG is generated, it should handle the case where source & target are the same table. One suggestion: the DAG could write the data to a temporary table first, and then truncate and load the target table. Meanwhile, I am applying this logic explicitly.

Regards,
Dhaval
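The "temporary table first" logic described above can be applied explicitly along these lines. This is a sketch, not the commenter's actual code: the staging table name is illustrative, and it assumes that name is free to use in the metastore.

{code}
import org.apache.spark.sql.SaveMode

// 1. Materialize the result into a staging table. This breaks the
//    read-from/write-to cycle on tgt_table: the expensive plan runs here,
//    while tgt_table is still intact and readable.
tgtFinal.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table_staging")

// 2. Re-read the staging table and overwrite the real target. The source
//    of this write is tgt_table_staging, not tgt_table itself, so the
//    drop-and-recreate no longer pulls the rug out from under the scan.
sqlContext.table("tgt_table_staging")
  .write.mode(SaveMode.Overwrite)
  .saveAsTable("tgt_table")

// 3. Clean up the staging table.
sqlContext.sql("DROP TABLE IF EXISTS tgt_table_staging")
{code}

The cost is a second full write of the data, but the target is only unavailable during the short step 2.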
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183996#comment-15183996 ]

Suresh Thalamati commented on SPARK-13699:
--

Thank you for providing the reproduction; I was able to reproduce the issue. The problem is that you are trying to overwrite a table that is also being read in the data frame. This is not allowed and should fail with an error (I noticed that in some cases I get org.apache.spark.sql.AnalysisException: Cannot overwrite table `t1` that is also being read from). I think this usage should raise an error.

Truncate is an interesting option, especially with the JDBC data source. But it will not address the problem you are running into; it would hit the same issue as Overwrite.

{code}
scala> tgtFinal.explain
== Physical Plan ==
Union
:- WholeStageCodegen
:  :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,cast(enddate#230 as string) AS enddate#263,updatedate#231]
:  :     +- Filter (currind#228 = N)
:  :        +- INPUT
:  +- HiveTableScan [enddate#230,updatedate#231,col2#224,col1#223,batchid#227,col3#225,startdate#229,currind#228,col4#226], MetastoreRelation default, tgt_table, None
:- WholeStageCodegen
:  :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,cast(enddate#230 as string) AS enddate#264,updatedate#231]
:  :     +- INPUT
:  +- Except
:     :- WholeStageCodegen
:     :  :  +- Filter (currind#228 = Y)
:     :  :     +- INPUT
:     :  +- HiveTableScan [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231], MetastoreRelation default, tgt_table, None
:     +- WholeStageCodegen
:        :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231]
:        :     +- BroadcastHashJoin [cast(col1#223 as double)], [cast(col1#219 as double)], Inner, BuildRight, None
:        :        :- Filter (currind#228 = Y)
:        :        :  +- INPUT
:        :        +- INPUT
:        :- HiveTableScan [col1#223,col2#224,col3#225,col4#226,batchid#227,currind#228,startdate#229,enddate#230,updatedate#231], MetastoreRelation default, tgt_table, None
:        +- HiveTableScan [col1#219], MetastoreRelation default, src_table, None
:- WholeStageCodegen
:  :  +- Project [col1#223,col2#224,col3#225,col4#226,batchid#227,UDF(col1#223) AS currInd#232,startdate#229,2016-03-07 15:12:20.584 AS endDate#265,1457392340584000 AS updateDate#234]
:  :     +- BroadcastHashJoin [cast(col1#223 as double)], [cast(col1#219 as double)], Inner, BuildRight, None
:  :        :- Project [col3#225,startdate#229,col2#224,col1#223,batchid#227,col4#226]
:  :        :  +- Filter (currind#228 = Y)
:  :        :     +- INPUT
:  :        +- INPUT
:  :- HiveTableScan [col3#225,startdate#229,col2#224,col1#223,batchid#227,col4#226,currind#228], MetastoreRelation default, tgt_table, None
:  +- HiveTableScan [col1#219], MetastoreRelation default, src_table, None
+- WholeStageCodegen
:  +- Project [cast(col1#219 as string) AS col1#266,col2#220,col3#221,col4#222,UDF(cast(col1#219 as string)) AS batchId#235,UDF(cast(col1#219 as string)) AS currInd#236,1457392340584000 AS startDate#237,date_format(cast(UDF(cast(col1#219 as string)) as timestamp),yyyy-MM-dd HH:mm:ss) AS endDate#238,1457392340584000 AS updateDate#239]
:     +- INPUT
+- HiveTableScan [col1#219,col2#220,col3#221,col4#222], MetastoreRelation default, src_table, None
{code}
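The failure mode described above — overwriting a table that the DataFrame also scans — can be reproduced in miniature. A sketch, assuming a HiveContext and an existing tgt_table; which symptom appears (the AnalysisException quoted above, or the table being dropped before the lazy read runs) depends on the code path:

{code}
import org.apache.spark.sql.SaveMode

// Lazy read: nothing is scanned yet, but the plan references tgt_table.
val df = sqlContext.sql("select * from tgt_table")

// Overwrite plans to drop and recreate tgt_table, while df's plan still
// scans it. Expected to fail with something like:
//   org.apache.spark.sql.AnalysisException:
//   Cannot overwrite table `tgt_table` that is also being read from
// or, on other code paths, to drop the table out from under the scan.
df.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table")
{code}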
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182031#comment-15182031 ]

Dhaval Modi commented on SPARK-13699:
--

TGT_TABLE DDL:

CREATE TABLE IF NOT EXISTS tgt_table (col1 string, col2 int, col3 timestamp, col4 decimal(4,1), batchId string, currInd string, startDate timestamp, endDate timestamp, updateDate timestamp) stored as orc;

SRC_TABLE DDL:

CREATE TABLE IF NOT EXISTS src_table (col1 int, col2 int, col3 timestamp, col4 decimal(4,1)) stored as orc;

INSERT STMT:

insert into table src_table values('1',1,'2016-2-3 00:00:00',23.1);
insert into table src_table values('2',1,'2016-2-3 00:00:00',23.1);
insert into table tgt_table values('1',2,'2016-2-3 00:00:00',23.1, '13', 'Y', '2016-2-3 00:00:00', '2016-2-3 00:00:00', '2016-2-3 00:00:00');
insert into table tgt_table values('1',3,'2016-2-3 00:00:00',23.1, '13', 'N', '2016-2-1 00:00:00', '2016-2-1 00:00:00', '2016-2-3 00:00:00');
insert into table tgt_table values('3',3,'2016-2-3 00:00:00',23.1, '13', 'Y', '2016-2-1 00:00:00', '2016-2-1 00:00:00', '2016-2-3 00:00:00');
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181916#comment-15181916 ]

Xiao Li commented on SPARK-13699:
--

[~mysti] Could you show the script for how you created the original tables, especially `tgt_table`?
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181804#comment-15181804 ]

Xiao Li commented on SPARK-13699:
--

After some research: we can NOT truncate a table if it was created with the EXTERNAL keyword, because all the data resides outside of the Hive warehouse. [~yhuai] Is that the reason why we chose to drop and then recreate the Hive table, instead of truncating it, when the mode is SaveMode.Overwrite?
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181791#comment-15181791 ]

Xiao Li commented on SPARK-13699:
--

Now I see your point. Will take a look at it. Thanks!
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181780#comment-15181780 ]

Dhaval Modi commented on SPARK-13699:
--

=== Code Snippet ===
{code}
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._

val src = sqlContext.sql("select * from src_table")
val tgt = sqlContext.sql("select * from tgt_table")

var tgtFinal = tgt.filter("currind = 'N'")  // Add to final table
val tgtActive = tgt.filter("currind = 'Y'")

// src.select("col1").except(src.select("col1").as('a).join(tgtActive.select("col1").as('b), "col1"))
val newTgt1 = tgtActive.as('a).join(src.as('b), $"a.col1" === $"b.col1")
// val newTgt2 = tgtActive.except(newTgt1.select("a.*"))
tgtFinal = tgtFinal.unionAll(tgtActive.except(newTgt1.select("a.*")))

var srcInsert = src.except(newTgt1.select("b.*"))

val inBatchID = udf((t: String) => "13")
val inCurrInd = udf((t: String) => "Y")
val NCurrInd  = udf((t: String) => "N")
val endDate   = udf((t: String) => "9999-12-31 23:59:59")

tgtFinal = tgtFinal.unionAll(
  newTgt1.select("a.*")
    .withColumn("currInd", NCurrInd(col("col1")))
    .withColumn("endDate", current_timestamp())
    .withColumn("updateDate", current_timestamp()))

srcInsert = src
  .withColumn("batchId", inBatchID(col("col1")))
  .withColumn("currInd", inCurrInd(col("col1")))
  .withColumn("startDate", current_timestamp())
  .withColumn("endDate", date_format(endDate(col("col1")), "yyyy-MM-dd HH:mm:ss"))
  .withColumn("updateDate", current_timestamp())

tgtFinal = tgtFinal.unionAll(srcInsert)

tgtFinal.write.mode(SaveMode.Append).saveAsTable("tgt_table")
{code}
=== Code Snippet ===
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181777#comment-15181777 ]

Dhaval Modi commented on SPARK-13699:
--

This should be a bug, as it fails to overwrite the table and throws an error.