Hi, I have a Hive partitioned table created using SparkSession. I would like to insert/overwrite DataFrame data into a specific set of partitions without losing any other partitions. In each run I have to update a set of partitions, not just one.
For example, on the first run I have a DataFrame with bid=1, bid=2, and bid=3, and I can write it using `df.write.mode(SaveMode.Overwrite).partitionBy("bid").parquet(TableBaseLocation)`. This generates the directories bid=1, bid=2, and bid=3 inside TableBaseLocation.

But the next time, when I have a DataFrame with bid=1 and bid=4 and use the same code, it removes bid=2 and bid=3. In other words, I don't get idempotency. I tried `SaveMode.Append`, but that creates duplicate data inside "bid=1".

I read https://issues.apache.org/jira/browse/SPARK-18183. With that approach, it seems I may have to update multiple partitions manually, one for each input partition. That seems like a lot of work on every update. Is there a better way to do this? Can this fix be applied to the DataFrame-based approach as well?

Thanks
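
For reference, the manual per-partition approach mentioned above might look roughly like the following. This is only a sketch under assumptions: `PartitionOverwrite`, `overwritePartitions`, and `partitionPath` are hypothetical names (not part of Spark), the partition column is assumed to contain no nulls, and each partition is assumed to live in a `column=value` directory under the table base location.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.lit

object PartitionOverwrite {
  // Hypothetical helper: builds the directory of one Hive-style partition.
  def partitionPath(base: String, column: String, value: String): String =
    s"${base.stripSuffix("/")}/$column=$value"

  // Overwrites only the partitions whose values occur in `df`.
  // Partition directories for other values under `base` are not touched.
  // Assumption: `column` has no null values in `df`.
  def overwritePartitions(df: DataFrame, base: String, column: String): Unit = {
    // Collect the distinct partition values present in this batch.
    val values: Array[Any] = df.select(column).distinct().collect().map(_.get(0))

    values.foreach { v =>
      df.filter(df(column) === lit(v))
        .drop(column) // the value is already encoded in the directory name
        .write
        .mode(SaveMode.Overwrite) // replaces this one partition directory only
        .parquet(partitionPath(base, column, v.toString))
    }
  }
}
```

Called as `PartitionOverwrite.overwritePartitions(df, TableBaseLocation, "bid")`, a second run with bid=1 and bid=4 would rewrite bid=1, create bid=4, and leave bid=2 and bid=3 in place. It launches one write job per partition value, which is the per-update overhead the question is trying to avoid.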