turboFei commented on issue #24610: [SPARK-27716][SQL] Complete the transactions support for part of jdbc datasource operations.
URL: https://github.com/apache/spark/pull/24610#issuecomment-494872278

> > I don't know this part well, but I'm pretty sure that the transaction is supposed to be per partition. These writes must happen separately.
>
> Yes, you are right, the transaction is supposed to be per partition.
> For cases where the destination table may be dropped first or does not exist yet, we can save the RDD data to a temporary table and record the number of successful partitions with an accumulator.
> Finally, we compare the accumulator's value with the number of partitions of the DataFrame to determine whether all partitions succeeded.
> If all partitions succeeded, we rename the temporary table to the destination table.
> Therefore, we can treat the save operation for all partitions as a single transaction.
> @gatorsmile Could you help me review this?
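
The temp-table idea described above can be sketched roughly as follows. This is only an illustration using Python's built-in `sqlite3` in place of Spark and a JDBC database, not the PR's implementation. The function name `save_atomically`, the `_tmp` suffix, the two-column schema, and the plain counter standing in for the Spark accumulator are all assumptions made for the example.

```python
import sqlite3


def save_atomically(conn, dest, partitions):
    """Write each partition to a temp table in its own transaction and
    rename the temp table to `dest` only if every partition succeeded.

    Hypothetical helper for illustration; table names are trusted input.
    """
    tmp = dest + "_tmp"  # assumed naming scheme for the temporary table
    conn.execute(f"DROP TABLE IF EXISTS {tmp}")
    conn.execute(f"CREATE TABLE {tmp} (id INTEGER PRIMARY KEY, value TEXT)")
    conn.commit()

    succeeded = 0  # stands in for the accumulator of successful partitions
    for rows in partitions:
        try:
            # One transaction per partition: commit on success,
            # roll back the whole partition on any database error.
            with conn:
                conn.executemany(f"INSERT INTO {tmp} VALUES (?, ?)", rows)
            succeeded += 1
        except sqlite3.Error:
            pass  # a failed partition simply does not count

    if succeeded == len(partitions):
        # All partitions landed: swap the temp table into place.
        conn.execute(f"DROP TABLE IF EXISTS {dest}")
        conn.execute(f"ALTER TABLE {tmp} RENAME TO {dest}")
        conn.commit()
        return True

    # At least one partition failed: discard the temp table and
    # leave any existing destination table untouched.
    conn.execute(f"DROP TABLE {tmp}")
    conn.commit()
    return False
```

For example, with `conn = sqlite3.connect(":memory:")`, calling `save_atomically(conn, "dest", [[(1, "a"), (2, "b")], [(3, "c")]])` creates `dest` with all three rows, while a later call in which one partition contains a duplicate `id` returns `False` and leaves the existing `dest` as it was.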
