Hi, I am new to Apache Spark. I created a Spark job that reads data from a MySQL database, does some processing on it, and then writes the result to another table.
The odd thing I noticed is that when I use `sparkSession.read.jdbc`, Spark reads the whole table at once, and `sparkDf.rdd.map` waits for the entire iteration to finish before the write even starts, with no checkpoint anywhere along the way. As a result, all of the intermediate work is lost if a node or the cluster restarts. It would be far more fault tolerant to read from the table in batches, process each batch, and then write its results. I was wondering how I can achieve this? A simplified sketch of the current job is included below for context. Many thanks.
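This is roughly what the job looks like in Scala; the connection URL, credentials, table names, and the row-level processing step are placeholders, not the real values:

```scala
import java.util.Properties

import org.apache.spark.sql.{Row, SaveMode, SparkSession}

object MySqlProcessingJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mysql-processing-job")
      .getOrCreate()

    // Placeholder connection details.
    val url = "jdbc:mysql://localhost:3306/mydb"
    val props = new Properties()
    props.setProperty("user", "user")
    props.setProperty("password", "password")
    props.setProperty("driver", "com.mysql.cj.jdbc.Driver")

    // Reads the entire source table before anything downstream runs.
    val sparkDf = spark.read.jdbc(url, "source_table", props) // placeholder table name

    // Placeholder row-level processing over the underlying RDD.
    val processedRdd = sparkDf.rdd.map { row: Row =>
      // ... real transformation goes here ...
      row
    }

    // Convert back to a DataFrame and write everything to the target table at the end.
    val processedDf = spark.createDataFrame(processedRdd, sparkDf.schema)
    processedDf.write
      .mode(SaveMode.Append)
      .jdbc(url, "target_table", props) // placeholder table name

    spark.stop()
  }
}
```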