Hi all, I have many (>100) jobs running concurrently (sharing the same HiveContext), each appending new rows to the same DataFrame registered as a temp table.
Currently, in each job I am using unionAll and re-registering the result as a temp table. Given an existing DataFrame registered as the temp table "test":

    // Create a DataFrame with the new rows to append
    val newRows = hiveContext.createDataFrame(rows, schema)

    // Retrieve the existing DataFrame and append the new rows via unionAll
    val updatedDF = hiveContext.table("test").unionAll(newRows)

    // Uncache the existing DataFrame
    hiveContext.uncacheTable("test")

    // Register the updated DataFrame as a temp table
    updatedDF.registerTempTable("test")

    // Cache the updated DataFrame
    hiveContext.table("test").cache()

I am finding that this approach can deplete memory very quickly, since each call to .cache() in each job creates a new in-memory entry holding the full table. Does anyone know of a more optimal solution?

Thanks,
Roger
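To illustrate the memory behaviour I am seeing, here is a toy, Spark-free sketch (plain Scala, no HiveContext; the object and method names are made up for the example). Each simulated "job" unions new rows onto the table and then materializes the whole result, standing in for the .cache() call, so the cumulative rows materialized grow quadratically with the number of jobs:

```scala
object UnionCacheSketch {
  // Runs n append "jobs" against a growing table.
  // Returns (final row count, total rows materialized across all cache calls).
  def simulate(n: Int): (Int, Int) = {
    var table: Vector[Int] = Vector.empty
    var totalMaterialized = 0
    for (i <- 1 to n) {
      table = table ++ Vector(i)       // unionAll analogue: builds a full new copy
      totalMaterialized += table.size  // .cache() analogue: materializes the whole table again
    }
    (table.size, totalMaterialized)
  }

  def main(args: Array[String]): Unit = {
    val (rows, cached) = simulate(100)
    // After 100 jobs the table holds 100 rows, but 1+2+...+100 = 5050 rows
    // have been materialized in total across the successive cache calls.
    println(s"rows=$rows cached=$cached")
  }
}
```

This is only an analogue of the pattern, not actual Spark behaviour, but it shows why re-caching the full union after every small append becomes expensive as the table grows.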