Hi, to get DataFrame-level write metrics you can take a look at the following trait: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala and a basic implementation example: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
Here is an example of how it is used in FileStreamSink: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala#L178

As for whether this is good practice: it depends on your use case, but generally speaking I would not do it, at least not for validating your own logic or checking that Spark is working correctly.

On Sun, Mar 1, 2020 at 14:32 Manjunath Shetty H <manjunathshe...@live.com> wrote:

> Hi all,
>
> Basically my use case is to validate the DataFrame row count before and
> after writing to HDFS. Is this even a good practice? Or should I rely on
> Spark for guaranteed writes?
>
> If it is a good practice to follow, then how do I get the DataFrame-level
> write metrics?
>
> Any pointers would be helpful.
>
>
> Thanks and Regards
> Manjunath
>