Hi all, I am sending this email to both the user and dev groups since it applies to both.
I am currently working on the Spark XML datasource (https://github.com/databricks/spark-xml). It uses an InputFormat implementation which I downgraded to the Hadoop 1.x API for version compatibility.

However, I found that the internal JSON datasource and others in Databricks use the Hadoop 2.x API, and they deal with TaskAttemptContextImpl by calling it through reflection, because TaskAttemptContext is a class in Hadoop 1.x and an interface in Hadoop 2.x.

I looked through the code for advantages of the Hadoop 2.x API but could not find any, so I wonder whether there are any. I understand that the Hadoop 2.x API is still preferable, at least to be prepared for future differences, but I feel it may not be necessary to use it if doing so requires reflecting a method.

If there is a good explanation, I would appreciate it if you could leave a comment at https://github.com/databricks/spark-xml/pull/14 as well as replying here.

Thanks!
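
P.S. For concreteness, here is a minimal sketch in Java of the kind of reflection I mean. The HadoopCompat class and its method names are hypothetical and are not taken from spark-xml. The sketch assumes the Hadoop 2.x implementation class is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl and that both versions expose a public (Configuration, TaskAttemptID) constructor.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper for illustration only; not part of spark-xml.
final class HadoopCompat {
    private HadoopCompat() {}

    /** Returns the first class in names that can be loaded; throws if none can. */
    static Class<?> firstAvailableClass(String... names) throws ClassNotFoundException {
        for (String name : names) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                // Not on this classpath; try the next candidate.
            }
        }
        throw new ClassNotFoundException("None of: " + String.join(", ", names));
    }

    /**
     * Builds a task attempt context without a compile-time dependency on
     * either Hadoop version. conf is expected to be a Configuration and
     * attemptId a TaskAttemptID; both are typed as Object so that this
     * sketch compiles without Hadoop on the classpath.
     */
    static Object newTaskAttemptContext(Object conf, Object attemptId)
            throws ReflectiveOperationException {
        Class<?> klass = firstAvailableClass(
            "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl", // Hadoop 2.x: concrete implementation
            "org.apache.hadoop.mapreduce.TaskAttemptContext");         // Hadoop 1.x: concrete class
        Class<?> confClass = Class.forName("org.apache.hadoop.conf.Configuration");
        Class<?> idClass = Class.forName("org.apache.hadoop.mapreduce.TaskAttemptID");
        // Assumption: both versions expose a (Configuration, TaskAttemptID) constructor.
        Constructor<?> ctor = klass.getDeclaredConstructor(confClass, idClass);
        return ctor.newInstance(conf, attemptId);
    }
}
```

With Hadoop on the classpath, a call such as HadoopCompat.newTaskAttemptContext(conf, attemptId) would return whichever concrete type that Hadoop version provides. Staying on the Hadoop 1.x API avoids this indirection, which is why I am asking whether it is worth it.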