I don't think there is a performance difference between the 1.x API and the 2.x API.
But it's not a big issue for your change: only com.databricks.hadoop.mapreduce.lib.input.XmlInputFormat.java (https://github.com/databricks/spark-xml/blob/master/src/main/java/com/databricks/hadoop/mapreduce/lib/input/XmlInputFormat.java) needs to change, right? Porting it to the 2.x API is not a big change. If you agree, I can do it, but I cannot promise to finish within one or two weeks because of my day job.

> On Dec 9, 2015, at 5:01 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
> Hi all,
>
> I am writing this email to both the user group and the dev group since it is applicable to both.
>
> I am now working on the Spark XML datasource (https://github.com/databricks/spark-xml).
> It uses an InputFormat implementation which I downgraded to the Hadoop 1.x API for version compatibility.
>
> However, I found that the internal JSON datasource and others in Databricks use the Hadoop 2.x API, obtaining TaskAttemptContextImpl by reflection, because TaskAttemptContext is a class in Hadoop 1.x and an interface in Hadoop 2.x.
>
> So, I looked through the code for some advantages of the Hadoop 2.x API, but I couldn't find any.
> I wonder if there are advantages to using the Hadoop 2.x API.
>
> I understand that it is still preferable to use the Hadoop 2.x APIs, at least for future compatibility, but somehow I feel it may not be worth using Hadoop 2.x if that requires reflecting a method.
>
> I would appreciate it if you could leave a comment at
> https://github.com/databricks/spark-xml/pull/14
> as well as send back a reply if there is a good explanation.
>
> Thanks!
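For reference, the reflection workaround mentioned above (needed because TaskAttemptContext is a concrete class in Hadoop 1.x but an interface, implemented by TaskAttemptContextImpl, in Hadoop 2.x) can be sketched as a "try the 2.x class name, fall back to the 1.x one" lookup. This is only an illustrative sketch, not code from Spark or spark-xml: the helper name is made up, and java.util stand-in class names are used so it runs without Hadoop on the classpath. Against Hadoop you would try "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl" first and fall back to "org.apache.hadoop.mapreduce.TaskAttemptContext", then invoke the (Configuration, TaskAttemptID) constructor reflectively.

```java
import java.lang.reflect.Constructor;

public class ReflectCompat {
    // Return whichever class is available: the preferred (Hadoop 2.x-style)
    // name if it is on the classpath, otherwise the fallback (1.x-style) name.
    static Class<?> loadFirstAvailable(String preferred, String fallback)
            throws ClassNotFoundException {
        try {
            return Class.forName(preferred);
        } catch (ClassNotFoundException e) {
            return Class.forName(fallback);
        }
    }

    public static void main(String[] args) throws Exception {
        // "com.example.DoesNotExist" simulates the 2.x impl class being
        // absent on a Hadoop 1.x classpath; java.util.ArrayList stands in
        // for the 1.x concrete class.
        Class<?> clazz = loadFirstAvailable(
                "com.example.DoesNotExist", "java.util.ArrayList");
        // Once the class is resolved, instantiate it reflectively, as the
        // datasources do with the (Configuration, TaskAttemptID) constructor.
        Constructor<?> ctor = clazz.getConstructor();
        Object instance = ctor.newInstance();
        System.out.println(clazz.getName());
    }
}
```

The point of the pattern is that the same compiled artifact works against either Hadoop version; the cost is the small amount of reflection boilerplate, which is why sticking to one API where possible is simpler.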