Otherwise, please consider using https://github.com/databricks/spark-xml.
Actually, there is a function to find the input file name: the
input_file_name function,
https://github.com/apache/spark/blob/5f342049cce9102fb62b4de2d8d8fa691c2e8ac4/sql/core/src/main/scala/org/apache/spark/sql/func
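To make the idea concrete without a live Spark session, here is a stdlib-only sketch of what input_file_name() gives you: each parsed record is paired with the path of the file it came from. The XML layout (`<records><record>...</record></records>`) and the helper name are illustrative assumptions, not anything from the thread.

```python
# Emulation, with the standard library only, of what input_file_name()
# adds to each row when Spark reads a directory of files: every parsed
# record carries the path of its source file.
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def records_with_file_name(directory):
    """Yield (record_text, source_file) pairs for every <record> element."""
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(path).getroot()
        for record in root.iter("record"):
            yield record.text, str(path)

with tempfile.TemporaryDirectory() as d:
    Path(d, "a.xml").write_text(
        "<records><record>r1</record><record>r2</record></records>")
    Path(d, "b.xml").write_text(
        "<records><record>r3</record></records>")
    rows = list(records_with_file_name(d))
    # Analogous to: SELECT value, input_file_name() FROM ...
    for text, source in rows:
        print(text, source)
```

In real Spark the equivalent would be selecting input_file_name() alongside the parsed columns, so the file name travels with each row for free.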
Hi Baahu,
That should not be a problem, provided you allocate a sufficient buffer for
reading.
I was just working on a patch[1] to support this feature for
reading whole text files in SQL. This can actually be a slightly better
approach, because here we read into off-heap memory to hold the data(
Hi,
We have a requirement wherein we need to process a set of XML files; each of
the XML files contains several records (e.g.:
data of record 1..
data of record 2..
Expected output is
Since we needed the file name as well in the output, we chose wholeTextFiles(). We
had to go against
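The wholeTextFiles() approach can be sketched with the standard library: the call yields (file name, whole file content) pairs, which are then flat-mapped into per-record rows that keep the file name. The `<record>` tag, helper names, and directory layout below are illustrative assumptions, not the poster's actual schema.

```python
# Stdlib sketch of the (file name, whole file content) pairs that
# sc.wholeTextFiles() produces, followed by splitting each file's
# content into per-record rows that retain the file name.
import re
import tempfile
from pathlib import Path

def whole_text_files(directory):
    """Return (path, full_content) pairs, like sc.wholeTextFiles()."""
    return [(str(p), p.read_text())
            for p in sorted(Path(directory).glob("*.xml"))]

def split_records(pairs, tag="record"):
    """Flat-map each whole file into (file_name, record_body) rows."""
    pattern = re.compile(r"<%s>(.*?)</%s>" % (tag, tag), re.S)
    return [(name, body)
            for name, content in pairs
            for body in pattern.findall(content)]

with tempfile.TemporaryDirectory() as d:
    Path(d, "f1.xml").write_text(
        "<record>data of record 1</record>"
        "<record>data of record 2</record>")
    out = split_records(whole_text_files(d))
    for name, body in out:
        print(name, body)
```

Note the trade-off the replies allude to: wholeTextFiles() materializes each file as a single string, so very large files need correspondingly large buffers, whereas a record-aware reader (such as spark-xml) can stream records without holding the whole file.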