Re: Large files with wholetextfile()

2016-07-12 Thread Hyukjin Kwon
Otherwise, please consider using https://github.com/databricks/spark-xml. Actually, there is a function to find the input file name: the input_file_name function, https://github.com/apache/spark/blob/5f342049cce9102fb62b4de2d8d8fa691c2e8ac4/sql/core/src/main/scala/org/apache/spark/sql/func

Re: Large files with wholetextfile()

2016-07-12 Thread Prashant Sharma
Hi Baahu, That should not be a problem, given you allocate a sufficient buffer for reading. I was just working on implementing a patch[1] to support reading whole text files in SQL. This can actually be a slightly better approach, because here we read into off-heap memory for holding data(

Large files with wholetextfile()

2016-07-12 Thread Bahubali Jain
Hi, We have a requirement wherein we need to process a set of XML files; each of the XML files contains several records (eg: data of record 1.. data of record 2.. Expected output is Since we needed the file name as well in the output, we chose wholeTextFiles(). We had to go against