Re: possible issues with listing objects in the HadoopFSrelation

2015-08-12 Thread Cheng Lian
Hi Gil, Sorry for the late reply and thanks for raising this question. The file listing logic in HadoopFsRelation is intentionally made different from Hadoop FileInputFormat. Here are the reasons: 1. Efficiency: when computing RDD partitions, FileInputFormat.listStatus() is called on the

Re: possible issues with listing objects in the HadoopFSrelation

2015-08-12 Thread Gil Vernik
@IBMIL, Dev dev@spark.apache.org Date: 12/08/2015 10:51 Subject: Re: possible issues with listing objects in the HadoopFSrelation Hi Gil, Sorry for the late reply and thanks for raising this question. The file listing logic in HadoopFsRelation is intentionally made different from

possible issues with listing objects in the HadoopFSrelation

2015-08-10 Thread Gil Vernik
Just some thoughts, hope I didn't miss something obvious. HadoopFsRelation calls the FileSystem class directly to list files in the path. It looks like it implements basically the same logic as the FileInputFormat.listStatus method (located in hadoop-mapreduce-client-core). The point is