This looks like a bug: the path filter is not applied when reading a Hive table. Can you open a JIRA ticket?

On Thu, Apr 23, 2020 at 3:15 AM Dhrubajyoti Hati <dhruba.w...@gmail.com> wrote:

> Just wondering if anyone could help me out on this.
>
> Thank you!
>
> *Regards, Dhrubajyoti Hati.*
>
> On Wed, Apr 22, 2020 at 7:15 PM Dhrubajyoti Hati <dhruba.w...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there any way to discard files starting with a dot (.) or ending with
>> .tmp in a Hive partition while reading from a Hive table using the
>> spark.read.table method?
>>
>> I tried using PathFilters but they didn't work. I am using spark-submit
>> and passing my Python file (PySpark) containing the source code.
>>
>> spark.sparkContext._jsc.hadoopConfiguration().set("mapreduce.input.pathFilter.class",
>>     "com.abc.hadoop.utility.TmpFileFilter")
>>
>> class TmpFileFilter extends PathFilter {
>>   override def accept(path: Path): Boolean = !path.getName.endsWith(".tmp")
>> }
>>
>> Still, in the detailed logs I can see that .tmp files are being considered:
>>
>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus maprfs:///a/hour=05/host=abc/FlumeData.1587559137715
>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus maprfs:///a/hour=05/host=abc/FlumeData.1587556815621
>> 20/04/22 12:58:44 DEBUG MapRFileSystem: getMapRFileStatus maprfs:///a/hour=05/host=abc/.FlumeData.1587560277337.tmp
>>
>> Is there any way to discard the temporary (.tmp) files or the hidden files
>> (file names starting with a dot or an underscore) in Hive partitions while
>> reading from Spark?
>>
>> *Regards, Dhrubajyoti Hati.*
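
For reference, the filtering rule described in the question (skip names starting with a dot or an underscore, or ending with .tmp) can be written as a plain Python predicate. This is only a sketch of the rule itself: `is_visible_data_file` is a hypothetical helper name, and it does not hook into spark.read.table or the Hadoop PathFilter mechanism. The sample paths are the ones from the debug log above.

```python
import posixpath


def is_visible_data_file(path: str) -> bool:
    """Return False for hidden files (leading '.' or '_') and for '.tmp' files.

    Hypothetical helper for illustration only; it inspects the last path
    component and does not touch the file system.
    """
    name = posixpath.basename(path.rstrip("/"))
    return not (name.startswith((".", "_")) or name.endswith(".tmp"))


# Paths taken from the debug log quoted in the thread.
paths = [
    "maprfs:///a/hour=05/host=abc/FlumeData.1587559137715",
    "maprfs:///a/hour=05/host=abc/FlumeData.1587556815621",
    "maprfs:///a/hour=05/host=abc/.FlumeData.1587560277337.tmp",
]

# Keep only the files that the rule accepts.
kept = [p for p in paths if is_visible_data_file(p)]
```

With the three logged paths, only the two completed FlumeData files pass the predicate; the in-progress .FlumeData...tmp file is rejected both because of its leading dot and because of its .tmp suffix.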