Github user jinxing64 commented on a diff in the pull request:
    --- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -176,12 +176,13 @@ class HadoopTableReader(
                   val matches = fs.globStatus(pathPattern)
                   matches.foreach(fileStatus => existPathSet += 
    -            // convert  /demo/data/year/month/day  to  /demo/data/*/*/*/
    +            // convert  /demo/data/year/month/day  to  
    --- End diff ---
    @cloud-fan @jiangxb1987
    Thanks a lot for review.
    > Em... It seems we have to check all the levels unless we have specified a 
value for each partition column. We can make some improvement here but seems 
that require more complicated approach.
    Yes, true. In this change, I only optimize the case where the user specifies a value for each partition column, which is very common in production -- as our users always do: `select xxx from yyy where year=yy and month=mm and day=dd`
    I'm not sure which you'd prefer: leave the current logic as it is (at least the code is very simple now)? Implement a more complicated approach that covers as many cases as possible? Or build on this PR and cover some very common cases?
    Thanks again for review :)

