Github user jinxing64 commented on a diff in the pull request:
    --- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -176,12 +176,13 @@ class HadoopTableReader(
                   val matches = fs.globStatus(pathPattern)
                   matches.foreach(fileStatus => existPathSet += 
    -            // convert  /demo/data/year/month/day  to  /demo/data/*/*/*/
    +            // convert  /demo/data/year/month/day  to  
    --- End diff ---
    @cloud-fan @jiangxb1987
    Thanks a lot for review.
    > Em... It seems we have to check all the levels unless we have specified a 
value for each partition column. We can make some improvement here but seems 
that require more complicated approach.
    Yes, true. In this change, I only optimize the case where the user specifies a value for each partition column, which is very common in production -- as our users always do: `select xxx from yyy where year=yy and month=mm and day=dd`
    I'm not sure which you'd prefer: leave the current logic as it is (at least the code is very simple now)? Implement a more complicated approach that covers as many cases as possible? Or build on this PR and cover some very common cases?
    Thanks again for review :)

