Github user yanakad commented on the pull request:

    https://github.com/apache/spark/pull/10379#issuecomment-166020526
  
    @liancheng Would logging the failed paths at WARN or ERROR level be an 
acceptable compromise? I am not sure whether you're advising that the fix is 
not good enough or disagreeing that there is an issue.
    I think the original behavior *is* a problem. Suppose you have paths like 
/root/account=number/date='yyyy-mo'/... and you create a DF at the root level. 
If you execute 'select * where account=nonexistent', you get an empty data 
frame. If you execute a query with where date in (mo1, mo2, mo3) and there is 
no mo3 partition, you still get data for mo1 and mo2. On the other hand, if you 
try to create a DF at /root/account=nonexistent, you get an exception. I have a 
very heavily partitioned space, which is why I create dataframes at as low a 
level as possible, and I run into this problem whenever a partition path is 
missing.
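
    The asymmetry described above, and the logging compromise, can be sketched 
with a toy model. This is not Spark code and does not reflect how Spark 
implements partition discovery: it is plain Python over a temporary directory, 
and the function names (`load`, `load_with_filter`, `load_lenient`) and the 
sample partition values are hypothetical, chosen only for illustration.

```python
import logging
import os
import tempfile

log = logging.getLogger("partition_sketch")


def load(path):
    """Strict load: list data files under `path`; raise if the path is missing."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Path does not exist: {path}")
    rows = []
    for dirpath, _dirs, files in os.walk(path):
        for name in files:
            rows.append(os.path.join(dirpath, name))
    return sorted(rows)


def load_with_filter(root, account):
    """Root-level load plus a partition filter: a missing partition yields []."""
    marker = os.sep + f"account={account}" + os.sep
    return [p for p in load(root) if marker in p]


def load_lenient(paths):
    """Assumed compromise: skip missing paths, but log each one at WARN."""
    rows = []
    for path in paths:
        if not os.path.isdir(path):
            log.warning("Skipping missing path: %s", path)
            continue
        rows.extend(load(path))
    return rows


def demo():
    with tempfile.TemporaryDirectory() as root:
        # One existing partition with a single data file (hypothetical values).
        part = os.path.join(root, "account=1", "date=2015-11")
        os.makedirs(part)
        with open(os.path.join(part, "part-0"), "w") as f:
            f.write("x")

        missing = os.path.join(root, "account=nonexistent")

        # Filtering at the root level: no match, but no error either.
        filtered = load_with_filter(root, "nonexistent")

        # Loading the missing partition path directly: raises.
        try:
            load(missing)
            strict_raised = False
        except FileNotFoundError:
            strict_raised = True

        # Lenient load: the missing path is logged and skipped.
        lenient = load_lenient([os.path.join(root, "account=1"), missing])
        return filtered, strict_raised, len(lenient)
```

    Calling `demo()` exercises all three cases: the root-level filter returns 
an empty result, the direct load of the missing path raises, and the lenient 
variant returns the one existing file while emitting a warning for the missing 
path.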

