[GitHub] spark issue #17702: [SPARK-20408][SQL] Get the glob path in parallel to redu...

xuanyuanking Wed, 13 Jun 2018 08:21:41 -0700

Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/17702
  
    ```
    This approach only works if the first level glob pattern matches a lot of 
directories.
    ```
    Yep, in our internal usage we leave this problem to the user: they 
should write the path so that the first wildcard matches most of the files.
    ```
    Maybe we should just fork the Hadoop Globber and improve it to run in 
parallel.
    ```
    Thanks for your detailed explanation and guidance. I'll reconsider this 
and open another PR.
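
For illustration only, the first-level approach discussed above can be sketched on a local filesystem with Python's standard library. This is not the code from the PR: `parallel_glob` and `split_at_first_wildcard` are hypothetical names, and the local `glob` module stands in for Hadoop's file-system listing.

```python
import glob
import os
from concurrent.futures import ThreadPoolExecutor

_WILDCARDS = set("*?[")


def split_at_first_wildcard(pattern):
    """Split a glob into (head, tail); head ends at the first path
    component that contains a wildcard character."""
    parts = pattern.split(os.sep)
    for i, part in enumerate(parts):
        if _WILDCARDS & set(part):
            return os.sep.join(parts[: i + 1]), os.sep.join(parts[i + 1:])
    return pattern, ""


def parallel_glob(pattern, max_workers=8):
    """Expand the first wildcard level sequentially, then expand the
    remainder of the pattern under each matched directory in parallel."""
    head, tail = split_at_first_wildcard(pattern)
    first_level = glob.glob(head)
    if not tail:
        # No deeper levels, so there is nothing to parallelize.
        return sorted(first_level)
    dirs = [d for d in first_level if os.path.isdir(d)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = pool.map(lambda d: glob.glob(os.path.join(d, tail)), dirs)
        return sorted(p for chunk in chunks for p in chunk)
```

As the quoted review comment points out, the parallelism here is bounded by the number of directories the first wildcard matches; a pattern whose first level matches a single directory gains nothing.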



