mengxr commented on a change in pull request #24518: [SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources
URL: https://github.com/apache/spark/pull/24518#discussion_r281410873
##########
File path: docs/sql-data-sources-binaryFile.md
##########
@@ -28,50 +28,36 @@ It produces a DataFrame with the following columns and possibly partition column
 * `length`: LongType
 * `content`: BinaryType

-It supports the following read option:
-<table class="table">
-  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
-  <tr>
-    <td><code>pathGlobFilter</code></td>
-    <td>none (accepts all)</td>
-    <td>
-      An optional glob pattern to only include files with paths matching the pattern.
-      The syntax follows <code>org.apache.hadoop.fs.GlobFilter</code>.
-      It does not change the behavior of partition discovery.
-    </td>
-  </tr>
-</table>
-
 To read whole binary files, you need to specify the data source `format` as `binaryFile`.
-For example, the following code reads all PNG files from the input directory:

Review comment:
Can we keep the `pathGlobFilter` option in the example? It is actually important for the use case. Just mention `pathGlobFilter` is a global option.
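For context, the example under discussion presumably amounts to setting `pathGlobFilter` to `*.png` on a `binaryFile` read. The following is a minimal sketch of the filtering behavior the removed table describes, not Spark's implementation. It assumes the pattern is matched against the file name only, and it uses Python's `fnmatch` as a stand-in for `org.apache.hadoop.fs.GlobFilter`, whose syntax differs in some details. The `glob_filter` helper and the file listing are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def glob_filter(paths, pattern=None):
    """Keep paths whose file name matches the glob pattern.

    pattern=None mirrors the documented default "none (accepts all)".
    Approximation only: fnmatch is not Hadoop's GlobFilter.
    """
    if pattern is None:
        return list(paths)
    # Match on the final path component, so directory names such as
    # partition folders play no part in the decision.
    return [p for p in paths if fnmatchcase(PurePosixPath(p).name, pattern)]


# Hypothetical listing of an input directory with partition folders.
files = [
    "/data/date=2019-05-01/a.png",
    "/data/date=2019-05-01/b.jpg",
    "/data/date=2019-05-02/c.png",
]

png_only = glob_filter(files, "*.png")
print(png_only)
```

Because only the file name is tested, the `date=...` directories are untouched, which matches the statement that the option does not change the behavior of partition discovery.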
