Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14731#discussion_r83490975
  
    --- Diff: docs/streaming-programming-guide.md ---
    @@ -644,17 +644,90 @@ methods for creating DStreams from files as input sources.
         </div>
         </div>
     
    -   Spark Streaming will monitor the directory `dataDirectory` and process any files created in that directory (files written in nested directories not supported). Note that
    +   Spark Streaming will monitor the directory `dataDirectory` and process any files created in that directory.
    +
    +     ++ The files must have the same data format.
    +     + A simple directory can be monitored, such as `hdfs://namenode:8040/logs/`.
    +       All files directly such a path will be processed as they are discovered.
    +     + A POSIX glob pattern can be supplied, such as
    +       `hdfs://namenode:8040/logs/2016-??-31`.
    +       Here, the DStream will consist of all files directly under those directories
    +       matching the regular expression.
    --- End diff --
    
    I think this comment is still incorrect. A glob is not a regex. If this is just the syntax other Hadoop APIs support, that seems reasonable, but it should be described that way.
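To make the glob-versus-regex distinction concrete, here is a small sketch using Python's standard-library `fnmatch` (shell-style glob matching) and `re`. It is only a stand-in: it does not reproduce Hadoop's own glob implementation, and the directory names below are hypothetical, made up for illustration. The point it shows is that the same string `2016-??-31` selects completely different names depending on whether it is read as a glob or as a regular expression.

```python
import fnmatch
import re

PATTERN = "2016-??-31"

# Hypothetical directory names under a logs/ path (illustrative only).
names = ["2016-01-31", "2016-ab-31", "2016-1-31", "2016-31", "2016--31"]


def glob_matches(name: str) -> bool:
    # Shell-style glob: each '?' matches exactly one character,
    # so the pattern only accepts 10-character names like 2016-01-31.
    return fnmatch.fnmatchcase(name, PATTERN)


def regex_matches(name: str) -> bool:
    # As a regex, '-??' is a lazy *optional* hyphen followed by a literal '-',
    # so the pattern accepts only "2016-31" and "2016--31".
    return re.fullmatch(PATTERN, name) is not None


for name in names:
    print(name, "glob:", glob_matches(name), "regex:", regex_matches(name))
```

Under the glob reading, `2016-01-31` and `2016-ab-31` match; under the regex reading, neither does, and only `2016-31` and `2016--31` match. That is why describing the pattern as "matching the regular expression" would mislead readers.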

