[GitHub] spark pull request: [SPARK-3228][Streaming]

tdas Fri, 29 Aug 2014 19:26:02 -0700

Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/2132#issuecomment-53946301
  
    Can you please add a title to the PR? Also, this is a tricky change, as 
it changes the user-perceived behavior of saveAsXXXFile. If someone has set up 
a system that expects a new file every batch, irrespective of whether the batch 
contains any data, then this change will break that system.
    
    This functionality can be very easily replicated in user code, by doing
    
    ```
    dstream.foreachRDD((rdd: RDD[XXX], time: Time) => {
      val fileName = prefix + time.milliseconds + suffix
      rdd.saveAsXXXFile(fileName)
    })
    ```
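    
    A slightly fuller sketch of the same idea, assuming text output and an 
explicit emptiness check so that batches with no records write nothing. The 
helper name `saveNonEmptyAsTextFiles` is hypothetical and not part of Spark; 
`prefix` and `suffix` are the same user-chosen strings as in the snippet above.
    
    ```scala
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Time
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical user-side helper (not a Spark API): writes one output
    // directory per batch, but only for batches that contain data.
    def saveNonEmptyAsTextFiles[T](
        dstream: DStream[T],
        prefix: String,
        suffix: String): Unit = {
      dstream.foreachRDD((rdd: RDD[T], time: Time) => {
        // take(1) returns an empty array when the batch has no records.
        if (rdd.take(1).nonEmpty) {
          val fileName = prefix + time.milliseconds + suffix
          rdd.saveAsTextFile(fileName)
        }
      })
    }
    ```
    
    Keeping the check in user code leaves the built-in saveAsXXXFile behavior 
unchanged for anyone who relies on one output per batch.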
    
    So I am not convinced that this is a good change, especially because it 
breaks existing behavior.
    Any thoughts?




