[GitHub] spark pull request #20437: [SPARK-23270][Streaming][WEB-UI]FileInputDStream ...

guoxiaolongzte Wed, 31 Jan 2018 19:25:56 -0800

Github user guoxiaolongzte commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20437#discussion_r165251810
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala ---
    @@ -157,7 +157,7 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]](
         val metadata = Map(
           "files" -> newFiles.toList,
           StreamInputInfo.METADATA_KEY_DESCRIPTION -> newFiles.mkString("\n"))
    -    val inputInfo = StreamInputInfo(id, 0, metadata)
    +    val inputInfo = StreamInputInfo(id, rdds.map(_.count).sum, metadata)
    --- End diff --
    
    Could you add a switch parameter that defaults to false?
    
    When it is set to true, the records are counted (which means reading the 
files again) so that the record count is reported correctly. Of course, 
enabling the parameter will hurt streaming performance.
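
    A minimal sketch of the switch suggested above, not the actual patch. It 
runs without Spark: `FakeRdd` stands in for an RDD, and the configuration key 
`spark.streaming.fileStream.countRecords` is a hypothetical name invented for 
illustration. With the switch off (the default) the reported count stays 0, as 
in the original code. With it on, every batch is counted, which costs an extra 
pass over the data.

```scala
object RecordCountSwitch {
  // Hypothetical configuration key, invented for this sketch only.
  val CountRecordsKey = "spark.streaming.fileStream.countRecords"

  // Stand-in for an RDD: on a real RDD, count() triggers a job that reads the files.
  final case class FakeRdd(records: Seq[String]) {
    def count(): Long = records.size.toLong
  }

  // Number of records to report for a batch; 0 when counting is disabled.
  def numRecords(conf: Map[String, String], rdds: Seq[FakeRdd]): Long = {
    // A missing key means the switch is off, so the default is false.
    val enabled = conf.get(CountRecordsKey).exists(_.toBoolean)
    if (enabled) rdds.map(_.count()).sum else 0L
  }

  def main(args: Array[String]): Unit = {
    val rdds = Seq(FakeRdd(Seq("a", "b")), FakeRdd(Seq("c")))
    println(numRecords(Map.empty, rdds))                      // prints 0
    println(numRecords(Map(CountRecordsKey -> "true"), rdds)) // prints 3
  }
}
```

    In the diff above, the result of such a check would take the place of the 
second argument to `StreamInputInfo`.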


