Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22324#discussion_r215318249
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -473,6 +476,27 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
}
}
}
+
+ test("SPARK-25237 compute correct input metrics in FileScanRDD") {
+ withTempPath { p =>
+ val path = p.getAbsolutePath
+ spark.range(1000).repartition(1).write.csv(path)
+ val bytesReads = new mutable.ArrayBuffer[Long]()
+ val bytesReadListener = new SparkListener() {
+ override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
+ bytesReads += taskEnd.taskMetrics.inputMetrics.bytesRead
+ }
+ }
+ sparkContext.addSparkListener(bytesReadListener)
+ try {
+ spark.read.csv(path).limit(1).collect()
+ sparkContext.listenerBus.waitUntilEmpty(1000L)
+ assert(bytesReads.sum === 7860)
--- End diff ---
7860/2 = 3930, which is 40 bytes more than expected: 0–999 written one number
per line should come to 3890 bytes (10×2 + 90×3 + 900×4, counting the newline in
each row; see the sketch below). But I'm willing to believe there's a good reason
for that somewhere in how it gets read. Clearly it's much better than the answer
of 15640, so I'm willing to believe this is fixing something.
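
As a quick back-of-the-envelope check (my own sketch, not part of the PR;
ExpectedCsvSize is a made-up name), the size of the CSV that
spark.range(1000).repartition(1).write.csv(path) produces can be computed
directly from the row lengths:

    // Sketch only: each row of the written CSV is the number's digits plus '\n'.
    object ExpectedCsvSize extends App {
      val expected = (0 until 1000).map(n => n.toString.length + 1).sum
      println(s"single full read: $expected bytes")        // 3890
      println(s"two full reads:   ${2 * expected} bytes")  // 7780, vs the 7860 asserted
    }

Two full reads account for 7780 of the 7860 bytes, which is where the
40-bytes-per-read discrepancy above comes from.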
---