Github user Dooyoung-Hwang commented on the issue:

    https://github.com/apache/spark/pull/22347
  
    @kiszk 
    It is impossible to count decoded rows without modifying SparkPlan, because 
there is no way to count the iterated size.
    
    Instead, I can simulate this patch in a Scala worksheet with the code below.
    
    ```scala
    var decodeCount = 0

    /** Simulates a decoder: each entry of `buf` is the number of rows still
      * available in that block, and every `next()` call emits one row string.
      * Mutates `buf` in place and bumps the shared `decodeCount` on every
      * decoded row, so callers can observe how many rows were actually decoded.
      */
    def decoding(buf: Array[Int]): Iterator[String] = new Iterator[String] {
      private var rowsLeft = buf.sum // total rows still to emit
      private var cursor = 0         // current block index within buf

      override def hasNext: Boolean = rowsLeft > 0

      override def next(): String = {
        // Skip over blocks that are already exhausted.
        while (buf(cursor) == 0) cursor += 1
        buf(cursor) -= 1
        rowsLeft -= 1
        decodeCount += 1 // increase decodeCount
        f"[decode Result:$rowsLeft]"
      }
    }
    
    // Reset the shared counter before the "before patch" scenario.
    decodeCount = 0
    
    // Before Patch: strict evaluation — every input block is decoded eagerly
    // into buf, even though only the first 3 rows are ultimately needed.
    val buf = new ArrayBuffer[String]
    val inputIter = Array(Array(2, 2, 2), Array(2), Array(2)).iterator
    while (inputIter.hasNext) buf ++= Array(inputIter.next()).flatMap(decoding)
    val result1 = buf.take(3).toArray
    
    // All 6 + 2 + 2 = 10 rows were decoded up front.
    assert(decodeCount == 10)
    
    // Reset the shared counter before the "after patch" scenario.
    decodeCount = 0
    
    // After Patch: lazy evaluation via a scala view — flatMap/take are
    // deferred until force, so only the rows actually taken get decoded.
    val result2 = ArrayBuffer(Array(2, 2, 2), Array(2), Array(2)).toArray.view
      .flatMap(decoding).take(3).force
    
    // Only the 3 requested rows were decoded.
    assert(decodeCount == 3)
    
    // Both approaches yield the same first 3 rows.
    assert(result1 sameElements result2)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Reply via email to