[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20636

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r206748015

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
     val argsForSparkSubmit = Seq(
       "--class", BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
      "--name", "SPARK-2",
-      "--master", "local-cluster[2,1,1024]",
-      "--driver-memory", "4g",
+      "--master", "local-cluster[1,1,7168]",
--- End diff --

I think we still support this for debugging purposes since, IIRC, it creates separate processes for the workers.
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r206638221

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
     val argsForSparkSubmit = Seq(
       "--class", BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
      "--name", "SPARK-2",
-      "--master", "local-cluster[2,1,1024]",
-      "--driver-memory", "4g",
+      "--master", "local-cluster[1,1,7168]",
+      "--driver-memory", "7g",
--- End diff --

Hm, just wondering whether it's going to be problematic that the test now spawns a job that needs more than 7G of RAM? Maybe I misunderstand.
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r206638094

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala ---
@@ -55,22 +55,30 @@ object BufferHolderSparkSubmitSuite {
     val ARRAY_MAX = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH

-    val holder = new BufferHolder(new UnsafeRow(1000))
+    val unsafeRow = new UnsafeRow(1000)
+    val holder = new BufferHolder(unsafeRow)
     holder.reset()

-    holder.grow(roundToWord(ARRAY_MAX / 2))
-    holder.reset()
-    holder.grow(roundToWord(ARRAY_MAX / 2 + 8))
+    // while a buffer may be reused, this test checks whether the buffer can be grown
+    holder.grow(ARRAY_MAX / 2)
+    assert(unsafeRow.getSizeInBytes % 8 == 0)

-    holder.reset()
-    holder.grow(roundToWord(Integer.MAX_VALUE / 2))
+    holder.grow(ARRAY_MAX / 2 + 7)
+    assert(unsafeRow.getSizeInBytes % 8 == 0)

-    holder.reset()
-    holder.grow(roundToWord(Integer.MAX_VALUE))
-  }
+    holder.grow(Integer.MAX_VALUE / 2)
+    assert(unsafeRow.getSizeInBytes % 8 == 0)
+
+    holder.grow(ARRAY_MAX - holder.totalSize())
+    assert(unsafeRow.getSizeInBytes % 8 == 0)

-  private def roundToWord(len: Int): Int = {
-    ByteArrayMethods.roundNumberOfBytesToNearestWord(len)
+    try {
+      holder.grow(ARRAY_MAX + 1 - holder.totalSize())
+      assert(false)
+    } catch {
+      case _: UnsupportedOperationException => assert(true)
--- End diff --

Fix the indentation here. `assert(true)` is a no-op, so just omit it. `assert(false)` is less useful than `fail(...message...)`, above. Let an unexpected `Throwable` fly out of the method to fail the test rather than swallowing it. But do you really just want to use `intercept` here?
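The pattern suggested above, asserting only on the expected exception type while letting anything unexpected propagate, is what ScalaTest's `intercept` and JUnit's `assertThrows` do. A minimal stand-alone sketch in plain Java (the `expectThrows` helper is hypothetical, written here only for illustration):

```java
public class ExpectThrowsDemo {
    // Minimal stand-in for ScalaTest's intercept / JUnit's assertThrows.
    // Fails if nothing is thrown, rethrows unexpected exception types,
    // and returns the caught exception when it matches.
    static <T extends Throwable> T expectThrows(Class<T> expected, Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Unexpected exception type: " + t, t);
        }
        throw new AssertionError(
            "Expected " + expected.getName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        UnsupportedOperationException e = expectThrows(
            UnsupportedOperationException.class,
            () -> { throw new UnsupportedOperationException("too big"); });
        System.out.println(e.getMessage());
    }
}
```

Compared with `try { ...; assert(false) } catch { case _: E => }`, this keeps the failure message informative and never swallows an unrelated error.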
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r195111027

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -61,12 +61,15 @@
    * Grows the buffer by at least neededSize and points the row to the buffer.
    */
   void grow(int neededSize) {
-    if (neededSize > ARRAY_MAX - totalSize()) {
+    assert neededSize >= 0 :
+      "Cannot grow BufferHolder by size " + neededSize + " because the size is negative";
+    int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(neededSize);
--- End diff --

and we should add a comment to the `buffer` field saying that it is guaranteed to be word-aligned, and explaining who may assume it is word-aligned.
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r195110654

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -61,12 +61,15 @@
    * Grows the buffer by at least neededSize and points the row to the buffer.
    */
   void grow(int neededSize) {
-    if (neededSize > ARRAY_MAX - totalSize()) {
+    assert neededSize >= 0 :
+      "Cannot grow BufferHolder by size " + neededSize + " because the size is negative";
+    int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(neededSize);
+    if (roundedSize > ARRAY_MAX - totalSize()) {
       throw new UnsupportedOperationException(
-        "Cannot grow BufferHolder by size " + neededSize + " because the size after growing " +
+        "Cannot grow BufferHolder by size " + roundedSize + " because the size after growing " +
         "exceeds size limitation " + ARRAY_MAX);
     }
-    final int length = totalSize() + neededSize;
+    final int length = totalSize() + roundedSize;
--- End diff --

we only need to make `length * 2` word-aligned.
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r195108390

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -61,12 +61,15 @@
    * Grows the buffer by at least neededSize and points the row to the buffer.
    */
   void grow(int neededSize) {
-    if (neededSize > ARRAY_MAX - totalSize()) {
+    assert neededSize >= 0 :
+      "Cannot grow BufferHolder by size " + neededSize + " because the size is negative";
+    int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(neededSize);
--- End diff --

shall we do the same thing at L55?
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r195107427

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -61,12 +61,15 @@
    * Grows the buffer by at least neededSize and points the row to the buffer.
    */
   void grow(int neededSize) {
-    if (neededSize > ARRAY_MAX - totalSize()) {
+    assert neededSize >= 0 :
+      "Cannot grow BufferHolder by size " + neededSize + " because the size is negative";
+    int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(neededSize);
--- End diff --

now I recall why we don't need this. We always grow to double the previous size, or to the max size. So as long as the initial size is word-aligned and the max size is word-aligned, we are fine.
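The alignment argument above can be checked with a small sketch. The `roundToWord` function below mirrors the assumed behavior of `ByteArrayMethods.roundNumberOfBytesToNearestWord` (round up to the next multiple of 8), and the loop shows that doubling a word-aligned size keeps it word-aligned:

```java
public class WordAlignDemo {
    // Assumed behavior of ByteArrayMethods.roundNumberOfBytesToNearestWord:
    // round numBytes up to the next multiple of 8 (the word size).
    static int roundToWord(int numBytes) {
        int remainder = numBytes & 0x07; // numBytes % 8
        return numBytes + ((8 - remainder) & 0x07);
    }

    public static void main(String[] args) {
        // Doubling a word-aligned size keeps it word-aligned, so rounding
        // each grow request is unnecessary if the initial size and the max
        // size are themselves word-aligned.
        int size = 64; // word-aligned initial size
        for (int i = 0; i < 5; i++) {
            size *= 2;
            System.out.println(size % 8); // stays 0 on every iteration
        }
        System.out.println(roundToWord(9));  // rounds up to 16
        System.out.println(roundToWord(16)); // already aligned: 16
    }
}
```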
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r194526656

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -61,6 +61,10 @@
    * Grows the buffer by at least neededSize and points the row to the buffer.
    */
   void grow(int neededSize) {
+    if (neededSize < 0) {
--- End diff --

I think we can use `assert` here. When `neededSize` is negative, there must be an overflow.
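The overflow scenario mentioned above is easy to reproduce: when a caller computes a needed size by adding `int` values, the sum can wrap around to a negative number, which is exactly the programming error the assertion is meant to catch. A minimal sketch (not Spark code):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // A caller might compute a needed size as a sum of int values.
        // Near Integer.MAX_VALUE the sum silently wraps to a negative
        // number; an assert on the sign catches this error early.
        int currentSize = Integer.MAX_VALUE - 8;
        int extra = 64;
        int neededSize = currentSize + extra; // wraps past Integer.MAX_VALUE
        System.out.println(neededSize < 0);   // prints "true"
    }
}
```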
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r182025819

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
     val argsForSparkSubmit = Seq(
       "--class", BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
      "--name", "SPARK-2",
-      "--master", "local-cluster[2,1,1024]",
-      "--driver-memory", "4g",
+      "--master", "local-cluster[1,1,7168]",
--- End diff --

ping @hvanhovell
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r180528504

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
     val argsForSparkSubmit = Seq(
       "--class", BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
      "--name", "SPARK-2",
-      "--master", "local-cluster[2,1,1024]",
-      "--driver-memory", "4g",
+      "--master", "local-cluster[1,1,7168]",
--- End diff --

Good question. Several tests still seem to use `local-cluster`. Would it be better to use `local`, even though it may require more memory?
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r180507408

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
     val argsForSparkSubmit = Seq(
       "--class", BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
      "--name", "SPARK-2",
-      "--master", "local-cluster[2,1,1024]",
-      "--driver-memory", "4g",
+      "--master", "local-cluster[1,1,7168]",
--- End diff --

Do we still support this?
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r170482768

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala ---
@@ -58,15 +58,20 @@ object BufferHolderSparkSubmitSuite {
     val holder = new BufferHolder(new UnsafeRow(1000))
     holder.reset()
+    // execute here since reset() updates holder.cursor
+    val smallBuffer = new Array[Byte](holder.cursor)
--- End diff --

ping @liufengdb and @gatorsmile
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r169151430

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala ---
@@ -58,15 +58,20 @@ object BufferHolderSparkSubmitSuite {
     val holder = new BufferHolder(new UnsafeRow(1000))
     holder.reset()
+    // execute here since reset() updates holder.cursor
+    val smallBuffer = new Array[Byte](holder.cursor)
--- End diff --

Ping, @liufengdb and @gatorsmile.
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/20636

[SPARK-23415][SQL][TEST] Make behavior of BufferHolderSparkSubmitSuite correct and stable

## What changes were proposed in this pull request?

This PR addresses two issues in `BufferHolderSparkSubmitSuite`:

1. While `BufferHolderSparkSubmitSuite` tried to allocate a large object several times, it actually allocated an object once and reused it.
2. `BufferHolderSparkSubmitSuite` may fail due to a timeout.

Assigning a small object before each large allocation solves issue 1 by preventing reuse. Increasing the heap size from 4g to 7g **probably** solves issue 2; it also avoids an OOM once issue 1 is fixed.

## How was this patch tested?

Updated the existing `BufferHolderSparkSubmitSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-23415

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20636.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20636

commit 39a715c52ac4055b3a54e221d692039b8ac20e97
Author: Kazuaki Ishizaki
Date: 2018-02-19T08:55:11Z

    initial commit
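Issue 1 above (the suite silently reusing one large buffer) can be illustrated with a minimal grow-by-doubling holder. This is a simplified sketch of the assumed behavior, not Spark's actual `BufferHolder`: after a "reset" rewinds the cursor, subsequent grow requests that fit in the already-enlarged buffer trigger no fresh allocation, so the large-allocation path is never exercised again.

```java
import java.util.Arrays;

public class GrowDemo {
    // Minimal sketch of a grow-by-doubling buffer holder (assumed,
    // simplified behavior; not Spark code).
    static byte[] buffer = new byte[64];
    static int cursor = 0;

    static void grow(int neededSize) {
        int length = cursor + neededSize;
        if (length > buffer.length) {
            // Grow to double the current size, or to the requested length.
            int newLength = Math.max(length, buffer.length * 2);
            buffer = Arrays.copyOf(buffer, newLength);
        }
    }

    public static void main(String[] args) {
        grow(1024);                     // forces a real allocation
        byte[] afterFirst = buffer;
        cursor = 0;                     // "reset" keeps the enlarged buffer
        grow(512);                      // fits: no new allocation happens
        System.out.println(buffer == afterFirst); // prints "true"
    }
}
```

Assigning a small fresh buffer between iterations, as the PR does, defeats this reuse so each test iteration genuinely re-allocates.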