[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-08-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20636


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-07-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r206748015
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala
 ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
 val argsForSparkSubmit = Seq(
   "--class", 
BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
   "--name", "SPARK-2",
-  "--master", "local-cluster[2,1,1024]",
-  "--driver-memory", "4g",
+  "--master", "local-cluster[1,1,7168]",
--- End diff --

I think we support this for debugging purpose since, IIRC, that's going to 
make separate processes for workers.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-07-31 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r206638221
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala
 ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
 val argsForSparkSubmit = Seq(
   "--class", 
BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
   "--name", "SPARK-2",
-  "--master", "local-cluster[2,1,1024]",
-  "--driver-memory", "4g",
+  "--master", "local-cluster[1,1,7168]",
+  "--driver-memory", "7g",
--- End diff --

Hm, just wondering if it's going to be problematic that the test now spawns 
a job that needs more than 7G of RAM? maybe I misunderstand.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-07-31 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r206638094
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala
 ---
@@ -55,22 +55,30 @@ object BufferHolderSparkSubmitSuite {
 
 val ARRAY_MAX = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
 
-val holder = new BufferHolder(new UnsafeRow(1000))
+val unsafeRow = new UnsafeRow(1000)
+val holder = new BufferHolder(unsafeRow)
 
 holder.reset()
-holder.grow(roundToWord(ARRAY_MAX / 2))
 
-holder.reset()
-holder.grow(roundToWord(ARRAY_MAX / 2 + 8))
+// while to reuse a buffer may happen, this test checks whether the 
buffer can be grown
+holder.grow(ARRAY_MAX / 2)
+assert(unsafeRow.getSizeInBytes % 8 == 0)
 
-holder.reset()
-holder.grow(roundToWord(Integer.MAX_VALUE / 2))
+holder.grow(ARRAY_MAX / 2 + 7)
+assert(unsafeRow.getSizeInBytes % 8 == 0)
 
-holder.reset()
-holder.grow(roundToWord(Integer.MAX_VALUE))
-  }
+holder.grow(Integer.MAX_VALUE / 2)
+assert(unsafeRow.getSizeInBytes % 8 == 0)
+
+holder.grow(ARRAY_MAX - holder.totalSize())
+assert(unsafeRow.getSizeInBytes % 8 == 0)
 
-  private def roundToWord(len: Int): Int = {
-ByteArrayMethods.roundNumberOfBytesToNearestWord(len)
+try {
+  holder.grow(ARRAY_MAX + 1 - holder.totalSize())
+  assert(false)
+} catch {
+case _: UnsupportedOperationException => assert(true)
--- End diff --

Fix the indents here. `assert(true)` is a no-op, so just omit it. 
`assert(false)` is less useful than `fail(...message...)`, above. Let an 
unexpected `Throwable` just fly out of the method to fail it rather than 
swallow it. But do you really just want to use `intercept` here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-06-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r195111027
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -61,12 +61,15 @@
* Grows the buffer by at least neededSize and points the row to the 
buffer.
*/
   void grow(int neededSize) {
-if (neededSize > ARRAY_MAX - totalSize()) {
+assert neededSize < 0 :
+  "Cannot grow BufferHolder by size " + neededSize + " because the 
size is negative";
+int roundedSize = 
ByteArrayMethods.roundNumberOfBytesToNearestWord(neededSize);
--- End diff --

and we should add a comment to the `buffer` field to say that, it's 
guaranteed to be word-aligned, and explain who may assume it's word-aligned.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-06-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r195110654
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -61,12 +61,15 @@
* Grows the buffer by at least neededSize and points the row to the 
buffer.
*/
   void grow(int neededSize) {
-if (neededSize > ARRAY_MAX - totalSize()) {
+assert neededSize < 0 :
+  "Cannot grow BufferHolder by size " + neededSize + " because the 
size is negative";
+int roundedSize = 
ByteArrayMethods.roundNumberOfBytesToNearestWord(neededSize);
+if (roundedSize > ARRAY_MAX - totalSize()) {
   throw new UnsupportedOperationException(
-"Cannot grow BufferHolder by size " + neededSize + " because the 
size after growing " +
+"Cannot grow BufferHolder by size " + roundedSize + " because the 
size after growing " +
   "exceeds size limitation " + ARRAY_MAX);
 }
-final int length = totalSize() + neededSize;
+final int length = totalSize() + roundedSize;
--- End diff --

we only need to make `length * 2` word-aligned.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-06-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r195108390
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -61,12 +61,15 @@
* Grows the buffer by at least neededSize and points the row to the 
buffer.
*/
   void grow(int neededSize) {
-if (neededSize > ARRAY_MAX - totalSize()) {
+assert neededSize < 0 :
+  "Cannot grow BufferHolder by size " + neededSize + " because the 
size is negative";
+int roundedSize = 
ByteArrayMethods.roundNumberOfBytesToNearestWord(neededSize);
--- End diff --

shall we do the same thing in L55?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-06-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r195107427
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -61,12 +61,15 @@
* Grows the buffer by at least neededSize and points the row to the 
buffer.
*/
   void grow(int neededSize) {
-if (neededSize > ARRAY_MAX - totalSize()) {
+assert neededSize < 0 :
+  "Cannot grow BufferHolder by size " + neededSize + " because the 
size is negative";
+int roundedSize = 
ByteArrayMethods.roundNumberOfBytesToNearestWord(neededSize);
--- End diff --

now I recall why we don't need this. We always grow to double of the 
previous size, or the max size. So as long as the initial size is word aligned, 
and the max size is word aligned, we are fine.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-06-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r194526656
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -61,6 +61,10 @@
* Grows the buffer by at least neededSize and points the row to the 
buffer.
*/
   void grow(int neededSize) {
+if (neededSize < 0) {
--- End diff --

I think we can use `assert` here. When `neededSize` is negative, there must 
be an overflow.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-04-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r182025819
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala
 ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
 val argsForSparkSubmit = Seq(
   "--class", 
BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
   "--name", "SPARK-2",
-  "--master", "local-cluster[2,1,1024]",
-  "--driver-memory", "4g",
+  "--master", "local-cluster[1,1,7168]",
--- End diff --

ping @hvanhovell 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-04-10 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r180528504
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala
 ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
 val argsForSparkSubmit = Seq(
   "--class", 
BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
   "--name", "SPARK-2",
-  "--master", "local-cluster[2,1,1024]",
-  "--driver-memory", "4g",
+  "--master", "local-cluster[1,1,7168]",
--- End diff --

Good question. Several tests still seem to use `local-cluster`. Is it 
better to use `local` while it may require more memory?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-04-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r180507408
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala
 ---
@@ -39,8 +39,8 @@ class BufferHolderSparkSubmitSuite
 val argsForSparkSubmit = Seq(
   "--class", 
BufferHolderSparkSubmitSuite.getClass.getName.stripSuffix("$"),
   "--name", "SPARK-2",
-  "--master", "local-cluster[2,1,1024]",
-  "--driver-memory", "4g",
+  "--master", "local-cluster[1,1,7168]",
--- End diff --

Do we still support this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-02-25 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r170482768
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala
 ---
@@ -58,15 +58,20 @@ object BufferHolderSparkSubmitSuite {
 val holder = new BufferHolder(new UnsafeRow(1000))
 
 holder.reset()
+// execute here since reset() updates holder.cursor
+val smallBuffer = new Array[Byte](holder.cursor)
--- End diff --

ping @liufengdb and @gatorsmile


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-02-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20636#discussion_r169151430
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSparkSubmitSuite.scala
 ---
@@ -58,15 +58,20 @@ object BufferHolderSparkSubmitSuite {
 val holder = new BufferHolder(new UnsafeRow(1000))
 
 holder.reset()
+// execute here since reset() updates holder.cursor
+val smallBuffer = new Array[Byte](holder.cursor)
--- End diff --

Ping, @liufengdb and @gatorsmile .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferH...

2018-02-19 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/20636

[SPARK-23415][SQL][TEST] Make behavior of BufferHolderSparkSubmitSuite 
correct and stable

## What changes were proposed in this pull request?

This PR addresses two issues in `BufferHolderSparkSubmitSuite`.

1. While `BufferHolderSparkSubmitSuite` tried to allocate a large object 
several times, it actually allocated an object once and reused the object.
2. `BufferHolderSparkSubmitSuite` may fail due to timeout 

To assign a small object before allocating a large object each time solved 
issue 1 by avoiding reuse.
To increasing heap size from 4g to 7g **probably** solved issue 2. It can 
also avoid OOM after fixing issue 1.

## How was this patch tested?

Updated existing `BufferHolderSparkSubmitSuite`


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-23415

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20636.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20636


commit 39a715c52ac4055b3a54e221d692039b8ac20e97
Author: Kazuaki Ishizaki 
Date:   2018-02-19T08:55:11Z

initial commit




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org