[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20682
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87879/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87880/
Test FAILed.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87877 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87877/testReport)**
 for PR 20647 at commit 
[`1164eec`](https://github.com/apache/spark/commit/1164eecb55d2120bc71c059ea6944ea38d13d300).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18784
  
**[Test build #87876 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87876/testReport)**
 for PR 18784 at commit 
[`2c1574e`](https://github.com/apache/spark/commit/2c1574ea5ab55f1b0af1764aaf40ba9de5824d0b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87880 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87880/testReport)**
 for PR 19222 at commit 
[`cf2d532`](https://github.com/apache/spark/commit/cf2d532ae9c8688ef314a51a89c76abe2fd5d857).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20700
  
**[Test build #87881 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87881/testReport)**
 for PR 20700 at commit 
[`9fed753`](https://github.com/apache/spark/commit/9fed75338b9e3dc6e834407c177decc2f38c8743).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20682
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18784
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87876/
Test FAILed.


---




[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18784
  
Build finished. Test FAILed.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20682
  
**[Test build #87879 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87879/testReport)**
 for PR 20682 at commit 
[`4dac3a7`](https://github.com/apache/spark/commit/4dac3a7e2fe3fd058b4c771039da02e2565108a1).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87881/
Test FAILed.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87877/
Test FAILed.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size

2018-03-02 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20712
  
@cloud-fan @gatorsmile if I am not wrong, this is one of the places that is 
used not only by the driver but also by the executors, so the introduced 
parameter won't be evaluated properly, am I right?


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20700
  
Aha! Thanks @kiszk san for working on this! I really wanted the stateless 
methods to be extracted so that I can use more utils without having to pass 
around a `CodegenContext` for no good reason.

I was actually working on the exact same thing last week but got carried 
away on other tasks so my patch is still WIP.
For those curious, here's my WIP patch: 
https://github.com/rednaxelafx/apache-spark/commit/249fb93c5cacc35d53e6b0f972b139d27ef5d720

I was hoping we can cover the ones I extracted here: 
https://github.com/rednaxelafx/apache-spark/commit/249fb93c5cacc35d53e6b0f972b139d27ef5d720#diff-8bcc5aea39c73d4bf38aef6f6951d42cR1201
Your PR and my version extracted mostly the same fields/methods, which is 
great for me: I'll just help get this PR in instead of having to send mine 
out.

The approach is different, though. I'd like to have a discussion on the 
tradeoffs you had in mind when you picked your approach.
I initially did something similar, which is to extract stateless methods 
into companion object methods and change the original ones to delegate to the 
companion versions. But the old instance methods were a bad smell to me (passing in 
an instance when it shouldn't need the instance), so since we're at it, I 
thought why not just completely remove that bad smell once and for all.
So I deleted the original instance methods, and yes, that made the diff huge 
because all use sites of those fields/methods had to be touched to switch to 
using the new version.

@kiszk WDYT? Were you intending to minimize the diff while still being able 
to access the stateless util methods as standalone functions?
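
The two refactoring approaches being compared can be sketched in miniature. This is a hedged sketch with hypothetical names (`CodegenUtil` and `boxedType` stand in for the real codegen utilities, not Spark's actual API): the minimal-diff variant keeps the old instance method as a thin delegate to the newly extracted static method, while the larger-diff variant deletes the delegate and rewrites every call site.

```java
// Minimal-diff extraction sketch (hypothetical names, not Spark's real API):
// the stateless helper moves to a static method; the old instance method
// survives as a thin delegate, so existing call sites keep compiling.
public class CodegenUtil {
    // New static home for the stateless helper.
    public static String boxedType(String javaType) {
        return "int".equals(javaType) ? "Integer" : javaType;
    }

    // Kept only to avoid touching call sites. The alternative approach
    // deletes this delegate and switches every use site to the static form,
    // which removes the smell but makes the diff much larger.
    public String boxedTypeOf(String javaType) {
        return boxedType(javaType);
    }
}
```

Either way the behavior is identical; the tradeoff is purely diff size versus how cleanly the statelessness is expressed.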


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171782719
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -17,47 +17,168 @@
 
 package org.apache.spark.unsafe.memory;
 
-import javax.annotation.Nullable;
-
 import org.apache.spark.unsafe.Platform;
 
+import javax.annotation.Nullable;
+
 /**
- * A consecutive block of memory, starting at a {@link MemoryLocation} 
with a fixed size.
+ * A declaration of interfaces of MemoryBlock classes .
  */
-public class MemoryBlock extends MemoryLocation {
+public abstract class MemoryBlock {
+  @Nullable
+  protected final Object obj;
 
-  private final long length;
+  protected final long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = -1;
+  private int pageNumber = -1;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
--- End diff --

Sounds reasonable. Do we still need `MemoryLocation` to expose a 
logical memory location?


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171783898
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and 
size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public static final void copyMemory(
+  MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, 
long length) {
+Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + 
srcOffset,
+  dst.getBaseObject(), dst.getBaseOffset() + dstOffset, length);
+  }
+
+  public static final void copyMemory(MemoryBlock src, MemoryBlock dst, 
long length) {
+Platform.copyMemory(src.getBaseObject(), src.getBaseOffset(),
+  dst.getBaseObject(), dst.getBaseOffset(), length);
+  }
+
+  public final void copyFrom(Object src, long

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171783854
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and 
size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public static final void copyMemory(
+  MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, 
long length) {
+Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + 
srcOffset,
+  dst.getBaseObject(), dst.getBaseOffset() + dstOffset, length);
+  }
+
+  public static final void copyMemory(MemoryBlock src, MemoryBlock dst, 
long length) {
+Platform.copyMemory(src.getBaseObject(), src.getBaseOffset(),
+  dst.getBaseObject(), dst.getBaseOffset(), length);
--- End diff --

Check if `length <= src.size()` and 
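
A minimal sketch of that bounds check, using an illustrative `MiniBlock` stand-in rather than the PR's actual `MemoryBlock`/`Platform` classes: the copy length is validated against both blocks before the raw copy runs.

```java
// Illustrative stand-in for a memory block, just enough to show the check
// being requested: the copy length must fit within both blocks.
class MiniBlock {
    final byte[] data;

    MiniBlock(int size) { this.data = new byte[size]; }

    long size() { return data.length; }

    static void copyMemory(MiniBlock src, MiniBlock dst, long length) {
        // The suggested guard: reject copies that overrun either block.
        if (length > src.size() || length > dst.size()) {
            throw new IllegalArgumentException("copy length exceeds block size");
        }
        System.arraycopy(src.data, 0, dst.data, 0, (int) length);
    }
}
```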

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171783186
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and 
size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
--- End diff --

Should we define the behavior when `offset` is invalid here? E.g., out of 
range?


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171784866
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java 
---
@@ -67,6 +84,14 @@ public static boolean isSet(Object baseObject, long 
baseOffset, int index) {
 return (word & mask) != 0;
   }
 
+  public static boolean isSetBlock(MemoryBlock base, long baseOffset, int 
index) {
--- End diff --

ditto.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171784381
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java 
---
@@ -56,6 +65,14 @@ public static void unset(Object baseObject, long 
baseOffset, int index) {
 Platform.putLong(baseObject, wordOffset, word & ~mask);
   }
 
+  public static void unsetBlock(MemoryBlock base, long baseOffset, int 
index) {
--- End diff --

ditto.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171785381
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, 
int lengthInBytes) {
 return hashUnsafeWords(base, offset, lengthInBytes, seed);
   }
 
+  public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, 
int lengthInBytes, int seed) {
+return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, 
seed);
+  }
+
   public static int hashUnsafeWords(Object base, long offset, int 
lengthInBytes, int seed) {
 // This is based on Guava's 
`Murmur32_Hasher.processRemaining(ByteBuffer)` method.
 assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 
8 (word-aligned)";
 int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+long offset = base.getBaseOffset();
--- End diff --

It would be better to leave a comment explaining why we have this.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171787333
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -36,22 +42,35 @@ public MemoryBlock allocate(long size) throws 
OutOfMemoryError {
 
   @Override
   public void free(MemoryBlock memory) {
-assert (memory.obj == null) :
+if (memory == OffHeapMemoryBlock.NULL) return;
+assert(memory instanceof OffHeapMemoryBlock);
--- End diff --

I think this assert should be in front of `if (memory == 
OffHeapMemoryBlock.NULL) return;`?
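
The two orderings can be contrasted with a small stand-in sketch (the types here are simplified placeholders, not the PR's classes). Checking the argument's kind before the sentinel early-return means a wrongly typed argument is always caught, even when it happens to be the sentinel:

```java
// Simplified sketch of free() with the suggested ordering: validate the
// argument's type first, then take the early return for the NULL sentinel.
class OffHeapSketch {
    static final OffHeapSketch NULL = new OffHeapSketch();

    static String free(Object memory) {
        if (!(memory instanceof OffHeapSketch)) {
            return "rejected";           // wrong kind of block: always caught
        }
        if (memory == NULL) {
            return "noop";               // sentinel: nothing to release
        }
        return "freed";                  // real code would release memory here
    }
}
```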


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171785802
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, 
int lengthInBytes) {
 return hashUnsafeWords(base, offset, lengthInBytes, seed);
   }
 
+  public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, 
int lengthInBytes, int seed) {
+return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, 
seed);
+  }
+
   public static int hashUnsafeWords(Object base, long offset, int 
lengthInBytes, int seed) {
 // This is based on Guava's 
`Murmur32_Hasher.processRemaining(ByteBuffer)` method.
 assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 
8 (word-aligned)";
 int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+long offset = base.getBaseOffset();
--- End diff --

Can't we just call `hashUnsafeBytes` with the block's baseObject, 
baseOffset and size?
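
The delegation being suggested looks roughly like this sketch (the hash body is a trivial stand-in, not real Murmur3 mixing; only the argument forwarding matters here):

```java
// Sketch of the suggested delegation (stand-in hash, not real Murmur3):
// the block-taking overload just forwards the block's base object, base
// offset and size to the existing raw-arguments version.
class HashSketch {
    static int hashBytes(Object base, long offset, int lengthInBytes, int seed) {
        // Trivial stand-in mixing; the real method does Murmur3 mixing.
        return seed * 31 + lengthInBytes + (int) offset;
    }

    static int hashBytesBlock(byte[] block, int seed) {
        // Delegation instead of duplicating the offset/length bookkeeping.
        return hashBytes(block, 0L, block.length, seed);
    }
}
```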


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171784878
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java 
---
@@ -80,6 +105,16 @@ public static boolean anySet(Object baseObject, long 
baseOffset, long bitSetWidt
 return false;
   }
 
+  public static boolean anySetBlock(MemoryBlock base, long baseOffset, 
long bitSetWidthInWords) {
--- End diff --

ditto.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171784254
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java 
---
@@ -52,11 +53,16 @@ public static int roundNumberOfBytesToNearestWord(int 
numBytes) {
* Optimized byte array equality check for byte arrays.
* @return true if the arrays are equal, false otherwise
*/
+  public static boolean arrayEqualsBlock(
--- End diff --

It should have its own doc instead of reusing `arrayEquals`'s.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171785720
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long 
offset, int lengthInBytes,
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) {
--- End diff --

Can't we just call `hashUnsafeBytes2` with the `baseObject`, `baseOffset` 
and `size` of `base`?


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171787117
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java
 ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+public class OffHeapMemoryBlock extends MemoryBlock {
+  static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 
0);
+
+  public OffHeapMemoryBlock(long address, long size) {
+super(null, address, size);
+  }
+
+  @Override
+  public MemoryBlock subBlock(long offset, long size) {
+return new OffHeapMemoryBlock(this.offset + offset, size);
+  }
+
+  public final int getInt(long offset) {
+return Platform.getInt(null, offset);
--- End diff --

Is this correct? Or `address + offset`?
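A small hypothetical illustration of the bug being flagged: for an off-heap block, the `offset` argument to the accessors is relative to the block, so the absolute address must be resolved as `this.offset + offset`. A `byte[]` plus a base index stands in for raw memory here; Spark's real code goes through `Platform`/`Unsafe`:

```java
// Sketch: a buggy accessor that ignores the block's base address reads
// the wrong memory; the fixed one resolves the relative offset against
// the base. All types here are stand-ins for illustration.
public class OffHeapOffsetSketch {
    static byte[] memory = new byte[64]; // pretend address space

    static class Block {
        final long address;
        final long size;
        Block(long address, long size) {
            this.address = address;
            this.size = size;
        }
        // Buggy: treats the relative offset as an absolute address.
        byte getByteBuggy(long offset) { return memory[(int) offset]; }
        // Fixed: absolute address = block base + relative offset.
        byte getByteFixed(long offset) { return memory[(int) (address + offset)]; }
    }

    public static void main(String[] args) {
        memory[10] = 7;
        Block b = new Block(10, 16);
        if (b.getByteFixed(0) != 7) throw new AssertionError("fixed read missed");
        if (b.getByteBuggy(0) == 7) throw new AssertionError("buggy read should miss");
        System.out.println("ok");
    }
}
```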


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171785582
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long 
offset, int lengthInBytes,
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) {
+// This is compatible with original and another implementations.
+// Use this method for new components after Spark 2.3.
+long offset = base.getBaseOffset();
+int lengthInBytes = (int)base.size();
+assert (lengthInBytes >= 0) : "lengthInBytes cannot be negative";
+int lengthAligned = lengthInBytes - lengthInBytes % 4;
+int h1 = hashBytesByIntBlock(base.subBlock(offset, lengthAligned), 
seed);
+int k1 = 0;
+for (int i = lengthAligned, shift = 0; i < lengthInBytes; i++, shift 
+= 8) {
+  k1 ^= (base.getByte(offset + i) & 0xFF) << shift;
+}
+h1 ^= mixK1(k1);
+return fmix(h1, lengthInBytes);
+  }
+
+  private static int hashBytesByIntBlock(MemoryBlock base, int seed) {
--- End diff --

Can't we just call `hashBytesByInt(base.getBaseObject(), 
base.getBaseOffset(), (int)base.size())`?


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171784371
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java 
---
@@ -45,6 +46,14 @@ public static void set(Object baseObject, long 
baseOffset, int index) {
 Platform.putLong(baseObject, wordOffset, word | mask);
   }
 
+  public static void setBlock(MemoryBlock base, long baseOffset, int 
index) {
--- End diff --

It would be better to add a doc comment for it; otherwise the purpose of this method is not clear.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171783726
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and 
size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public static final void copyMemory(
+  MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, 
long length) {
+Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + 
srcOffset,
+  dst.getBaseObject(), dst.getBaseOffset() + dstOffset, length);
--- End diff --

Should we check if this copy is valid?

E.g., srcOffset + length > src.size() or dstOffset + length > dst.size()?
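The bounds check being asked about could be sketched like this (a hypothetical version with plain arrays standing in for `MemoryBlock` storage, not Spark's actual `Platform.copyMemory`):

```java
// Sketch: reject copies that would read past src or write past dst
// before delegating to the raw copy. Arrays stand in for block storage.
public class CopyMemorySketch {
    static void copyMemory(byte[] src, long srcOffset, byte[] dst, long dstOffset, long length) {
        if (srcOffset < 0 || dstOffset < 0 || length < 0
                || srcOffset + length > src.length
                || dstOffset + length > dst.length) {
            throw new IllegalArgumentException("copy out of bounds");
        }
        System.arraycopy(src, (int) srcOffset, dst, (int) dstOffset, (int) length);
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4};
        byte[] dst = new byte[4];
        copyMemory(src, 1, dst, 0, 3);
        if (dst[0] != 2 || dst[2] != 4) throw new AssertionError("copy content wrong");
        boolean caught = false;
        try {
            copyMemory(src, 2, dst, 0, 3); // 2 + 3 > src.length: must fail
        } catch (IllegalArgumentException e) {
            caught = true;
        }
        if (!caught) throw new AssertionError("expected out-of-bounds rejection");
        System.out.println("ok");
    }
}
```

The performance concern raised later in the thread is why such a check might be an `assert` rather than an unconditional branch.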


---


[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171787508
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -50,12 +52,12 @@
 
   // These are only updated by readExternal() or read()
   @Nonnull
-  private Object base;
-  private long offset;
+  private MemoryBlock base;
+  // While numBytes has the same value as base,length, to keep as int 
avoids cast from long to int
--- End diff --

nit: `base,length` -> `base, length`.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171787057
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java
 ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+public class OffHeapMemoryBlock extends MemoryBlock {
+  static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 
0);
+
+  public OffHeapMemoryBlock(long address, long size) {
+super(null, address, size);
+  }
+
+  @Override
+  public MemoryBlock subBlock(long offset, long size) {
+return new OffHeapMemoryBlock(this.offset + offset, size);
+  }
+
+  public final int getInt(long offset) {
--- End diff --

`@Override`?


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171790083
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala ---
@@ -108,5 +111,29 @@ trait MLTest extends StreamTest with TempDirectory { 
self: Suite =>
   otherResultCols: _*)(globalCheckFunction)
 testTransformerOnDF(dataframe, transformer, firstResultCol,
   otherResultCols: _*)(globalCheckFunction)
+}
+
+  def testTransformerByInterceptingException[A : Encoder](
+dataframe: DataFrame,
+transformer: Transformer,
+expectedMessagePart : String,
+firstResultCol: String) {
+
+def hasExpectedMessage(exception: Throwable): Boolean =
--- End diff --

I wonder whether the check here is too strict. It requires an exact match of the message, so if some class modifies its exception message, many test cases will fail.
Or could we just check the exception type?


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171790543
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -313,13 +306,14 @@ class RFormulaSuite extends MLTest with 
DefaultReadWriteTest {
   Seq(("male", "foo", 4), ("female", "bar", 4), ("female", "bar", 5), 
("male", "baz", 5))
 .toDF("id", "a", "b")
 val model = formula.fit(original)
+val attr = NominalAttribute.defaultAttr
 val expected = Seq(
 ("male", "foo", 4, Vectors.dense(0.0, 1.0, 4.0), 1.0),
 ("female", "bar", 4, Vectors.dense(1.0, 0.0, 4.0), 0.0),
 ("female", "bar", 5, Vectors.dense(1.0, 0.0, 5.0), 0.0),
 ("male", "baz", 5, Vectors.dense(0.0, 0.0, 5.0), 1.0)
 ).toDF("id", "a", "b", "features", "label")
-// assert(result.schema.toString == resultSchema.toString)
+  .select($"id", $"a", $"b", $"features", $"label".as("label", 
attr.toMetadata()))
--- End diff --

I think
```
...
).toDF("id", "a", "b", "features", "label")
 .select($"id", ...
```
would look cleaner.


---




[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20705
  
@dongjoon-hyun, I agree with the idea in general, but just to be clear: do you intend to improve the test coverage here for both cases, when `orc` and when `parquet` is set to `spark.sql.sources.default`?

If not, how about we set `parquet` to `spark.sql.sources.default` 
explicitly where it's required for now?


---




[GitHub] spark pull request #20121: [SPARK-22882][ML][TESTS] ML test for structured s...

2018-03-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20121#discussion_r171796773
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala
 ---
@@ -75,11 +71,9 @@ class MultilayerPerceptronClassifierSuite
   .setMaxIter(100)
   .setSolver("l-bfgs")
 val model = trainer.fit(dataset)
-val result = model.transform(dataset)
 MLTestingUtils.checkCopyAndUids(trainer, model)
-val predictionAndLabels = result.select("prediction", 
"label").collect()
-predictionAndLabels.foreach { case Row(p: Double, l: Double) =>
-  assert(p == l)
+testTransformer[(Vector, Double)](dataset.toDF(), model, "prediction", 
"label") {
--- End diff --

Is the type of `dataset` `Dataset[_]`?


---




[GitHub] spark pull request #20121: [SPARK-22882][ML][TESTS] ML test for structured s...

2018-03-02 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20121#discussion_r171796232
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala
 ---
@@ -169,59 +171,28 @@ class GBTClassifierSuite extends SparkFunSuite with 
MLlibTestSparkContext
 val blas = BLAS.getInstance()
 
 val validationDataset = validationData.toDF(labelCol, featuresCol)
-val results = gbtModel.transform(validationDataset)
-// check that raw prediction is tree predictions dot tree weights
-results.select(rawPredictionCol, featuresCol).collect().foreach {
-  case Row(raw: Vector, features: Vector) =>
+testTransformer[(Double, Vector)](validationDataset, gbtModel,
+  "rawPrediction", "features", "probability", "prediction") {
+  case Row(raw: Vector, features: Vector, prob: Vector, pred: Double) 
=>
 assert(raw.size === 2)
+// check that raw prediction is tree predictions dot tree weights
 val treePredictions = 
gbtModel.trees.map(_.rootNode.predictImpl(features).prediction)
 val prediction = blas.ddot(gbtModel.numTrees, treePredictions, 1, 
gbtModel.treeWeights, 1)
 assert(raw ~== Vectors.dense(-prediction, prediction) relTol eps)
-}
 
-// Compare rawPrediction with probability
-results.select(rawPredictionCol, probabilityCol).collect().foreach {
-  case Row(raw: Vector, prob: Vector) =>
-assert(raw.size === 2)
+// Compare rawPrediction with probability
 assert(prob.size === 2)
 // Note: we should check other loss types for classification if 
they are added
 val predFromRaw = raw.toDense.values.map(value => 
LogLoss.computeProbability(value))
 assert(prob(0) ~== predFromRaw(0) relTol eps)
 assert(prob(1) ~== predFromRaw(1) relTol eps)
 assert(prob(0) + prob(1) ~== 1.0 absTol absEps)
-}
 
-// Compare prediction with probability
-results.select(predictionCol, probabilityCol).collect().foreach {
-  case Row(pred: Double, prob: Vector) =>
+// Compare prediction with probability
 val predFromProb = prob.toArray.zipWithIndex.maxBy(_._1)._2
 assert(pred == predFromProb)
 }
 
-// force it to use raw2prediction
-gbtModel.setRawPredictionCol(rawPredictionCol).setProbabilityCol("")
-val resultsUsingRaw2Predict =
-  
gbtModel.transform(validationDataset).select(predictionCol).as[Double].collect()
-
resultsUsingRaw2Predict.zip(results.select(predictionCol).as[Double].collect()).foreach
 {
-  case (pred1, pred2) => assert(pred1 === pred2)
-}
-
-// force it to use probability2prediction
-gbtModel.setRawPredictionCol("").setProbabilityCol(probabilityCol)
-val resultsUsingProb2Predict =
-  
gbtModel.transform(validationDataset).select(predictionCol).as[Double].collect()
-
resultsUsingProb2Predict.zip(results.select(predictionCol).as[Double].collect()).foreach
 {
-  case (pred1, pred2) => assert(pred1 === pred2)
-}
-
-// force it to use predict
--- End diff --

These testing code paths have been covered by `ProbabilisticClassifierSuite.testPredictMethods`.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171801055
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and 
size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
--- End diff --

I see. I will add a comment, since these operations do not guarantee any behavior if `offset` is invalid.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171801720
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and 
size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public static final void copyMemory(
+  MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, 
long length) {
+Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + 
srcOffset,
+  dst.getBaseObject(), dst.getBaseOffset() + dstOffset, length);
--- End diff --

The condition depends on whether the `MemoryBlock` is an `OffHeapMemoryBlock`. Since the condition for this check is not simple, I decided against adding an assertion for performance reasons.


---


[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171801856
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
*/
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;
 
-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
+  public void resetObjAndOffset() {
+this.obj = null;
+this.offset = 0;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long 
offset, long length) {
+MemoryBlock mb = null;
+if (obj instanceof byte[]) {
+  byte[] array = (byte[])obj;
+  mb = new ByteArrayMemoryBlock(array, offset, length);
+} else if (obj instanceof long[]) {
+  long[] array = (long[])obj;
+  mb = new OnHeapMemoryBlock(array, offset, length);
+} else if (obj == null) {
+  // we assume that to pass null pointer means off-heap
+  mb = new OffHeapMemoryBlock(offset, length);
+} else {
+  throw new UnsupportedOperationException(obj.getClass() + " is not 
supported now");
+}
+return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and 
size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public static final void copyMemory(
+  MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, 
long length) {
+Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + 
srcOffset,
+  dst.getBaseObject(), dst.getBaseOffset() + dstOffset, length);
+  }
+
+  public static final void copyMemory(MemoryBlock src, MemoryBlock dst, 
long length) {
+Platform.copyMemory(src.getBaseObject(), src.getBaseOffset(),
+  dst.getBaseObject(), dst.getBaseOffset(), length);
--- End diff --

This assertion can be reasonably simp

[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20682
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20682: [SPARK-23522][Python] always use sys.exit over bu...

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20682#discussion_r171803941
  
--- Diff: python/pyspark/ml/evaluation.py ---
@@ -15,6 +15,8 @@
 # limitations under the License.
 #
 
+import sys
+
--- End diff --

Let's remove the newline here. I think `abc` is a builtin one too.


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20682
  
**[Test build #87882 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87882/testReport)**
 for PR 20682 at commit 
[`4dac3a7`](https://github.com/apache/spark/commit/4dac3a7e2fe3fd058b4c771039da02e2565108a1).


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171805189
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java 
---
@@ -45,6 +46,14 @@ public static void set(Object baseObject, long 
baseOffset, int index) {
 Platform.putLong(baseObject, wordOffset, word | mask);
   }
 
+  public static void setBlock(MemoryBlock base, long baseOffset, int 
index) {
--- End diff --

Let me drop this since it is not used for now; it was provided for future use.


---

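The `setBlock` helper being dropped above follows the standard bitset layout used by `BitSetMethods`: locate the 64-bit word containing the bit, then OR in a mask. A minimal Python sketch of that word/mask arithmetic (hypothetical helper names, not Spark code):

```python
def bitset_set(words: list, index: int) -> None:
    # set bit `index`: word index is index // 64, bit position is index % 64
    words[index >> 6] |= 1 << (index & 0x3f)

def bitset_is_set(words: list, index: int) -> bool:
    # test bit `index` using the same word/mask decomposition
    return (words[index >> 6] & (1 << (index & 0x3f))) != 0
```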



[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20682
  
lgtm otherwise


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-03-02 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20343
  
We need to review #20433 first


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-02 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20433
  
@gatorsmile ping


---




[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

2018-03-02 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20692
  
thanks @gatorsmile and @rdblue for your discussion and thanks @gatorsmile 
for trusting me. I hope I will be worthy of your trust.

I agree that since we are not close to a new release, having a thorough 
approach is better.

Here is my analysis; let me know if anything else is needed. Basically, no 
other DB shows data types for columns. I won't include the output for the 
various DBs in this description, since DB2's is particularly verbose.
Anyway, I confirm that DB2's output is what is shown in the article posted 
by @gatorsmile.
Postgres's examples can be found here: 
https://www.postgresql.org/docs/10/static/using-explain.html.
Oracle's ones here: 
https://docs.oracle.com/cd/B10501_01/server.920/a96540/statements_911a.htm#2061798.
MySQL is not very different either; here is a JSON representation of an 
example:
```
{
  "query_block": {
"select_id": 1,
"nested_loop": [
  {
"table": {
  "table_name": "u",
  "access_type": "ALL",
  "possible_keys": [
"PRIMARY"
  ],
  "rows": 4,
  "filtered": 100
}
  },
  {
"table": {
  "table_name": "v",
  "access_type": "ref",
  "possible_keys": [
"PRIMARY"
  ],
  "key": "PRIMARY",
  "used_key_parts": [
"username"
  ],
  "key_length": "258",
  "ref": [
"DBNAME.u.username"
  ],
  "rows": 1,
  "filtered": 100,
  "using_index": true
}
  }
]
  }
}
```
Here is SQLServer output (quite similar to the others too):
```
StmtText
   StmtId   
   NodeId  Parent  PhysicalOp LogicalOp 
 Argument   
   DefinedValues
 EstimateRows   EstimateIO EstimateCPUAvgRowSize  
TotalSubtreeCost OutputList 
 Warnings Type  
   Parallel EstimateExecutions

--
 --- --- --- -- 
-- 
-
 - -- 
-- -- ---  
---
   
 --
select * from t1 join t2 on t1.a=t2.a;  

 1   1   0 NULL   NULL  
 1  
   NULL 
1.0   NULL   NULLNULL 
2.4343444E-2 NULL   
 NULL SELECT
  0   NULL
  |--Hash Match(Inner Join, 
HASH:([master].[dbo].[t2].[a])=([master].[dbo].[t1].[a]), 
RESIDUAL:([master].[dbo].[t2].[a]=[master].[dbo].[t1].[a]))1
   2   1 Hash Match Inner Join 
HASH:([master].[dbo].[t2].[a])=([master].[dbo].[t1].[a]), 
RESIDUAL:([master].[dbo].[t2].[a]=[master].[dbo].[t1].[a])  NULL
 1.00.0   1.7774245E-2  
23 2.4343444E-2 [master].[dbo].[t1].[a], [master].[dbo].[t1].[b], 
[master].[dbo].[t2].[a], [master].[dbo].[t2].[b]  NULL PLAN_ROW 
   01.0
   |--Table Scan(OBJECT:([master].[dbo].[t2]))  

 1   3   2 Table Scan Table Scan
 OBJEC

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171815998
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java
 ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+public class OffHeapMemoryBlock extends MemoryBlock {
+  static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 
0);
+
+  public OffHeapMemoryBlock(long address, long size) {
+super(null, address, size);
+  }
+
+  @Override
+  public MemoryBlock subBlock(long offset, long size) {
+return new OffHeapMemoryBlock(this.offset + offset, size);
+  }
+
+  public final int getInt(long offset) {
+return Platform.getInt(null, offset);
--- End diff --

This is correct for `OffHeapMemoryBlock`.


---

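For context on why `Platform.getInt(null, offset)` is correct for `OffHeapMemoryBlock`: with a `null` base object, the second argument is interpreted as an absolute machine address rather than an offset relative to some on-heap object. A hedged Python analogue of that absolute-address read, using `ctypes` (illustrative only, not Spark code):

```python
import ctypes

# a small off-heap-like buffer: four contiguous 32-bit ints
buf = (ctypes.c_int * 4)(10, 20, 30, 40)
base_addr = ctypes.addressof(buf)  # plays the role of OffHeapMemoryBlock's address

def get_int(absolute_addr: int) -> int:
    # like Platform.getInt(null, addr): the argument is an absolute address
    return ctypes.c_int.from_address(absolute_addr).value
```

Reading `base_addr + 2 * sizeof(c_int)` yields the third element, exactly as pointer arithmetic on an absolute base address would.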



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171820330
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, 
int lengthInBytes) {
 return hashUnsafeWords(base, offset, lengthInBytes, seed);
   }
 
+  public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, 
int lengthInBytes, int seed) {
+return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, 
seed);
+  }
+
   public static int hashUnsafeWords(Object base, long offset, int 
lengthInBytes, int seed) {
 // This is based on Guava's 
`Murmur32_Hasher.processRemaining(ByteBuffer)` method.
 assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 
8 (word-aligned)";
 int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+long offset = base.getBaseOffset();
--- End diff --

While not calling `hashUnsafeBytes` could achieve better performance, I will 
keep calling `hashUnsafeBytes` until we remove it.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171821064
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long 
offset, int lengthInBytes,
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) {
+// This is compatible with original and another implementations.
+// Use this method for new components after Spark 2.3.
+long offset = base.getBaseOffset();
+int lengthInBytes = (int)base.size();
+assert (lengthInBytes >= 0) : "lengthInBytes cannot be negative";
+int lengthAligned = lengthInBytes - lengthInBytes % 4;
+int h1 = hashBytesByIntBlock(base.subBlock(offset, lengthAligned), 
seed);
+int k1 = 0;
+for (int i = lengthAligned, shift = 0; i < lengthInBytes; i++, shift 
+= 8) {
+  k1 ^= (base.getByte(offset + i) & 0xFF) << shift;
+}
+h1 ^= mixK1(k1);
+return fmix(h1, lengthInBytes);
+  }
+
+  private static int hashBytesByIntBlock(MemoryBlock base, int seed) {
--- End diff --

We can do this without considering performance.


---

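The tail handling discussed in `hashUnsafeBytes2Block` — hash the 4-byte-aligned prefix word by word, then fold the trailing 0-3 bytes with 8-bit shifts before the final avalanche — can be sketched as a plain Murmur3 x86_32 in Python. This is an illustrative little-endian reimplementation of the algorithm's structure, not Spark's `Platform`-based code:

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Illustrative little-endian Murmur3 x86_32, mirroring the structure of
    hashUnsafeBytes2Block: aligned body first, then the byte-shift tail."""
    M = 0xffffffff
    c1, c2 = 0xcc9e2d51, 0x1b873593
    h1 = seed & M
    n = len(data)
    aligned = n - n % 4  # lengthAligned in the diff

    def mix_k1(k1: int) -> int:
        k1 = (k1 * c1) & M
        k1 = ((k1 << 15) | (k1 >> 17)) & M  # rotl32(k1, 15)
        return (k1 * c2) & M

    # body: 4-byte words (the hashBytesByIntBlock part)
    for i in range(0, aligned, 4):
        h1 ^= mix_k1(int.from_bytes(data[i:i + 4], "little"))
        h1 = ((h1 << 13) | (h1 >> 19)) & M  # rotl32(h1, 13)
        h1 = (h1 * 5 + 0xe6546b64) & M

    # tail: fold the remaining 0-3 bytes, as in the loop from the diff
    k1 = 0
    for shift, i in zip(range(0, 24, 8), range(aligned, n)):
        k1 ^= (data[i] & 0xff) << shift
    h1 ^= mix_k1(k1)

    # fmix avalanche
    h1 ^= n
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85ebca6b) & M
    h1 ^= h1 >> 13
    h1 = (h1 * 0xc2b2ae35) & M
    h1 ^= h1 >> 16
    return h1
```

Splitting the input this way lets the aligned body read whole words while keeping the result independent of how the input length relates to the word size.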



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171820928
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, 
int lengthInBytes) {
 return hashUnsafeWords(base, offset, lengthInBytes, seed);
   }
 
+  public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, 
int lengthInBytes, int seed) {
+return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, 
seed);
+  }
+
   public static int hashUnsafeWords(Object base, long offset, int 
lengthInBytes, int seed) {
 // This is based on Guava's 
`Murmur32_Hasher.processRemaining(ByteBuffer)` method.
 assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 
8 (word-aligned)";
 int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+long offset = base.getBaseOffset();
--- End diff --

Do we need to leave a comment on every method with `...ByteBlock(`?


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171821706
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long 
offset, int lengthInBytes,
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) {
--- End diff --

ditto


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171821972
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long 
offset, int lengthInBytes,
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) {
+// This is compatible with original and another implementations.
+// Use this method for new components after Spark 2.3.
+long offset = base.getBaseOffset();
+int lengthInBytes = (int)base.size();
+assert (lengthInBytes >= 0) : "lengthInBytes cannot be negative";
+int lengthAligned = lengthInBytes - lengthInBytes % 4;
+int h1 = hashBytesByIntBlock(base.subBlock(offset, lengthAligned), 
seed);
+int k1 = 0;
+for (int i = lengthAligned, shift = 0; i < lengthInBytes; i++, shift 
+= 8) {
+  k1 ^= (base.getByte(offset + i) & 0xFF) << shift;
+}
+h1 ^= mixK1(k1);
+return fmix(h1, lengthInBytes);
+  }
+
+  private static int hashBytesByIntBlock(MemoryBlock base, int seed) {
--- End diff --

ditto


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171822360
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, 
int lengthInBytes) {
 return hashUnsafeWords(base, offset, lengthInBytes, seed);
   }
 
+  public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, 
int lengthInBytes, int seed) {
+return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, 
seed);
+  }
+
   public static int hashUnsafeWords(Object base, long offset, int 
lengthInBytes, int seed) {
 // This is based on Guava's 
`Murmur32_Hasher.processRemaining(ByteBuffer)` method.
 assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 
8 (word-aligned)";
 int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+long offset = base.getBaseOffset();
--- End diff --

I am considering changing `hashUnsafeBytes()` to call `hashUnsafeBytesBlock()`. 
That would also reduce the duplication. Does it make sense?


---

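The refactoring considered above — having the old offset/length entry point simply wrap its arguments and delegate to the block variant — looks roughly like this sketch. Names are hypothetical, and `crc32` stands in for the actual Murmur3 mixing:

```python
import zlib
from dataclasses import dataclass

@dataclass
class Block:
    # stand-in for MemoryBlock: base object plus offset and length
    base: bytes
    offset: int
    length: int

def hash_unsafe_bytes_block(block: Block, seed: int) -> int:
    data = block.base[block.offset:block.offset + block.length]
    return zlib.crc32(data, seed)  # crc32 stands in for the Murmur3 mixing

def hash_unsafe_bytes(base: bytes, offset: int, length: int, seed: int) -> int:
    # the old entry point wraps its arguments and delegates, so the
    # hashing logic lives in exactly one place
    return hash_unsafe_bytes_block(Block(base, offset, length), seed)
```

With this shape, removing the old entry point later is a one-line deletion rather than a deduplication exercise.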



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171822744
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -36,22 +42,35 @@ public MemoryBlock allocate(long size) throws 
OutOfMemoryError {
 
   @Override
   public void free(MemoryBlock memory) {
-assert (memory.obj == null) :
+if (memory == OffHeapMemoryBlock.NULL) return;
+assert(memory instanceof OffHeapMemoryBlock);
--- End diff --

good catch


---




[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20618
  
retest this please


---




[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20121
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1225/
Test PASSed.


---




[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20121
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...

2018-03-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/20121
  
@smurakozi Addressed your comments. Thanks!


---




[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20618
  
**[Test build #87883 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87883/testReport)**
 for PR 20618 at commit 
[`10afda5`](https://github.com/apache/spark/commit/10afda5d97146af7e02cc3b18968f89f074e093c).


---




[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20121
  
**[Test build #87884 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87884/testReport)**
 for PR 20121 at commit 
[`6d59c5b`](https://github.com/apache/spark/commit/6d59c5b017594e6b6d0b7c937207414fc942e448).


---




[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20618#discussion_r171829796
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -173,16 +172,26 @@ def _():
 
 _functions_2_1 = {
 # unary math functions
-'degrees': 'Converts an angle measured in radians to an approximately 
equivalent angle ' +
-   'measured in degrees.',
-'radians': 'Converts an angle measured in degrees to an approximately 
equivalent angle ' +
-   'measured in radians.',
+'degrees': """Converts an angle measured in radians to an 
approximately equivalent angle
+   measured in degrees.
--- End diff --

Let's add a newline here. It seems the doc could be broken otherwise.


---




[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20618#discussion_r171829946
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -173,16 +172,26 @@ def _():
 
 _functions_2_1 = {
 # unary math functions
-'degrees': 'Converts an angle measured in radians to an approximately 
equivalent angle ' +
-   'measured in degrees.',
-'radians': 'Converts an angle measured in degrees to an approximately 
equivalent angle ' +
-   'measured in radians.',
+'degrees': """Converts an angle measured in radians to an 
approximately equivalent angle
+   measured in degrees.
+   :param col: angle in radians
+   :return: angle in degrees, as if computed by 
`java.lang.Math.toDegrees()`""",
+'radians': """Converts an angle measured in degrees to an 
approximately equivalent angle measured in radians.
--- End diff --

Seems this hits the line length limit.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171834517
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -50,12 +52,12 @@
 
   // These are only updated by readExternal() or read()
   @Nonnull
-  private Object base;
-  private long offset;
+  private MemoryBlock base;
+  // While numBytes has the same value as base.length, keeping it as an int 
avoids a cast from long to int
--- End diff --

thanks


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171834629
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java 
---
@@ -52,11 +53,16 @@ public static int roundNumberOfBytesToNearestWord(int 
numBytes) {
* Optimized byte array equality check for byte arrays.
* @return true if the arrays are equal, false otherwise
*/
+  public static boolean arrayEqualsBlock(
--- End diff --

yeah


---




[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20618#discussion_r171834617
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1500,31 +1534,35 @@ object functions {
   }
 
   /**
-   * Computes the cosine of the given value. Units in radians.
+   * @param e angle in radians
+   * @return cosine of the angle, as if computed by [[java.lang.Math#cos]]
--- End diff --

Seems this link is broken in Scaladoc. Let's just use `` 
`java.lang.Math.cos` `` in this file.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171834562
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java
 ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+public class OffHeapMemoryBlock extends MemoryBlock {
+  static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 
0);
+
+  public OffHeapMemoryBlock(long address, long size) {
+super(null, address, size);
+  }
+
+  @Override
+  public MemoryBlock subBlock(long offset, long size) {
+return new OffHeapMemoryBlock(this.offset + offset, size);
+  }
+
+  public final int getInt(long offset) {
--- End diff --

good catch


---




[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20618#discussion_r171836519
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1313,131 +1313,165 @@ object functions {
   
//
 
   /**
-   * Computes the cosine inverse of the given value; the returned angle is 
in the range
-   * 0.0 through pi.
+   * @return inverse cosine of `e` in radians, as if computed by 
[[java.lang.Math#acos]]
*
* @group math_funcs
* @since 1.4.0
*/
   def acos(e: Column): Column = withExpr { Acos(e.expr) }
 
   /**
-   * Computes the cosine inverse of the given column; the returned angle 
is in the range
-   * 0.0 through pi.
+   * @return inverse cosine of `columnName`, as if computed by 
[[java.lang.Math#acos]]
*
* @group math_funcs
* @since 1.4.0
*/
   def acos(columnName: String): Column = acos(Column(columnName))
 
   /**
-   * Computes the sine inverse of the given value; the returned angle is 
in the range
-   * -pi/2 through pi/2.
+   * @return inverse sine of `e` in radians, as if computed by 
[[java.lang.Math#asin]]
*
* @group math_funcs
* @since 1.4.0
*/
   def asin(e: Column): Column = withExpr { Asin(e.expr) }
 
   /**
-   * Computes the sine inverse of the given column; the returned angle is 
in the range
-   * -pi/2 through pi/2.
+   * @return inverse sine of `columnName`, as if computed by 
[[java.lang.Math#asin]]
*
* @group math_funcs
* @since 1.4.0
*/
   def asin(columnName: String): Column = asin(Column(columnName))
 
   /**
-   * Computes the tangent inverse of the given column; the returned angle 
is in the range
-   * -pi/2 through pi/2
+   * @return inverse tangent of `e`, as if computed by 
[[java.lang.Math#atan]]
*
* @group math_funcs
* @since 1.4.0
*/
   def atan(e: Column): Column = withExpr { Atan(e.expr) }
 
   /**
-   * Computes the tangent inverse of the given column; the returned angle 
is in the range
-   * -pi/2 through pi/2
+   * @return inverse tangent of `columnName`, as if computed by 
[[java.lang.Math#atan]]
*
* @group math_funcs
* @since 1.4.0
*/
   def atan(columnName: String): Column = atan(Column(columnName))
 
   /**
-   * Returns the angle theta from the conversion of rectangular 
coordinates (x, y) to
-   * polar coordinates (r, theta). Units in radians.
+   * @param y coordinate on y-axis
+   * @param x coordinate on x-axis
+   * @return the theta component of the point
+   * (r, theta)
+   * in polar coordinates that corresponds to the point
+   * (x, y) in Cartesian coordinates,
+   * as if computed by [[java.lang.Math#atan2]]
*
* @group math_funcs
* @since 1.4.0
*/
-  def atan2(l: Column, r: Column): Column = withExpr { Atan2(l.expr, 
r.expr) }
+  def atan2(y: Column, x: Column): Column = withExpr { Atan2(y.expr, 
x.expr) }
 
   /**
-   * Returns the angle theta from the conversion of rectangular 
coordinates (x, y) to
-   * polar coordinates (r, theta).
+   * @param y coordinate on y-axis
+   * @param xName coordinate on x-axis
+   * @return the theta component of the point
+   * (r, theta)
+   * in polar coordinates that corresponds to the point
+   * (x, y) in Cartesian coordinates,
+   * as if computed by [[java.lang.Math#atan2]]
*
* @group math_funcs
* @since 1.4.0
*/
-  def atan2(l: Column, rightName: String): Column = atan2(l, 
Column(rightName))
+  def atan2(y: Column, xName: String): Column = atan2(y, Column(xName))
 
   /**
-   * Returns the angle theta from the conversion of rectangular 
coordinates (x, y) to
-   * polar coordinates (r, theta).
+   * @param yName coordinate on y-axis
+   * @param x coordinate on x-axis
+   * @return the theta component of the point
+   * (r, theta)
+   * in polar coordinates that corresponds to the point
+   * (x, y) in Cartesian coordinates,
+   * as if computed by [[java.lang.Math#atan2]]
*
* @group math_funcs
* @since 1.4.0
*/
-  def atan2(leftName: String, r: Column): Column = atan2(Column(leftName), 
r)
+  def atan2(yName: String, x: Column): Column = atan2(Column(yName), x)
 
   /**
-   * Returns the angle theta from the conversion of rectangular 
coordinates (x, y) to
-   * polar coordinates (r, theta).
+   * @param yName coordinate on y-axis
+   

[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...

2018-03-02 Thread sandecho
Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20709
  
Can you please test it?


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171839162
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java
 ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+public class OffHeapMemoryBlock extends MemoryBlock {
+  static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 
0);
+
+  public OffHeapMemoryBlock(long address, long size) {
+super(null, address, size);
+  }
+
+  @Override
+  public MemoryBlock subBlock(long offset, long size) {
+return new OffHeapMemoryBlock(this.offset + offset, size);
+  }
+
+  public final int getInt(long offset) {
+return Platform.getInt(null, offset);
--- End diff --

You mean `offset` here is an absolute address?


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171839479
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, int lengthInBytes) {
     return hashUnsafeWords(base, offset, lengthInBytes, seed);
   }
 
+  public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, int lengthInBytes, int seed) {
+    return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, seed);
+  }
+
   public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) {
     // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method.
     assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)";
     int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
     return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+    long offset = base.getBaseOffset();
--- End diff --

Seems they somewhat duplicate each other.


---




[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20121
  
**[Test build #87884 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87884/testReport)**
 for PR 20121 at commit 
[`6d59c5b`](https://github.com/apache/spark/commit/6d59c5b017594e6b6d0b7c937207414fc942e448).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20121
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20121
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87884/
Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1226/
Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87885 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87885/testReport)**
 for PR 19222 at commit 
[`eb0cc6d`](https://github.com/apache/spark/commit/eb0cc6d2ddb3fc648e1d449dad6e53abbabcf2fc).


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-02 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r171842603
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, int lengthInBytes) {
     return hashUnsafeWords(base, offset, lengthInBytes, seed);
   }
 
+  public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, int lengthInBytes, int seed) {
+    return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, seed);
+  }
+
   public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) {
     // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method.
     assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)";
     int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
     return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+    long offset = base.getBaseOffset();
--- End diff --

Sure, to eliminate duplication, we will call `hashUnsafeBytesBlock()` from 
`hashUnsafeBytes()`.
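A minimal sketch of the delegation being proposed (hypothetical stand-in types and a placeholder mixing loop, not the actual Murmur3 patch): the legacy `Object`-based overload forwards to the block-based entry point, so the hashing loop lives in exactly one place.

```java
public class DelegationSketch {
    // Hypothetical stand-in for the block-based entry point. The loop body is a
    // placeholder mixing function, not real Murmur3.
    static int hashBytesBlock(byte[] base, long offset, int lengthInBytes, int seed) {
        int h = seed;
        for (int i = 0; i < lengthInBytes; i++) {
            h = 31 * h + base[(int) (offset + i)];
        }
        return h;
    }

    // The legacy overload delegates instead of duplicating the loop.
    static int hashBytes(byte[] base, long offset, int lengthInBytes, int seed) {
        return hashBytesBlock(base, offset, lengthInBytes, seed);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        // Both entry points agree by construction.
        System.out.println(hashBytes(data, 0, 4, 42) == hashBytesBlock(data, 0, 4, 42));  // prints true
    }
}
```

With this shape, fixing a bug in the loop only requires touching one method, which is the point of the de-duplication.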


---




[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size

2018-03-02 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20712
  
@mgaido91 Yeah, you are right. This is pretty pointless unless you either set this as a static Spark conf or pass it to the executors using local properties. Neither option is very attractive, IMO. @passionke, why do we need this?


---




[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size

2018-03-02 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20712
  
+1, I would like to know of a case where this increases the cache hit ratio.
Do we reuse the same plan across more than 100 plans?
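The cache under discussion is size-bounded at 100 entries (the bound this PR proposes to make configurable). A minimal `LinkedHashMap` sketch of that eviction behavior, assuming a plain LRU policy for illustration (Spark's actual codegen cache is a Guava cache, not this class):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompileCacheSketch {
    // Insert `inserts` distinct generated-class keys into an LRU cache capped
    // at `maxEntries` and report how many stay resident.
    static int residentAfter(int inserts, final int maxEntries) {
        Map<String, Object> cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > maxEntries;  // evict least-recently-used past the cap
            }
        };
        for (int i = 0; i < inserts; i++) {
            cache.put("GeneratedClass" + i, new Object());
        }
        return cache.size();
    }

    public static void main(String[] args) {
        // With a cap of 100, compiling 150 distinct plans evicts the oldest 50,
        // so those plans would have to be recompiled on reuse.
        System.out.println(residentAfter(150, 100));  // prints 100
    }
}
```

This is why the question matters: raising the cap only helps a workload that actually cycles through more than 100 distinct generated plans.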


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20682
  
**[Test build #87882 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87882/testReport)**
 for PR 20682 at commit 
[`4dac3a7`](https://github.com/apache/spark/commit/4dac3a7e2fe3fd058b4c771039da02e2565108a1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20682
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87882/
Test PASSed.


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20682
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20700
  
@rednaxelafx Oh, very interesting, since we are doing a similar thing on the West coast and in Japan!

I just said `not refactored YET`. Yeah, I would absolutely love to delete the old instance methods if we all agree. Is it OK to delete all of the old methods in this PR?
`Some methods that are very frequently used (e.g. javaType()) are not refactored yet.`


---




[GitHub] spark pull request #20716: [SPARK-23566][Minor][Doc] Argument name mismatch ...

2018-03-02 Thread animenon
GitHub user animenon opened a pull request:

https://github.com/apache/spark/pull/20716

[SPARK-23566][Minor][Doc] Argument name mismatch fixed

Argument name mismatch fixed.

## What changes were proposed in this pull request?

`col` changed to `new` in doc string to match the argument list.

Patch file added: https://issues.apache.org/jira/browse/SPARK-23566

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/animenon/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20716


commit 5d2bb9d0b480d7ac2a2c2d6364b87789f57856e8
Author: Anirudh 
Date:   2018-03-02T13:58:27Z

[SPARK-23566][Minor][Doc] Argument name mismatch fixed

Argument name mismatch fixed.
`col` changed to `new` in doc string to match the argument list.




---




[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20716
  
Can one of the admins verify this patch?


---







[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1227/
Test PASSed.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20700
  
**[Test build #87886 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87886/testReport)**
 for PR 20700 at commit 
[`f589a2a`](https://github.com/apache/spark/commit/f589a2add87bc40db0cce845873571717c631c59).


---




[GitHub] spark issue #20639: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-02 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20639
  
I executed these tests manually again and they work fine. Seems flaky.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1228/
Test PASSed.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20700
  
**[Test build #87887 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87887/testReport)**
 for PR 20700 at commit 
[`ebce1f2`](https://github.com/apache/spark/commit/ebce1f20df20d17bb4ad54faba8d597f7a02b635).


---




[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20716
  
ok to test


---




[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20716
  
It's fine. Can you double-check whether there are similar instances in this file or in other files?


---



