[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20682 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87879/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87880/ Test FAILed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20647 **[Test build #87877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87877/testReport)** for PR 20647 at commit [`1164eec`](https://github.com/apache/spark/commit/1164eecb55d2120bc71c059ea6944ea38d13d300).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18784 **[Test build #87876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87876/testReport)** for PR 18784 at commit [`2c1574e`](https://github.com/apache/spark/commit/2c1574ea5ab55f1b0af1764aaf40ba9de5824d0b).
* This patch **fails due to an unknown error code, -9**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87880/testReport)** for PR 19222 at commit [`cf2d532`](https://github.com/apache/spark/commit/cf2d532ae9c8688ef314a51a89c76abe2fd5d857).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20700 **[Test build #87881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87881/testReport)** for PR 20700 at commit [`9fed753`](https://github.com/apache/spark/commit/9fed75338b9e3dc6e834407c177decc2f38c8743).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20682 Merged build finished. Test FAILed.
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test FAILed.
[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18784 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87876/ Test FAILed.
[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18784 Build finished. Test FAILed.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20700 Merged build finished. Test FAILed.
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20682 **[Test build #87879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87879/testReport)** for PR 20682 at commit [`4dac3a7`](https://github.com/apache/spark/commit/4dac3a7e2fe3fd058b4c771039da02e2565108a1).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20700 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87881/ Test FAILed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20647 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87877/ Test FAILed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20647 Merged build finished. Test FAILed.
[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20712 @cloud-fan @gatorsmile if I am not wrong, this is one of the places that is used not only by the driver but also by the executors, so the introduced parameter won't be evaluated properly, am I right?
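A minimal sketch of the concern above (all names here are hypothetical, not Spark's): when a cache size is captured in a static initializer, each JVM fixes the value at class-initialization time, so a setting applied afterwards (or only in the driver JVM) never reaches a cache that has already been initialized on an executor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StaticCacheSizing {
  // Captured once per JVM when this class is initialized.
  static final int CACHE_SIZE =
      Integer.parseInt(System.getProperty("demo.cache.size", "100"));

  // An LRU cache bounded by the captured size.
  static final Map<String, String> CACHE =
      new LinkedHashMap<String, String>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
          return size() > CACHE_SIZE;
        }
      };

  public static int effectiveSize() {
    return CACHE_SIZE;
  }

  public static void main(String[] args) {
    // Setting the property *after* class initialization has no effect in this
    // JVM, analogous to a driver-side setting not reaching executors whose
    // classes are already initialized.
    System.setProperty("demo.cache.size", "1000");
    assert effectiveSize() == 100 : "late setting is ignored";
    CACHE.put("a", "1");
    CACHE.put("b", "2");
    assert CACHE.size() == 2;
    System.out.println(effectiveSize());
  }
}
```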
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20700

Aha! Thanks @kiszk san for working on this! I really wanted the stateless methods to be extracted so that I can use more utils without having to pass around a `CodegenContext` for no good reason. I was actually working on the exact same thing last week but got carried away by other tasks, so my patch is still WIP. For those curious, here's my WIP patch: https://github.com/rednaxelafx/apache-spark/commit/249fb93c5cacc35d53e6b0f972b139d27ef5d720 I was hoping we could cover the ones I extracted here: https://github.com/rednaxelafx/apache-spark/commit/249fb93c5cacc35d53e6b0f972b139d27ef5d720#diff-8bcc5aea39c73d4bf38aef6f6951d42cR1201

Your PR and my version extracted mostly the same fields/methods, which is great for me: I'll just help get this PR in instead of having to send mine out. The approach is different, though, and I'd like to discuss the tradeoffs you had in mind when you picked yours. I initially did something similar, extracting stateless methods into companion object methods and changing the originals to delegate to the companion versions. But the old instance methods were a bad smell to me (passing in an instance when the instance shouldn't be needed), so since we're at it, I thought why not remove that bad smell once and for all. So I deleted the original instance methods, and yes, that made the diff huge, because every use site of those fields/methods had to be touched to switch to the new version.

@kiszk WDYT? Were you intending to minimize the diff while still being able to access the stateless util methods as standalone functions?
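The tradeoff discussed above can be sketched in Java (`CodegenContext` itself is Scala; the names below are illustrative, not Spark's): option A keeps a thin instance method delegating to the extracted static utility, so existing call sites compile unchanged and the diff stays small; option B deletes the instance method, forcing every call site onto the static version.

```java
public class CodegenHelper {
  // Extracted stateless utility: needs no instance state at all.
  public static String boxedType(String javaType) {
    return javaType.equals("int") ? "Integer" : javaType;
  }

  // Option A: retained instance method that simply delegates, so old
  // call sites of the form ctx.boxedTypeCompat(...) keep working.
  public String boxedTypeCompat(String javaType) {
    return boxedType(javaType);
  }

  public static void main(String[] args) {
    CodegenHelper ctx = new CodegenHelper();
    // Both call paths produce the same result.
    assert CodegenHelper.boxedType("int").equals("Integer");
    assert ctx.boxedTypeCompat("int").equals("Integer");
    System.out.println(CodegenHelper.boxedType("int"));
  }
}
```

Option B would delete `boxedTypeCompat` entirely, trading a larger one-time diff for a cleaner API with no instance parameter "bad smell".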
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171782719

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -17,47 +17,168 @@

 package org.apache.spark.unsafe.memory;

-import javax.annotation.Nullable;
-
 import org.apache.spark.unsafe.Platform;

+import javax.annotation.Nullable;
+
 /**
- * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size.
+ * A declaration of interfaces of MemoryBlock classes .
  */
-public class MemoryBlock extends MemoryLocation {
+public abstract class MemoryBlock {
+  @Nullable
+  protected final Object obj;

-  private final long length;
+  protected final long offset;
+
+  protected final long length;

   /**
    * Optional page number; used when this MemoryBlock represents a page allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber method so that
+   * this can be modified by the TaskMemoryManager, which lives in a different package.
    */
-  public int pageNumber = -1;
+  private int pageNumber = -1;

   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-    super(obj, offset);
+    this.obj = obj;
+    this.offset = offset;
     this.length = length;
   }

+  public MemoryBlock() {
+    this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {

--- End diff --

Sounds reasonable. Should we still need `MemoryLocation` to expose a logical memory location?
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171783898

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
    */
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;

-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;

   /**
    * Optional page number; used when this MemoryBlock represents a page allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber method so that
+   * this can be modified by the TaskMemoryManager, which lives in a different package.
    */
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;

   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-    super(obj, offset);
+    this.obj = obj;
+    this.offset = offset;
     this.length = length;
   }

+  public MemoryBlock() {
+    this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+    return obj;
+  }
+
+  public final long getBaseOffset() {
+    return offset;
+  }
+
+  public void resetObjAndOffset() {
+    this.obj = null;
+    this.offset = 0;
+  }
+
   /**
    * Returns the size of the memory block.
    */
-  public long size() {
+  public final long size() {
     return length;
   }

-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-    return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L);
+  public final void setPageNumber(int pageNum) {
+    pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+    return pageNumber;
   }

   /**
    * Fills the memory block with the specified byte value.
    */
-  public void fill(byte value) {
+  public final void fill(byte value) {
     Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) {
+    MemoryBlock mb = null;
+    if (obj instanceof byte[]) {
+      byte[] array = (byte[])obj;
+      mb = new ByteArrayMemoryBlock(array, offset, length);
+    } else if (obj instanceof long[]) {
+      long[] array = (long[])obj;
+      mb = new OnHeapMemoryBlock(array, offset, length);
+    } else if (obj == null) {
+      // we assume that to pass null pointer means off-heap
+      mb = new OffHeapMemoryBlock(offset, length);
+    } else {
+      throw new UnsupportedOperationException(obj.getClass() + " is not supported now");
+    }
+    return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public static final void copyMemory(
+      MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, long length) {
+    Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + srcOffset,
+      dst.getBaseObject(), dst.getBaseOffset() + dstOffset, length);
+  }
+
+  public static final void copyMemory(MemoryBlock src, MemoryBlock dst, long length) {
+    Platform.copyMemory(src.getBaseObject(), src.getBaseOffset(),
+      dst.getBaseObject(), dst.getBaseOffset(), length);
+  }
+
+  public final void copyFrom(Object src, long
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171783854

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
    */
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;

-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;

   /**
    * Optional page number; used when this MemoryBlock represents a page allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber method so that
+   * this can be modified by the TaskMemoryManager, which lives in a different package.
    */
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;

   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-    super(obj, offset);
+    this.obj = obj;
+    this.offset = offset;
     this.length = length;
   }

+  public MemoryBlock() {
+    this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+    return obj;
+  }
+
+  public final long getBaseOffset() {
+    return offset;
+  }
+
+  public void resetObjAndOffset() {
+    this.obj = null;
+    this.offset = 0;
+  }
+
   /**
    * Returns the size of the memory block.
    */
-  public long size() {
+  public final long size() {
     return length;
   }

-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-    return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L);
+  public final void setPageNumber(int pageNum) {
+    pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+    return pageNumber;
   }

   /**
    * Fills the memory block with the specified byte value.
    */
-  public void fill(byte value) {
+  public final void fill(byte value) {
     Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) {
+    MemoryBlock mb = null;
+    if (obj instanceof byte[]) {
+      byte[] array = (byte[])obj;
+      mb = new ByteArrayMemoryBlock(array, offset, length);
+    } else if (obj instanceof long[]) {
+      long[] array = (long[])obj;
+      mb = new OnHeapMemoryBlock(array, offset, length);
+    } else if (obj == null) {
+      // we assume that to pass null pointer means off-heap
+      mb = new OffHeapMemoryBlock(offset, length);
+    } else {
+      throw new UnsupportedOperationException(obj.getClass() + " is not supported now");
+    }
+    return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public static final void copyMemory(
+      MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, long length) {
+    Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + srcOffset,
+      dst.getBaseObject(), dst.getBaseOffset() + dstOffset, length);
+  }
+
+  public static final void copyMemory(MemoryBlock src, MemoryBlock dst, long length) {
+    Platform.copyMemory(src.getBaseObject(), src.getBaseOffset(),
+      dst.getBaseObject(), dst.getBaseOffset(), length);

--- End diff --

Check if `length <= src.size()` and
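The bounds check suggested in the (truncated) comment above can be sketched with a simplified stand-in for `MemoryBlock` (the `Block` class and exception choice here are illustrative assumptions, not Spark's code): verify the requested length fits in both source and destination before copying.

```java
import java.util.Arrays;

public class CheckedCopy {
  // Simplified stand-in for an on-heap MemoryBlock.
  static class Block {
    final byte[] data;
    Block(int size) { data = new byte[size]; }
    long size() { return data.length; }
  }

  // Copy with the precondition length <= src.size() and length <= dst.size().
  static void copyMemory(Block src, Block dst, long length) {
    if (length > src.size() || length > dst.size()) {
      throw new ArrayIndexOutOfBoundsException(
          "length " + length + " exceeds src/dst size");
    }
    System.arraycopy(src.data, 0, dst.data, 0, (int) length);
  }

  public static void main(String[] args) {
    Block src = new Block(8);
    Arrays.fill(src.data, (byte) 7);
    Block dst = new Block(8);
    copyMemory(src, dst, 8);
    assert dst.data[7] == 7;

    boolean threw = false;
    try {
      copyMemory(src, new Block(4), 8);  // destination too small
    } catch (ArrayIndexOutOfBoundsException e) {
      threw = true;
    }
    assert threw;
    System.out.println("ok");
  }
}
```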
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171783186

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -45,38 +45,136 @@
    */
   public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;

-  private final long length;
+  @Nullable
+  protected Object obj;
+
+  protected long offset;
+
+  protected final long length;

   /**
    * Optional page number; used when this MemoryBlock represents a page allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber method so that
+   * this can be modified by the TaskMemoryManager, which lives in a different package.
    */
-  public int pageNumber = NO_PAGE_NUMBER;
+  private int pageNumber = NO_PAGE_NUMBER;

   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-    super(obj, offset);
+    this.obj = obj;
+    this.offset = offset;
     this.length = length;
   }

+  public MemoryBlock() {
+    this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+    return obj;
+  }
+
+  public final long getBaseOffset() {
+    return offset;
+  }
+
+  public void resetObjAndOffset() {
+    this.obj = null;
+    this.offset = 0;
+  }
+
   /**
    * Returns the size of the memory block.
    */
-  public long size() {
+  public final long size() {
     return length;
   }

-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-    return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L);
+  public final void setPageNumber(int pageNum) {
+    pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+    return pageNumber;
   }

   /**
    * Fills the memory block with the specified byte value.
    */
-  public void fill(byte value) {
+  public final void fill(byte value) {
     Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate MemoryBlock for given object type with new offset
+   */
+  public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) {
+    MemoryBlock mb = null;
+    if (obj instanceof byte[]) {
+      byte[] array = (byte[])obj;
+      mb = new ByteArrayMemoryBlock(array, offset, length);
+    } else if (obj instanceof long[]) {
+      long[] array = (long[])obj;
+      mb = new OnHeapMemoryBlock(array, offset, length);
+    } else if (obj == null) {
+      // we assume that to pass null pointer means off-heap
+      mb = new OffHeapMemoryBlock(offset, length);
+    } else {
+      throw new UnsupportedOperationException(obj.getClass() + " is not supported now");
+    }
+    return mb;
+  }
+
+  /**
+   * Just instantiate the same type of MemoryBlock with new offset and size. The data is not
+   * copied.
+   */
+  public abstract MemoryBlock subBlock(long offset, long size);
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);

--- End diff --

Should we define the behavior when `offset` is invalid here? E.g., out of range?
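One possible answer to the question above, sketched with a simplified stand-in (the `Block` class and the choice of `IndexOutOfBoundsException` are assumptions for illustration, not Spark's actual behavior): range-check the offset in the accessor and fail fast instead of reading out of bounds.

```java
import java.nio.ByteBuffer;

public class CheckedAccess {
  // Simplified stand-in for a 16-byte on-heap MemoryBlock.
  static class Block {
    final byte[] data = new byte[16];

    double getDouble(long offset) {
      // Defined behavior for an invalid offset: fail fast.
      if (offset < 0 || offset + 8 > data.length) {
        throw new IndexOutOfBoundsException("offset " + offset + " out of range");
      }
      return ByteBuffer.wrap(data).getDouble((int) offset);
    }
  }

  public static void main(String[] args) {
    Block b = new Block();
    assert b.getDouble(0) == 0.0;  // zero-initialized bytes decode to 0.0

    boolean threw = false;
    try {
      b.getDouble(12);  // needs 8 bytes but only 4 remain
    } catch (IndexOutOfBoundsException e) {
      threw = true;
    }
    assert threw;
    System.out.println("ok");
  }
}
```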
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171784866

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java ---
@@ -67,6 +84,14 @@ public static boolean isSet(Object baseObject, long baseOffset, int index) {
     return (word & mask) != 0;
   }

+  public static boolean isSetBlock(MemoryBlock base, long baseOffset, int index) {

--- End diff --

ditto.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171784381

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java ---
@@ -56,6 +65,14 @@ public static void unset(Object baseObject, long baseOffset, int index) {
     Platform.putLong(baseObject, wordOffset, word & ~mask);
   }

+  public static void unsetBlock(MemoryBlock base, long baseOffset, int index) {

--- End diff --

ditto.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171785381

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, int lengthInBytes) {
     return hashUnsafeWords(base, offset, lengthInBytes, seed);
   }

+  public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, int lengthInBytes, int seed) {
+    return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, seed);
+  }
+
   public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) {
     // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method.
     assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)";
     int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
     return fmix(h1, lengthInBytes);
   }

+  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+    long offset = base.getBaseOffset();

--- End diff --

It is better to leave a comment explaining why we have this.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171787333

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java ---
@@ -36,22 +42,35 @@ public MemoryBlock allocate(long size) throws OutOfMemoryError {

   @Override
   public void free(MemoryBlock memory) {
-    assert (memory.obj == null) :
+    if (memory == OffHeapMemoryBlock.NULL) return;
+    assert(memory instanceof OffHeapMemoryBlock);

--- End diff --

I think this assert should be in front of `if (memory == OffHeapMemoryBlock.NULL) return;`?
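The ordering suggested above can be sketched with simplified stand-ins (class names and the explicit exception are illustrative assumptions; Spark's code uses a Java `assert`): validating the argument's type before the sentinel early-return means an invalid argument is always rejected, rather than silently skipped when it takes the early-return path.

```java
public class FreeOrdering {
  static class Block {}
  static class OffHeapBlock extends Block {}
  static final OffHeapBlock NULL = new OffHeapBlock();  // sentinel for "nothing allocated"

  static boolean freed;

  static void free(Block memory) {
    // Type check first, as the comment suggests ...
    if (!(memory instanceof OffHeapBlock)) {
      throw new IllegalArgumentException("not an off-heap block");
    }
    // ... then the early return for the NULL sentinel.
    if (memory == NULL) return;
    freed = true;  // stand-in for actually releasing the memory
  }

  public static void main(String[] args) {
    free(NULL);
    assert !freed;               // sentinel: nothing to release
    free(new OffHeapBlock());
    assert freed;                // real block released

    boolean threw = false;
    try {
      free(new Block());         // wrong type is always rejected
    } catch (IllegalArgumentException e) {
      threw = true;
    }
    assert threw;
    System.out.println("ok");
  }
}
```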
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171785802

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, int lengthInBytes) {
     return hashUnsafeWords(base, offset, lengthInBytes, seed);
   }

+  public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, int lengthInBytes, int seed) {
+    return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, seed);
+  }
+
   public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) {
     // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method.
     assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)";
     int h1 = hashBytesByInt(base, offset, lengthInBytes, seed);
     return fmix(h1, lengthInBytes);
   }

+  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
+    long offset = base.getBaseOffset();

--- End diff --

Can't we just call `hashUnsafeBytes` with the block's baseObject, baseOffset and size?
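The delegation viirya suggests can be sketched with a simplified stand-in (the `Block` class and the placeholder hash below are illustrative, not Spark's Murmur3 implementation): the `*Block` overload just unwraps the block's base object, base offset, and size and reuses the existing implementation instead of duplicating its loop.

```java
public class HashDelegation {
  // Simplified stand-in for MemoryBlock.
  static class Block {
    final byte[] obj;
    final long offset;
    final long length;
    Block(byte[] obj, long offset, long length) {
      this.obj = obj; this.offset = offset; this.length = length;
    }
  }

  // Existing (object, offset, length, seed) variant; the hash itself is a
  // placeholder, not Murmur3.
  static int hashBytes(byte[] base, long offset, int lengthInBytes, int seed) {
    int h = seed;
    for (int i = 0; i < lengthInBytes; i++) {
      h = 31 * h + base[(int) offset + i];
    }
    return h;
  }

  // Block overload: pure delegation, no duplicated hashing logic.
  static int hashBytesBlock(Block base, int seed) {
    return hashBytes(base.obj, base.offset, (int) base.length, seed);
  }

  public static void main(String[] args) {
    byte[] data = {1, 2, 3, 4};
    Block block = new Block(data, 0, 4);
    // Both entry points must agree by construction.
    assert hashBytesBlock(block, 42) == hashBytes(data, 0, 4, 42);
    System.out.println("ok");
  }
}
```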
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171784878

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java ---
@@ -80,6 +105,16 @@ public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidt
     return false;
   }

+  public static boolean anySetBlock(MemoryBlock base, long baseOffset, long bitSetWidthInWords) {

--- End diff --

ditto.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171784254

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java ---
@@ -52,11 +53,16 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
    * Optimized byte array equality check for byte arrays.
    * @return true if the arrays are equal, false otherwise
    */
+  public static boolean arrayEqualsBlock(

--- End diff --

It should have a new doc comment instead of reusing `arrayEquals`'s.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171785720

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
@@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes,
     return fmix(h1, lengthInBytes);
   }

+  public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) {

--- End diff --

Can't we just call `hashUnsafeBytes2` with the `baseObject`, `baseOffset` and `size` of `base`?
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171787117

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+public class OffHeapMemoryBlock extends MemoryBlock {
+  static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 0);
+
+  public OffHeapMemoryBlock(long address, long size) {
+    super(null, address, size);
+  }
+
+  @Override
+  public MemoryBlock subBlock(long offset, long size) {
+    return new OffHeapMemoryBlock(this.offset + offset, size);
+  }
+
+  public final int getInt(long offset) {
+    return Platform.getInt(null, offset);

--- End diff --

Is this correct? Or `address + offset`?
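The addressing question above can be illustrated with plain arithmetic (no `Unsafe` involved; the numbers and helper are hypothetical): for an off-heap block the base object is null and the block's stored offset is an absolute address, so if the method's `offset` parameter is a block-relative position, a read must resolve to `address + offset`; passing the position alone would not name a valid location.

```java
public class OffHeapAddressing {
  // Resolve a block-relative position against the block's absolute base address.
  static long resolve(long baseAddress, long pos) {
    return baseAddress + pos;
  }

  public static void main(String[] args) {
    long address = 0x10000L;  // hypothetical address returned by an allocator
    long pos = 8L;            // logical position within the block
    assert resolve(address, pos) == 0x10008L;
    assert resolve(address, pos) != pos;  // the bare position is not an address
    System.out.println(Long.toHexString(resolve(address, pos)));
  }
}
```

(Whichever convention the PR settles on, callers and the accessor have to agree on whether `offset` is absolute or block-relative.)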
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171785582 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes, return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) { +// This is compatible with original and another implementations. +// Use this method for new components after Spark 2.3. +long offset = base.getBaseOffset(); +int lengthInBytes = (int)base.size(); +assert (lengthInBytes >= 0) : "lengthInBytes cannot be negative"; +int lengthAligned = lengthInBytes - lengthInBytes % 4; +int h1 = hashBytesByIntBlock(base.subBlock(offset, lengthAligned), seed); +int k1 = 0; +for (int i = lengthAligned, shift = 0; i < lengthInBytes; i++, shift += 8) { + k1 ^= (base.getByte(offset + i) & 0xFF) << shift; +} +h1 ^= mixK1(k1); +return fmix(h1, lengthInBytes); + } + + private static int hashBytesByIntBlock(MemoryBlock base, int seed) { --- End diff -- Can't we just call `hashBytesByInt(base.getBaseObject(), base.getBaseOffset(), (int)base.size())`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
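viirya's suggestion above — that the block-based hash entry points can simply unwrap the `MemoryBlock` and delegate to the existing `(Object, long, int)` overload instead of duplicating the hashing loop — can be sketched as follows. This is a simplified stand-in, not the actual Spark `Murmur3_x86_32` code: the hash function and the `Block` class here are hypothetical placeholders for the real ones.

```java
// Sketch of the delegation pattern: the block overload forwards the wrapped
// (baseObject, baseOffset, size) triple to the existing method.
public class HashDelegationSketch {
    // Stand-in for the existing hashUnsafeBytes2(Object, long, int, int);
    // simplified mixing, NOT real Murmur3.
    static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes, int seed) {
        byte[] bytes = (byte[]) base;
        int h = seed;
        for (int i = 0; i < lengthInBytes; i++) {
            h = h * 31 + bytes[(int) (offset + i)];
        }
        return h;
    }

    // Minimal stand-in for MemoryBlock.
    static final class Block {
        final Object obj; final long offset; final long length;
        Block(Object obj, long offset, long length) {
            this.obj = obj; this.offset = offset; this.length = length;
        }
        Object getBaseObject() { return obj; }
        long getBaseOffset() { return offset; }
        long size() { return length; }
    }

    // The block overload just unwraps and delegates; no second loop needed.
    static int hashUnsafeBytes2Block(Block base, int seed) {
        return hashUnsafeBytes2(base.getBaseObject(), base.getBaseOffset(), (int) base.size(), seed);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5};
        Block block = new Block(data, 0, data.length);
        // Both paths must agree, since one delegates to the other.
        if (hashUnsafeBytes2Block(block, 42) != hashUnsafeBytes2(data, 0, data.length, 42)) {
            throw new AssertionError("hashes differ");
        }
        System.out.println("ok");
    }
}
```

The design benefit is that only one copy of the hashing loop has to be kept correct; the block-based API is a thin adapter over it.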
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171784371 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java --- @@ -45,6 +46,14 @@ public static void set(Object baseObject, long baseOffset, int index) { Platform.putLong(baseObject, wordOffset, word | mask); } + public static void setBlock(MemoryBlock base, long baseOffset, int index) { --- End diff -- It would be better to add a doc comment for this method; its purpose is not clear from the name alone.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171783726 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -45,38 +45,136 @@ */ public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3; - private final long length; + @Nullable + protected Object obj; + + protected long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = NO_PAGE_NUMBER; + private int pageNumber = NO_PAGE_NUMBER; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + + public void resetObjAndOffset() { +this.obj = null; +this.offset = 0; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array. - */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. 
*/ - public void fill(byte value) { + public final void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate MemoryBlock for given object type with new offset + */ + public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) { +MemoryBlock mb = null; +if (obj instanceof byte[]) { + byte[] array = (byte[])obj; + mb = new ByteArrayMemoryBlock(array, offset, length); +} else if (obj instanceof long[]) { + long[] array = (long[])obj; + mb = new OnHeapMemoryBlock(array, offset, length); +} else if (obj == null) { + // we assume that to pass null pointer means off-heap + mb = new OffHeapMemoryBlock(offset, length); +} else { + throw new UnsupportedOperationException(obj.getClass() + " is not supported now"); +} +return mb; + } + + /** + * Just instantiate the same type of MemoryBlock with new offset and size. The data is not + * copied. + */ + public abstract MemoryBlock subBlock(long offset, long size); + + + public abstract int getInt(long offset); + + public abstract void putInt(long offset, int value); + + public abstract boolean getBoolean(long offset); + + public abstract void putBoolean(long offset, boolean value); + + public abstract byte getByte(long offset); + + public abstract void putByte(long offset, byte value); + + public abstract short getShort(long offset); + + public abstract void putShort(long offset, short value); + + public abstract long getLong(long offset); + + public abstract void putLong(long offset, long value); + + public abstract float getFloat(long offset); + + public abstract void putFloat(long offset, float value); + + public abstract double getDouble(long offset); + + public abstract void putDouble(long offset, double value); + + public static final void copyMemory( + MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, long length) { +Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + srcOffset, + dst.getBaseObject(), dst.getBaseOffset() + 
dstOffset, length); --- End diff -- Should we check whether this copy is valid? E.g., `srcOffset + length > src.size()` or `dstOffset + length > dst.size()`?
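The bounds check viirya asks about here amounts to validating that the requested range fits inside both the source and destination before delegating to the raw copy. A minimal sketch of that idea, with plain byte arrays standing in for `MemoryBlock`s and `checkedCopy` as a hypothetical name (the real Spark code delegates to `Platform.copyMemory` instead of `System.arraycopy`):

```java
// Sketch: reject out-of-range copies before touching memory.
public class CopyBoundsSketch {
    static void checkedCopy(byte[] src, long srcOffset, byte[] dst, long dstOffset, long length) {
        // Validate both ends of the copy; overflow-free because the
        // arrays here are small. A production version would also guard
        // against long overflow in srcOffset + length.
        if (srcOffset < 0 || dstOffset < 0 || length < 0
            || srcOffset + length > src.length || dstOffset + length > dst.length) {
            throw new IndexOutOfBoundsException("copy of " + length + " bytes out of range");
        }
        System.arraycopy(src, (int) srcOffset, dst, (int) dstOffset, (int) length);
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4};
        byte[] dst = new byte[4];
        checkedCopy(src, 1, dst, 0, 3);      // valid: 1 + 3 <= 4
        try {
            checkedCopy(src, 2, dst, 0, 3);  // invalid: 2 + 3 > 4
            throw new AssertionError("expected IndexOutOfBoundsException");
        } catch (IndexOutOfBoundsException expected) {
            System.out.println("ok");
        }
    }
}
```

As the later replies in this thread note, the cost and complexity of such a check is the trade-off: it is cheap for same-kind blocks but harder to express once off-heap blocks with absolute addresses are involved.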
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171787508 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -50,12 +52,12 @@ // These are only updated by readExternal() or read() @Nonnull - private Object base; - private long offset; + private MemoryBlock base; + // While numBytes has the same value as base,length, to keep as int avoids cast from long to int --- End diff -- nit: `base,length` -> `base, length`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171787057 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +public class OffHeapMemoryBlock extends MemoryBlock { + static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 0); + + public OffHeapMemoryBlock(long address, long size) { +super(null, address, size); + } + + @Override + public MemoryBlock subBlock(long offset, long size) { +return new OffHeapMemoryBlock(this.offset + offset, size); + } + + public final int getInt(long offset) { --- End diff -- `@Override`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r171790083 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala --- @@ -108,5 +111,29 @@ trait MLTest extends StreamTest with TempDirectory { self: Suite => otherResultCols: _*)(globalCheckFunction) testTransformerOnDF(dataframe, transformer, firstResultCol, otherResultCols: _*)(globalCheckFunction) +} + + def testTransformerByInterceptingException[A : Encoder]( +dataframe: DataFrame, +transformer: Transformer, +expectedMessagePart : String, +firstResultCol: String) { + +def hasExpectedMessage(exception: Throwable): Boolean = --- End diff -- I wonder whether the check here is too strict. It requires an exact message match, so when some class modifies its exception message, many test cases will fail. Could we just check the exception type instead?
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r171790543 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -313,13 +306,14 @@ class RFormulaSuite extends MLTest with DefaultReadWriteTest { Seq(("male", "foo", 4), ("female", "bar", 4), ("female", "bar", 5), ("male", "baz", 5)) .toDF("id", "a", "b") val model = formula.fit(original) +val attr = NominalAttribute.defaultAttr val expected = Seq( ("male", "foo", 4, Vectors.dense(0.0, 1.0, 4.0), 1.0), ("female", "bar", 4, Vectors.dense(1.0, 0.0, 4.0), 0.0), ("female", "bar", 5, Vectors.dense(1.0, 0.0, 5.0), 0.0), ("male", "baz", 5, Vectors.dense(0.0, 0.0, 5.0), 1.0) ).toDF("id", "a", "b", "features", "label") -// assert(result.schema.toString == resultSchema.toString) + .select($"id", $"a", $"b", $"features", $"label".as("label", attr.toMetadata())) --- End diff -- I think maybe ``` ... ).toDF("id", "a", "b", "features", "label") .select($"id", ... ``` would look cleaner.
[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20705 @dongjoon-hyun, I agree with the idea in general, but just to be clear: do you intend to improve the test coverage here for both cases, i.e. with `orc` and with `parquet` set as `spark.sql.sources.default`? If not, how about we set `spark.sql.sources.default` to `parquet` explicitly where it's required, for now?
[GitHub] spark pull request #20121: [SPARK-22882][ML][TESTS] ML test for structured s...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20121#discussion_r171796773 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -75,11 +71,9 @@ class MultilayerPerceptronClassifierSuite .setMaxIter(100) .setSolver("l-bfgs") val model = trainer.fit(dataset) -val result = model.transform(dataset) MLTestingUtils.checkCopyAndUids(trainer, model) -val predictionAndLabels = result.select("prediction", "label").collect() -predictionAndLabels.foreach { case Row(p: Double, l: Double) => - assert(p == l) +testTransformer[(Vector, Double)](dataset.toDF(), model, "prediction", "label") { --- End diff -- Is the type of `dataset` `Dataset[_]`?
[GitHub] spark pull request #20121: [SPARK-22882][ML][TESTS] ML test for structured s...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20121#discussion_r171796232 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -169,59 +171,28 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext val blas = BLAS.getInstance() val validationDataset = validationData.toDF(labelCol, featuresCol) -val results = gbtModel.transform(validationDataset) -// check that raw prediction is tree predictions dot tree weights -results.select(rawPredictionCol, featuresCol).collect().foreach { - case Row(raw: Vector, features: Vector) => +testTransformer[(Double, Vector)](validationDataset, gbtModel, + "rawPrediction", "features", "probability", "prediction") { + case Row(raw: Vector, features: Vector, prob: Vector, pred: Double) => assert(raw.size === 2) +// check that raw prediction is tree predictions dot tree weights val treePredictions = gbtModel.trees.map(_.rootNode.predictImpl(features).prediction) val prediction = blas.ddot(gbtModel.numTrees, treePredictions, 1, gbtModel.treeWeights, 1) assert(raw ~== Vectors.dense(-prediction, prediction) relTol eps) -} -// Compare rawPrediction with probability -results.select(rawPredictionCol, probabilityCol).collect().foreach { - case Row(raw: Vector, prob: Vector) => -assert(raw.size === 2) +// Compare rawPrediction with probability assert(prob.size === 2) // Note: we should check other loss types for classification if they are added val predFromRaw = raw.toDense.values.map(value => LogLoss.computeProbability(value)) assert(prob(0) ~== predFromRaw(0) relTol eps) assert(prob(1) ~== predFromRaw(1) relTol eps) assert(prob(0) + prob(1) ~== 1.0 absTol absEps) -} -// Compare prediction with probability -results.select(predictionCol, probabilityCol).collect().foreach { - case Row(pred: Double, prob: Vector) => +// Compare prediction with probability val predFromProb = prob.toArray.zipWithIndex.maxBy(_._1)._2 assert(pred 
== predFromProb) } -// force it to use raw2prediction -gbtModel.setRawPredictionCol(rawPredictionCol).setProbabilityCol("") -val resultsUsingRaw2Predict = - gbtModel.transform(validationDataset).select(predictionCol).as[Double].collect() - resultsUsingRaw2Predict.zip(results.select(predictionCol).as[Double].collect()).foreach { - case (pred1, pred2) => assert(pred1 === pred2) -} - -// force it to use probability2prediction -gbtModel.setRawPredictionCol("").setProbabilityCol(probabilityCol) -val resultsUsingProb2Predict = - gbtModel.transform(validationDataset).select(predictionCol).as[Double].collect() - resultsUsingProb2Predict.zip(results.select(predictionCol).as[Double].collect()).foreach { - case (pred1, pred2) => assert(pred1 === pred2) -} - -// force it to use predict --- End diff -- These testing code paths are already covered by `ProbabilisticClassifierSuite.testPredictMethods`.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171801055 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -45,38 +45,136 @@ */ public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3; - private final long length; + @Nullable + protected Object obj; + + protected long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = NO_PAGE_NUMBER; + private int pageNumber = NO_PAGE_NUMBER; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + + public void resetObjAndOffset() { +this.obj = null; +this.offset = 0; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array. - */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. 
*/ - public void fill(byte value) { + public final void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate MemoryBlock for given object type with new offset + */ + public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) { +MemoryBlock mb = null; +if (obj instanceof byte[]) { + byte[] array = (byte[])obj; + mb = new ByteArrayMemoryBlock(array, offset, length); +} else if (obj instanceof long[]) { + long[] array = (long[])obj; + mb = new OnHeapMemoryBlock(array, offset, length); +} else if (obj == null) { + // we assume that to pass null pointer means off-heap + mb = new OffHeapMemoryBlock(offset, length); +} else { + throw new UnsupportedOperationException(obj.getClass() + " is not supported now"); +} +return mb; + } + + /** + * Just instantiate the same type of MemoryBlock with new offset and size. The data is not + * copied. + */ + public abstract MemoryBlock subBlock(long offset, long size); + + + public abstract int getInt(long offset); + + public abstract void putInt(long offset, int value); + + public abstract boolean getBoolean(long offset); + + public abstract void putBoolean(long offset, boolean value); + + public abstract byte getByte(long offset); + + public abstract void putByte(long offset, byte value); + + public abstract short getShort(long offset); + + public abstract void putShort(long offset, short value); + + public abstract long getLong(long offset); + + public abstract void putLong(long offset, long value); + + public abstract float getFloat(long offset); + + public abstract void putFloat(long offset, float value); + + public abstract double getDouble(long offset); --- End diff -- I see. I will put a comment since these operations do not guarantee behavior if `offset` is invalid. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171801720 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -45,38 +45,136 @@ */ public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3; - private final long length; + @Nullable + protected Object obj; + + protected long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = NO_PAGE_NUMBER; + private int pageNumber = NO_PAGE_NUMBER; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + + public void resetObjAndOffset() { +this.obj = null; +this.offset = 0; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array. - */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. 
*/ - public void fill(byte value) { + public final void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate MemoryBlock for given object type with new offset + */ + public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) { +MemoryBlock mb = null; +if (obj instanceof byte[]) { + byte[] array = (byte[])obj; + mb = new ByteArrayMemoryBlock(array, offset, length); +} else if (obj instanceof long[]) { + long[] array = (long[])obj; + mb = new OnHeapMemoryBlock(array, offset, length); +} else if (obj == null) { + // we assume that to pass null pointer means off-heap + mb = new OffHeapMemoryBlock(offset, length); +} else { + throw new UnsupportedOperationException(obj.getClass() + " is not supported now"); +} +return mb; + } + + /** + * Just instantiate the same type of MemoryBlock with new offset and size. The data is not + * copied. + */ + public abstract MemoryBlock subBlock(long offset, long size); + + + public abstract int getInt(long offset); + + public abstract void putInt(long offset, int value); + + public abstract boolean getBoolean(long offset); + + public abstract void putBoolean(long offset, boolean value); + + public abstract byte getByte(long offset); + + public abstract void putByte(long offset, byte value); + + public abstract short getShort(long offset); + + public abstract void putShort(long offset, short value); + + public abstract long getLong(long offset); + + public abstract void putLong(long offset, long value); + + public abstract float getFloat(long offset); + + public abstract void putFloat(long offset, float value); + + public abstract double getDouble(long offset); + + public abstract void putDouble(long offset, double value); + + public static final void copyMemory( + MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, long length) { +Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + srcOffset, + dst.getBaseObject(), dst.getBaseOffset() + 
dstOffset, length); --- End diff -- The condition depends on whether the `MemoryBlock` is an `OffHeapMemoryBlock`. Since the condition of this check is not simple, I gave up on adding an assertion due to performance concerns.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171801856 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -45,38 +45,136 @@ */ public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3; - private final long length; + @Nullable + protected Object obj; + + protected long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = NO_PAGE_NUMBER; + private int pageNumber = NO_PAGE_NUMBER; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + + public void resetObjAndOffset() { +this.obj = null; +this.offset = 0; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array. - */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. 
*/ - public void fill(byte value) { + public final void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate MemoryBlock for given object type with new offset + */ + public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) { +MemoryBlock mb = null; +if (obj instanceof byte[]) { + byte[] array = (byte[])obj; + mb = new ByteArrayMemoryBlock(array, offset, length); +} else if (obj instanceof long[]) { + long[] array = (long[])obj; + mb = new OnHeapMemoryBlock(array, offset, length); +} else if (obj == null) { + // we assume that to pass null pointer means off-heap + mb = new OffHeapMemoryBlock(offset, length); +} else { + throw new UnsupportedOperationException(obj.getClass() + " is not supported now"); +} +return mb; + } + + /** + * Just instantiate the same type of MemoryBlock with new offset and size. The data is not + * copied. + */ + public abstract MemoryBlock subBlock(long offset, long size); + + + public abstract int getInt(long offset); + + public abstract void putInt(long offset, int value); + + public abstract boolean getBoolean(long offset); + + public abstract void putBoolean(long offset, boolean value); + + public abstract byte getByte(long offset); + + public abstract void putByte(long offset, byte value); + + public abstract short getShort(long offset); + + public abstract void putShort(long offset, short value); + + public abstract long getLong(long offset); + + public abstract void putLong(long offset, long value); + + public abstract float getFloat(long offset); + + public abstract void putFloat(long offset, float value); + + public abstract double getDouble(long offset); + + public abstract void putDouble(long offset, double value); + + public static final void copyMemory( + MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, long length) { +Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + srcOffset, + dst.getBaseObject(), dst.getBaseOffset() + 
dstOffset, length); + } + + public static final void copyMemory(MemoryBlock src, MemoryBlock dst, long length) { +Platform.copyMemory(src.getBaseObject(), src.getBaseOffset(), + dst.getBaseObject(), dst.getBaseOffset(), length); --- End diff -- This assertion can be reasonably simple.
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20682 retest this please
[GitHub] spark pull request #20682: [SPARK-23522][Python] always use sys.exit over bu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20682#discussion_r171803941 --- Diff: python/pyspark/ml/evaluation.py --- @@ -15,6 +15,8 @@ # limitations under the License. # +import sys + --- End diff -- Let's remove the newline here. I think `abc` is a builtin one too.
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20682 **[Test build #87882 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87882/testReport)** for PR 20682 at commit [`4dac3a7`](https://github.com/apache/spark/commit/4dac3a7e2fe3fd058b4c771039da02e2565108a1).
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171805189 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java --- @@ -45,6 +46,14 @@ public static void set(Object baseObject, long baseOffset, int index) { Platform.putLong(baseObject, wordOffset, word | mask); } + public static void setBlock(MemoryBlock base, long baseOffset, int index) { --- End diff -- Let me drop this since it is not used for now; it was only provided for future use.
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20682 lgtm otherwise
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20343 We need to review #20433 first
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 @gatorsmile ping
[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20692 Thanks @gatorsmile and @rdblue for your discussion, and thanks @gatorsmile for trusting me; I hope I will be worthy of your trust. I agree that, since we are not close to a new release, a thorough approach is better. Here is my analysis; let me know if something else is needed. Basically, no other DB shows data types for columns. I won't include the output for all the DBs in this description, since DB2's in particular is verbose; anyway, I confirm that DB2's output is what is shown in the article posted by @gatorsmile. Postgres's examples can be found here: https://www.postgresql.org/docs/10/static/using-explain.html. Oracle's here: https://docs.oracle.com/cd/B10501_01/server.920/a96540/statements_911a.htm#2061798. MySQL is not very different either; here is a JSON representation of an example:
```
{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "u",
          "access_type": "ALL",
          "possible_keys": ["PRIMARY"],
          "rows": 4,
          "filtered": 100
        }
      },
      {
        "table": {
          "table_name": "v",
          "access_type": "ref",
          "possible_keys": ["PRIMARY"],
          "key": "PRIMARY",
          "used_key_parts": ["username"],
          "key_length": "258",
          "ref": ["DBNAME.u.username"],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      }
    ]
  }
}
```
Here is SQL Server's output (quite similar to the others too):
```
StmtText  StmtId  NodeId  Parent  PhysicalOp  LogicalOp  Argument  DefinedValues  EstimateRows  EstimateIO  EstimateCPU  AvgRowSize  TotalSubtreeCost  OutputList  Warnings  Type  Parallel  EstimateExecutions
--------  ------  ------  ------  ----------  ---------  --------  -------------  ------------  ----------  -----------  ----------  ----------------  ----------  --------  ----  --------  ------------------
select * from t1 join t2 on t1.a=t2.a;  1  1  0  NULL  NULL  1  NULL  1.0  NULL  NULL  NULL  2.4343444E-2  NULL  NULL  SELECT  0  NULL
  |--Hash Match(Inner Join, HASH:([master].[dbo].[t2].[a])=([master].[dbo].[t1].[a]), RESIDUAL:([master].[dbo].[t2].[a]=[master].[dbo].[t1].[a]))  1  2  1  Hash Match  Inner Join  HASH:([master].[dbo].[t2].[a])=([master].[dbo].[t1].[a]), RESIDUAL:([master].[dbo].[t2].[a]=[master].[dbo].[t1].[a])  NULL  1.0  0.0  1.7774245E-2  23  2.4343444E-2  [master].[dbo].[t1].[a], [master].[dbo].[t1].[b], [master].[dbo].[t2].[a], [master].[dbo].[t2].[b]  NULL  PLAN_ROW  0  1.0
    |--Table Scan(OBJECT:([master].[dbo].[t2]))  1  3  2  Table Scan  Table Scan  OBJEC
```
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171815998 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +public class OffHeapMemoryBlock extends MemoryBlock { + static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 0); + + public OffHeapMemoryBlock(long address, long size) { +super(null, address, size); + } + + @Override + public MemoryBlock subBlock(long offset, long size) { +return new OffHeapMemoryBlock(this.offset + offset, size); + } + + public final int getInt(long offset) { +return Platform.getInt(null, offset); --- End diff -- This is correct for `OffHeapMemoryBlock`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171820330 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, int lengthInBytes) { return hashUnsafeWords(base, offset, lengthInBytes, seed); } + public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, int lengthInBytes, int seed) { +return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, seed); + } + public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) { // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method. assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)"; int h1 = hashBytesByInt(base, offset, lengthInBytes, seed); return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) { +long offset = base.getBaseOffset(); --- End diff -- While not calling `hashUnsafeBytes` could achieve better performance, I will keep calling `hashUnsafeBytes` until we remove it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171821064 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes, return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) { +// This is compatible with original and another implementations. +// Use this method for new components after Spark 2.3. +long offset = base.getBaseOffset(); +int lengthInBytes = (int)base.size(); +assert (lengthInBytes >= 0) : "lengthInBytes cannot be negative"; +int lengthAligned = lengthInBytes - lengthInBytes % 4; +int h1 = hashBytesByIntBlock(base.subBlock(offset, lengthAligned), seed); +int k1 = 0; +for (int i = lengthAligned, shift = 0; i < lengthInBytes; i++, shift += 8) { + k1 ^= (base.getByte(offset + i) & 0xFF) << shift; +} +h1 ^= mixK1(k1); +return fmix(h1, lengthInBytes); + } + + private static int hashBytesByIntBlock(MemoryBlock base, int seed) { --- End diff -- We can implement this without considering performance. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171820928 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, int lengthInBytes) { return hashUnsafeWords(base, offset, lengthInBytes, seed); } + public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, int lengthInBytes, int seed) { +return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, seed); + } + public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) { // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method. assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)"; int h1 = hashBytesByInt(base, offset, lengthInBytes, seed); return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) { +long offset = base.getBaseOffset(); --- End diff -- Do we need to leave a comment at every method with `...ByteBlock(`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171821706 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes, return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) { --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171821972 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -87,6 +106,35 @@ public static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes, return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytes2Block(MemoryBlock base, int seed) { +// This is compatible with original and another implementations. +// Use this method for new components after Spark 2.3. +long offset = base.getBaseOffset(); +int lengthInBytes = (int)base.size(); +assert (lengthInBytes >= 0) : "lengthInBytes cannot be negative"; +int lengthAligned = lengthInBytes - lengthInBytes % 4; +int h1 = hashBytesByIntBlock(base.subBlock(offset, lengthAligned), seed); +int k1 = 0; +for (int i = lengthAligned, shift = 0; i < lengthInBytes; i++, shift += 8) { + k1 ^= (base.getByte(offset + i) & 0xFF) << shift; +} +h1 ^= mixK1(k1); +return fmix(h1, lengthInBytes); + } + + private static int hashBytesByIntBlock(MemoryBlock base, int seed) { --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171822360 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, int lengthInBytes) { return hashUnsafeWords(base, offset, lengthInBytes, seed); } + public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, int lengthInBytes, int seed) { +return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, seed); + } + public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) { // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method. assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)"; int h1 = hashBytesByInt(base, offset, lengthInBytes, seed); return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) { +long offset = base.getBaseOffset(); --- End diff -- I am considering changing `hashUnsafeBytes()` to call `hashUnsafeBytesBlock()`. That would also reduce the duplication. Does it make sense? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171822744 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java --- @@ -36,22 +42,35 @@ public MemoryBlock allocate(long size) throws OutOfMemoryError { @Override public void free(MemoryBlock memory) { -assert (memory.obj == null) : +if (memory == OffHeapMemoryBlock.NULL) return; +assert(memory instanceof OffHeapMemoryBlock); --- End diff -- good catch --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20618 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1225/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20121 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20121 @smurakozi Addressed your comments. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20618 **[Test build #87883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87883/testReport)** for PR 20618 at commit [`10afda5`](https://github.com/apache/spark/commit/10afda5d97146af7e02cc3b18968f89f074e093c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20121 **[Test build #87884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87884/testReport)** for PR 20121 at commit [`6d59c5b`](https://github.com/apache/spark/commit/6d59c5b017594e6b6d0b7c937207414fc942e448). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r171829796 --- Diff: python/pyspark/sql/functions.py --- @@ -173,16 +172,26 @@ def _(): _functions_2_1 = { # unary math functions -'degrees': 'Converts an angle measured in radians to an approximately equivalent angle ' + - 'measured in degrees.', -'radians': 'Converts an angle measured in degrees to an approximately equivalent angle ' + - 'measured in radians.', +'degrees': """Converts an angle measured in radians to an approximately equivalent angle + measured in degrees. --- End diff -- Let's add a newline here. Seems doc could be broken. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r171829946 --- Diff: python/pyspark/sql/functions.py --- @@ -173,16 +172,26 @@ def _(): _functions_2_1 = { # unary math functions -'degrees': 'Converts an angle measured in radians to an approximately equivalent angle ' + - 'measured in degrees.', -'radians': 'Converts an angle measured in degrees to an approximately equivalent angle ' + - 'measured in radians.', +'degrees': """Converts an angle measured in radians to an approximately equivalent angle + measured in degrees. + :param col: angle in radians + :return: angle in degrees, as if computed by `java.lang.Math.toDegrees()`""", +'radians': """Converts an angle measured in degrees to an approximately equivalent angle measured in radians. --- End diff -- Seems this hits the line length limit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
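The docstrings under review describe `degrees` and `radians` as behaving "as if computed by" `java.lang.Math.toDegrees()` / `toRadians()`. As an illustrative aside (plain Python `math`, not the pyspark implementation), the conversions those Java methods document are simple scalings by 180/pi:

```python
import math

# degrees = radians * 180 / pi; radians = degrees * pi / 180 --
# the same semantics java.lang.Math.toDegrees/toRadians describe.
angle_rad = math.pi / 2
angle_deg = math.degrees(angle_rad)   # approximately 90.0
round_trip = math.radians(angle_deg)  # back to approximately pi/2

assert abs(angle_deg - 90.0) < 1e-9
assert abs(round_trip - angle_rad) < 1e-12
```

Because both directions are pure scalings, they round-trip up to floating-point error, which is why the docstrings hedge with "approximately equivalent".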
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171834517 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -50,12 +52,12 @@ // These are only updated by readExternal() or read() @Nonnull - private Object base; - private long offset; + private MemoryBlock base; + // While numBytes has the same value as base,length, to keep as int avoids cast from long to int --- End diff -- thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171834629 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java --- @@ -52,11 +53,16 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) { * Optimized byte array equality check for byte arrays. * @return true if the arrays are equal, false otherwise */ + public static boolean arrayEqualsBlock( --- End diff -- yeah --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r171834617 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1500,31 +1534,35 @@ object functions { } /** - * Computes the cosine of the given value. Units in radians. + * @param e angle in radians + * @return cosine of the angle, as if computed by [[java.lang.Math#cos]] --- End diff -- Seems this link is broken in Scaladoc. Let's just use `` `java.lang.Math.cos` `` in this file. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171834562 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +public class OffHeapMemoryBlock extends MemoryBlock { + static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 0); + + public OffHeapMemoryBlock(long address, long size) { +super(null, address, size); + } + + @Override + public MemoryBlock subBlock(long offset, long size) { +return new OffHeapMemoryBlock(this.offset + offset, size); + } + + public final int getInt(long offset) { --- End diff -- good catch --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r171836519 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1313,131 +1313,165 @@ object functions { // /** - * Computes the cosine inverse of the given value; the returned angle is in the range - * 0.0 through pi. + * @return inverse cosine of `e` in radians, as if computed by [[java.lang.Math#acos]] * * @group math_funcs * @since 1.4.0 */ def acos(e: Column): Column = withExpr { Acos(e.expr) } /** - * Computes the cosine inverse of the given column; the returned angle is in the range - * 0.0 through pi. + * @return inverse cosine of `columnName`, as if computed by [[java.lang.Math#acos]] * * @group math_funcs * @since 1.4.0 */ def acos(columnName: String): Column = acos(Column(columnName)) /** - * Computes the sine inverse of the given value; the returned angle is in the range - * -pi/2 through pi/2. + * @return inverse sine of `e` in radians, as if computed by [[java.lang.Math#asin]] * * @group math_funcs * @since 1.4.0 */ def asin(e: Column): Column = withExpr { Asin(e.expr) } /** - * Computes the sine inverse of the given column; the returned angle is in the range - * -pi/2 through pi/2. 
+ * @return inverse sine of `columnName`, as if computed by [[java.lang.Math#asin]] * * @group math_funcs * @since 1.4.0 */ def asin(columnName: String): Column = asin(Column(columnName)) /** - * Computes the tangent inverse of the given column; the returned angle is in the range - * -pi/2 through pi/2 + * @return inverse tangent of `e`, as if computed by [[java.lang.Math#atan]] * * @group math_funcs * @since 1.4.0 */ def atan(e: Column): Column = withExpr { Atan(e.expr) } /** - * Computes the tangent inverse of the given column; the returned angle is in the range - * -pi/2 through pi/2 + * @return inverse tangent of `columnName`, as if computed by [[java.lang.Math#atan]] * * @group math_funcs * @since 1.4.0 */ def atan(columnName: String): Column = atan(Column(columnName)) /** - * Returns the angle theta from the conversion of rectangular coordinates (x, y) to - * polar coordinates (r, theta). Units in radians. + * @param y coordinate on y-axis + * @param x coordinate on x-axis + * @return the theta component of the point + * (r, theta) + * in polar coordinates that corresponds to the point + * (x, y) in Cartesian coordinates, + * as if computed by [[java.lang.Math#atan2]] * * @group math_funcs * @since 1.4.0 */ - def atan2(l: Column, r: Column): Column = withExpr { Atan2(l.expr, r.expr) } + def atan2(y: Column, x: Column): Column = withExpr { Atan2(y.expr, x.expr) } /** - * Returns the angle theta from the conversion of rectangular coordinates (x, y) to - * polar coordinates (r, theta). 
+ * @param y coordinate on y-axis + * @param xName coordinate on x-axis + * @return the theta component of the point + * (r, theta) + * in polar coordinates that corresponds to the point + * (x, y) in Cartesian coordinates, + * as if computed by [[java.lang.Math#atan2]] * * @group math_funcs * @since 1.4.0 */ - def atan2(l: Column, rightName: String): Column = atan2(l, Column(rightName)) + def atan2(y: Column, xName: String): Column = atan2(y, Column(xName)) /** - * Returns the angle theta from the conversion of rectangular coordinates (x, y) to - * polar coordinates (r, theta). + * @param yName coordinate on y-axis + * @param x coordinate on x-axis + * @return the theta component of the point + * (r, theta) + * in polar coordinates that corresponds to the point + * (x, y) in Cartesian coordinates, + * as if computed by [[java.lang.Math#atan2]] * * @group math_funcs * @since 1.4.0 */ - def atan2(leftName: String, r: Column): Column = atan2(Column(leftName), r) + def atan2(yName: String, x: Column): Column = atan2(Column(yName), x) /** - * Returns the angle theta from the conversion of rectangular coordinates (x, y) to - * polar coordinates (r, theta). + * @param yName coordinate on y-axis +
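The `atan2(l, r)` → `atan2(y, x)` rename in the diff above follows the polar-coordinate reading: `atan2(y, x)` is the theta component of the polar point `(r, theta)` corresponding to the Cartesian point `(x, y)`. A small illustrative check using Python's `math` module (not pyspark code):

```python
import math

# atan2 takes y first, then x -- the argument order the rename clarifies.
x, y = 1.0, 1.0
theta = math.atan2(y, x)  # pi/4 for the point (1, 1)
r = math.hypot(x, y)      # sqrt(2)

# (r, theta) recovers the original Cartesian point:
assert abs(r * math.cos(theta) - x) < 1e-12
assert abs(r * math.sin(theta) - y) < 1e-12
```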
[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...
Github user sandecho commented on the issue: https://github.com/apache/spark/pull/20709 Can you please test it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171839162 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe.memory; + +import org.apache.spark.unsafe.Platform; + +public class OffHeapMemoryBlock extends MemoryBlock { + static public final OffHeapMemoryBlock NULL = new OffHeapMemoryBlock(0, 0); + + public OffHeapMemoryBlock(long address, long size) { +super(null, address, size); + } + + @Override + public MemoryBlock subBlock(long offset, long size) { +return new OffHeapMemoryBlock(this.offset + offset, size); + } + + public final int getInt(long offset) { +return Platform.getInt(null, offset); --- End diff -- You mean here `offset` is absolute address? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171839479 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, int lengthInBytes) { return hashUnsafeWords(base, offset, lengthInBytes, seed); } + public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, int lengthInBytes, int seed) { +return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, seed); + } + public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) { // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method. assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)"; int h1 = hashBytesByInt(base, offset, lengthInBytes, seed); return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) { +long offset = base.getBaseOffset(); --- End diff -- Seems they somewhat duplicate each other. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20121 **[Test build #87884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87884/testReport)** for PR 20121 at commit [`6d59c5b`](https://github.com/apache/spark/commit/6d59c5b017594e6b6d0b7c937207414fc942e448). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20121 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20121: [SPARK-22882][ML][TESTS] ML test for structured streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87884/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1226/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #87885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87885/testReport)** for PR 19222 at commit [`eb0cc6d`](https://github.com/apache/spark/commit/eb0cc6d2ddb3fc648e1d449dad6e53abbabcf2fc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r171842603 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -52,13 +53,31 @@ public int hashUnsafeWords(Object base, long offset, int lengthInBytes) { return hashUnsafeWords(base, offset, lengthInBytes, seed); } + public static int hashUnsafeWordsBlock(MemoryBlock base, long offset, int lengthInBytes, int seed) { +return hashUnsafeWords(base.getBaseObject(), offset, lengthInBytes, seed); + } + public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) { // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method. assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)"; int h1 = hashBytesByInt(base, offset, lengthInBytes, seed); return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) { +long offset = base.getBaseOffset(); --- End diff -- Sure, to eliminate duplication, we will call `hashUnsafeBytesBlock()` from `hashUnsafeBytes()`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
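The deduplication agreed on above — the legacy `(Object base, long offset, lengthInBytes)` entry point delegating to the `MemoryBlock`-based variant so the hashing logic lives in one place — can be sketched as follows. This is an illustrative Python sketch with a toy hash function; the names mirror, but are not, the actual `Murmur3_x86_32` code:

```python
class MemoryBlock:
    """Toy stand-in for Spark's MemoryBlock: raw data plus an offset/length view."""
    def __init__(self, data: bytes, offset: int, length: int):
        self.data = data
        self.offset = offset
        self.length = length

    def get_byte(self, i: int) -> int:
        return self.data[self.offset + i]


def hash_unsafe_bytes_block(block: MemoryBlock, seed: int) -> int:
    # Toy mixing loop standing in for the real Murmur3 logic.
    h = seed
    for i in range(block.length):
        h = (h * 31 + block.get_byte(i)) & 0xFFFFFFFF
    return h


def hash_unsafe_bytes(base: bytes, offset: int, length: int, seed: int) -> int:
    # Legacy signature forwards to the block variant: one copy of the logic.
    return hash_unsafe_bytes_block(MemoryBlock(base, offset, length), seed)
```

Any fix to the hashing then lands in one method, and the two entry points cannot drift apart — which is the duplication concern raised in the review.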
[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20712 @mgaido91 Yeah, you are right. This is pretty pointless unless you either set this as a static spark conf or pass it to the executor using local properties. Both aren't very attractive IMO. @passionke why do we need this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20712 +1, I would like to know of a case where this increases the cache hit ratio. Do we reuse the same plan across more than 100 plans? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
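For context on the hit-ratio question: a bounded LRU-style cache only pays off if the working set of distinct generated-code plans fits within its capacity; cycling through even one more distinct plan than the capacity can drive hits to zero. An illustrative Python sketch (the real cache in Spark's codegen is a Guava cache, not this class):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache that tracks hit/miss counts."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = self.misses = 0

    def get_or_compile(self, plan_key, compile_fn):
        if plan_key in self.entries:
            self.entries.move_to_end(plan_key)  # mark most recently used
            self.hits += 1
            return self.entries[plan_key]
        self.misses += 1
        self.entries[plan_key] = compile_fn(plan_key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return self.entries[plan_key]

cache = LRUCache(capacity=100)
# Cycling through 101 distinct plans twice thrashes: each access evicts
# a plan that will be needed again before it is reused.
for _ in range(2):
    for plan in range(101):
        cache.get_or_compile(plan, lambda p: f"compiled-{p}")
# cache.hits == 0, cache.misses == 202
```

This is why a configurable size only helps when a workload genuinely reuses more than the default 100 plans, which is the case the reviewers are asking for.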
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20682 **[Test build #87882 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87882/testReport)** for PR 20682 at commit [`4dac3a7`](https://github.com/apache/spark/commit/4dac3a7e2fe3fd058b4c771039da02e2565108a1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20682 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87882/ Test PASSed.
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20682 Merged build finished. Test PASSed.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20700

@rednaxelafx Oh, very interesting, since we are doing a similar thing on the West Coast and in Japan! I only said `not refactored YET`. Yeah, I would absolutely love to delete the old instance methods if we all agree. Is it OK to delete all of the old methods in this PR? `Some methods that are very frequently used (e.g. javaType()) are not refactored yet.`
[GitHub] spark pull request #20716: [SPARK-23566][Minor][Doc] Argument name mismatch ...
GitHub user animenon opened a pull request: https://github.com/apache/spark/pull/20716

[SPARK-23566][Minor][Doc] Argument name mismatch fixed

## What changes were proposed in this pull request?

`col` changed to `new` in the docstring to match the argument list. Patch file added: https://issues.apache.org/jira/browse/SPARK-23566

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/animenon/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20716.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20716

commit 5d2bb9d0b480d7ac2a2c2d6364b87789f57856e8
Author: Anirudh
Date: 2018-03-02T13:58:27Z

[SPARK-23566][Minor][Doc] Argument name mismatch fixed

`col` changed to `new` in doc string to match the argument list.
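The class of bug this PR fixes — a docstring naming a parameter differently from the signature — looks like the following. The function below is a hypothetical illustration, not the actual pyspark method:

```python
import inspect


def with_column_renamed(existing: str, new: str):
    """Return a pair describing a column rename.

    :param existing: string, name of the existing column to rename.
    :param new: string, new name of the column.
    """
    # Before this kind of fix, the second parameter might be documented as
    # ``col`` even though the signature declares ``new``; doc tools then
    # describe a parameter that does not exist in the argument list.
    return (existing, new)


def doc_params_match(fn) -> bool:
    # Cheap consistency check: every declared parameter should be documented.
    return all(f":param {name}:" in (fn.__doc__ or "")
               for name in inspect.signature(fn).parameters)
```

Running `doc_params_match` over a module's public functions is one way to catch the remaining instances the reviewer asks about below.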
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20716 Can one of the admins verify this patch?
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20700 Merged build finished. Test PASSed.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20700 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1227/ Test PASSed.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20700 **[Test build #87886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87886/testReport)** for PR 20700 at commit [`f589a2a`](https://github.com/apache/spark/commit/f589a2add87bc40db0cce845873571717c631c59).
[GitHub] spark issue #20639: [SPARK-23288][SS] Fix output metrics with parquet sink
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20639

Executed these tests manually again and they work fine. Seems flaky.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20700 Merged build finished. Test PASSed.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20700 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1228/ Test PASSed.
[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20700 **[Test build #87887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87887/testReport)** for PR 20700 at commit [`ebce1f2`](https://github.com/apache/spark/commit/ebce1f20df20d17bb4ad54faba8d597f7a02b635).
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20716 ok to test
[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20716

It's fine. Can you double-check whether there are similar instances in this file or in other files?