date:20171029

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19601
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83179/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19601
  
**[Test build #83179 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83179/testReport)**
 for PR 19601 at commit 
[`80b9e31`](https://github.com/apache/spark/commit/80b9e319211765807766e5cf70e995bdbbebf22e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  public static final class UnsafeArray extends ArrayData `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147571324
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -19,28 +19,48 @@
 
 import org.apache.spark.unsafe.Platform;
 
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.nio.ByteBuffer;
+
+import sun.nio.ch.DirectBuffer;
+
 /**
  * A simple {@link MemoryAllocator} that uses {@code Unsafe} to allocate 
off-heap memory.
  */
 public class UnsafeMemoryAllocator implements MemoryAllocator {
 
   @Override
-  public MemoryBlock allocate(long size) throws OutOfMemoryError {
-long address = Platform.allocateMemory(size);
-MemoryBlock memory = new MemoryBlock(null, address, size);
-if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
-  memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
+  public OffHeapMemoryBlock allocate(long size) throws OutOfMemoryError {
+Object buffer = ByteBuffer.allocateDirect((int)size);
--- End diff --

`Platform.allocateMemory` return a memory address of the region in 
off-heap. This region will be freed only when we call `Platform.freeMemory`.  
`ByteBuffer.allocateDirect` return a Java object that wraps a memory region 
in off-heap. This region will be automatically freed when the Java object is 
collected by GC.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

2017-10-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19602#discussion_r147572381
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala 
---
@@ -53,7 +52,7 @@ class HiveClientSuite(version: String)
   for {
 ds <- 20170101 to 20170103
 h <- 0 to 23
-chunk <- Seq("aa", "ab", "ba", "bb")
+chunk <- Seq("11", "12", "21", "22")
--- End diff --

What is the reason we need to change so many lines in the test case? Could 
you a new test case, instead of changing the existing one?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19601
  
@Ueshin @cloud-fan could you please review this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147573354
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -19,28 +19,48 @@
 
 import org.apache.spark.unsafe.Platform;
 
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.nio.ByteBuffer;
+
+import sun.nio.ch.DirectBuffer;
+
 /**
  * A simple {@link MemoryAllocator} that uses {@code Unsafe} to allocate 
off-heap memory.
  */
 public class UnsafeMemoryAllocator implements MemoryAllocator {
 
   @Override
-  public MemoryBlock allocate(long size) throws OutOfMemoryError {
-long address = Platform.allocateMemory(size);
-MemoryBlock memory = new MemoryBlock(null, address, size);
-if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
-  memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
+  public OffHeapMemoryBlock allocate(long size) throws OutOfMemoryError {
+Object buffer = ByteBuffer.allocateDirect((int)size);
--- End diff --

To address issue for releasing a memory, 
[here](https://hbase.apache.org/0.94/apidocs/src-html/org/apache/hadoop/hbase/util/DirectMemoryUtils.html#line.80)
 is one solution


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19567
  
**[Test build #83183 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83183/testReport)**
 for PR 19567 at commit 
[`f3d45d0`](https://github.com/apache/spark/commit/f3d45d01b2e837d04143ac2a038aa616e5ecbfa3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19563
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19567
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19563
  
**[Test build #83185 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83185/testReport)**
 for PR 19563 at commit 
[`70c2304`](https://github.com/apache/spark/commit/70c2304393d6ec967a509fde8746e8268e7c80ee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19563
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83185/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19567
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83183/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15049
  
**[Test build #83186 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83186/testReport)**
 for PR 15049 at commit 
[`cdbdbf7`](https://github.com/apache/spark/commit/cdbdbf7fc97710412720e41889c70123b59bcf55).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17899
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17899
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83193/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17899
  
**[Test build #83193 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83193/testReport)**
 for PR 17899 at commit 
[`f840c69`](https://github.com/apache/spark/commit/f840c69231025b53d057ec7e1032750abfa856b8).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19601
  
I'd like to also improve the write path. I think the current way to cache 
array type is not efficient, arrow-like format which put all elements(including 
nested array) together is better for encoding and compression.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19601
  
both ways work, just pick the simpler one. I'm concerned about how to 
access the nested array, you can try both approaches and see which one can 
solve the problem easier.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19592: [SPARK-22347][SQL][PySpark] Support optionally running P...

2017-10-29 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19592
  
ping @ueshin @BryanCutler @cloud-fan Would you mind to provide some 
insights? Should we add just a document for it or fix it in your opinions? 
Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19601
  
Current `ColumnVector` uses primitive type array (e.g. `int[]` or 
`double[]`) based on data type of each column. On the other hand, cached data 
uses `byte[]` for all data type.  
Do we change format (`Array[Array[Byte]]`) in 
[`CachedBatch`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala#L53)
 for an primitive array?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17899
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83189/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17899
  
**[Test build #83189 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83189/testReport)**
 for PR 17899 at commit 
[`54d88fa`](https://github.com/apache/spark/commit/54d88fabf2076dbe17f22a72526ad7879a4c8eb0).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19601
  
For now, this implementation has an limitation only to support non-nested 
array for ease of review.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19567
  
**[Test build #83195 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83195/testReport)**
 for PR 19567 at commit 
[`fae5c45`](https://github.com/apache/spark/commit/fae5c455b4a754128bc9112bbead4aef3cc322a2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

2017-10-29 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19603
  
Good catch! LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread jinxing64

Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19602
  
@gatorsmile 
Thanks again for review this pr.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147573017
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -19,28 +19,48 @@
 
 import org.apache.spark.unsafe.Platform;
 
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.nio.ByteBuffer;
+
+import sun.nio.ch.DirectBuffer;
+
 /**
  * A simple {@link MemoryAllocator} that uses {@code Unsafe} to allocate 
off-heap memory.
  */
 public class UnsafeMemoryAllocator implements MemoryAllocator {
 
   @Override
-  public MemoryBlock allocate(long size) throws OutOfMemoryError {
-long address = Platform.allocateMemory(size);
-MemoryBlock memory = new MemoryBlock(null, address, size);
-if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
-  memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
+  public OffHeapMemoryBlock allocate(long size) throws OutOfMemoryError {
+Object buffer = ByteBuffer.allocateDirect((int)size);
--- End diff --

I knew that to use `Platform.allocateMemory` is [current 
design](http://downloads.typesafe.com/website/presentations/ScalaDaysSF2015/T4_Xin_Performance_Optimization.pdf#page=26).
 I will revert this change.

If we want to use `ByteBuffer.allocateDirect`, it would be good to submit a 
separate PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17899
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19567
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147573145
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java ---
@@ -33,15 +32,13 @@
   private static final long WIDTH = 8;
 
   private final MemoryBlock memory;
-  private final Object baseObj;
   private final long baseOffset;
 
   private final long length;
 
   public LongArray(MemoryBlock memory) {
 assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size >= 
Integer.MAX_VALUE elements";
 this.memory = memory;
-this.baseObj = memory.getBaseObject();
 this.baseOffset = memory.getBaseOffset();
--- End diff --

I see. I remove this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147573154
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
 ---
@@ -230,7 +233,10 @@ public UTF8String getUTF8String(int ordinal) {
 final long offsetAndSize = getLong(ordinal);
 final int offset = (int) (offsetAndSize >> 32);
 final int size = (int) offsetAndSize;
-return UTF8String.fromAddress(baseObject, baseOffset + offset, size);
+MemoryBlock mb = (baseObject instanceof byte[]) ?
+  new ByteArrayMemoryBlock((byte[]) baseObject, baseOffset + offset, 
size) :
+  new LongArrayMemoryBlock((long[]) baseObject, baseOffset + offset, 
size);
--- End diff --

I refactor this and move this logic into `MemoryBlock`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19563
  
**[Test build #83185 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83185/testReport)**
 for PR 19563 at commit 
[`70c2304`](https://github.com/apache/spark/commit/70c2304393d6ec967a509fde8746e8268e7c80ee).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...

2017-10-29 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19553
  
@Tagar it's the same general type of issue, but not directly related nor 
exactly the same cause. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147576840
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -19,28 +19,48 @@
 
 import org.apache.spark.unsafe.Platform;
 
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.nio.ByteBuffer;
+
+import sun.nio.ch.DirectBuffer;
+
 /**
  * A simple {@link MemoryAllocator} that uses {@code Unsafe} to allocate 
off-heap memory.
  */
 public class UnsafeMemoryAllocator implements MemoryAllocator {
 
   @Override
-  public MemoryBlock allocate(long size) throws OutOfMemoryError {
-long address = Platform.allocateMemory(size);
-MemoryBlock memory = new MemoryBlock(null, address, size);
-if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
-  memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
+  public OffHeapMemoryBlock allocate(long size) throws OutOfMemoryError {
+Object buffer = ByteBuffer.allocateDirect((int)size);
--- End diff --

I just wanna understand why we would like to pick 
`ByteBuffer.allocateDirect` with a hacky free method. Does 
`ByteBuffer.allocateDirect` have better performance? or you just wanna change 
the life cycle of `MemoryBlock` and ask GC to control it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...

2017-10-29 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19529
  
@nkronenfeld @gatorsmile I think this has been failing the master build 
(Maven only) for a few days:


https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/3981/consoleFull

```
SQLQuerySuite:
*** RUN ABORTED ***
  org.apache.spark.SparkException: Only one SparkContext may be running in 
this JVM (see SPARK-2243). To ignore this error, set 
spark.driver.allowMultipleContexts = true. The currently running SparkContext 
was created at:

org.apache.spark.sql.test.GenericFunSpecSuite.(GenericFunSpecSuite.scala:28)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:422)
java.lang.Class.newInstance(Class.java:442)

org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:66)
org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:38)
org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:37)

scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)

scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
scala.collection.Iterator$class.foreach(Iterator.scala:893)
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
scala.collection.AbstractTraversable.map(Traversable.scala:104)
org.scalatest.tools.DiscoverySuite.(DiscoverySuite.scala:37)
org.scalatest.tools.Runner$.genDiscoSuites$1(Runner.scala:1165)
org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1250)
```

I suspect that there's some slightly different way that the tests execute 
in Maven that may highlight a problem with how the SQL tests use and reuse 
SparkContexts. It's likely this change did cause it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15049
  
**[Test build #83184 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83184/testReport)**
 for PR 15049 at commit 
[`86c863a`](https://github.com/apache/spark/commit/86c863aa3f0de6aed9d77a50660003a64c36f98f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15049
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83184/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15049
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19567: [SPARK-22291][SQL] Conversion error when transfor...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19567#discussion_r147578399
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -456,8 +456,10 @@ object JdbcUtils extends Logging {
 
 case StringType =>
   (array: Object) =>
-array.asInstanceOf[Array[java.lang.String]]
-  .map(UTF8String.fromString)
+// some underling types are not String such as uuid, inet, 
cidr, etc.
+array.asInstanceOf[Array[java.lang.Object]]
+  .map(obj => UTF8String.fromString(
+if (obj == null) null else obj.toString))
--- End diff --

nit: `if (obj == null) null else UTF8String.fromString(obj.toString)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19567: [SPARK-22291][SQL] Conversion error when transfor...

2017-10-29 Thread jmchung

Github user jmchung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19567#discussion_r147579149
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -456,8 +456,10 @@ object JdbcUtils extends Logging {
 
 case StringType =>
   (array: Object) =>
-array.asInstanceOf[Array[java.lang.String]]
-  .map(UTF8String.fromString)
+// some underling types are not String such as uuid, inet, 
cidr, etc.
+array.asInstanceOf[Array[java.lang.Object]]
+  .map(obj => UTF8String.fromString(
+if (obj == null) null else obj.toString))
--- End diff --

Thanks @cloud-fan !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17899
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17899
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83191/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17899
  
**[Test build #83191 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83191/testReport)**
 for PR 17899 at commit 
[`82d7390`](https://github.com/apache/spark/commit/82d7390e284b9324c75f4a0755f91fb3127f4d7f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19603#discussion_r147579950
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -591,18 +591,40 @@ case class MapObjects private(
   case _ => inputData.dataType
 }
 
-val (getLength, getLoopVar) = inputDataType match {
+val (getLength, prepareLoop, getLoopVar) = inputDataType match {
   case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
--- End diff --

you can use `seqObj.isInstanceOf[Seq[_]]`, but not a `Class`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18251: [SPARK-17788][SPARK-21033][SQL] fix the potential OOM in...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18251
  
**[Test build #83187 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83187/testReport)**
 for PR 18251 at commit 
[`f58173c`](https://github.com/apache/spark/commit/f58173c2497d5996633e5faff9d4a6bfbb70cbe1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18251: [SPARK-17788][SPARK-21033][SQL] fix the potential OOM in...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18251
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83180/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18251: [SPARK-17788][SPARK-21033][SQL] fix the potential OOM in...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18251
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18251: [SPARK-17788][SPARK-21033][SQL] fix the potential OOM in...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18251
  
**[Test build #83180 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83180/testReport)**
 for PR 18251 at commit 
[`d4e3021`](https://github.com/apache/spark/commit/d4e30212552900a45913c7f1866dbfbe49d57317).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class TooLargePageException extends RuntimeException `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19602
  
**[Test build #83181 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83181/testReport)**
 for PR 19602 at commit 
[`26c7a7b`](https://github.com/apache/spark/commit/26c7a7be273265d8b25ed3fdf4d6fd3b55f225d8).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83181/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147573035
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OffHeapMemoryBlock.java
 ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+public class OffHeapMemoryBlock extends MemoryBlock {
+  private Object directBuffer;
--- End diff --

`directBuffer` is used to control lifetime of `DirectByteBuffer`. Since I 
decided not to use `DirectByteBuffer` in this PR, I will remove this field 
`directBuffer`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15049
  
**[Test build #83184 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83184/testReport)**
 for PR 15049 at commit 
[`86c863a`](https://github.com/apache/spark/commit/86c863aa3f0de6aed9d77a50660003a64c36f98f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19567
  
**[Test build #83183 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83183/testReport)**
 for PR 19567 at commit 
[`f3d45d0`](https://github.com/apache/spark/commit/f3d45d01b2e837d04143ac2a038aa616e5ecbfa3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #15049: [SPARK-17310][SQL] Add an option to disable recor...

2017-10-29 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15049#discussion_r147573054
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -555,6 +572,32 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
   }
 }
   }
+
+  test("Filters should be pushed down for Parquet readers at row group 
level") {
--- End diff --

@jiangxb1987, here I added a test to make sure  disabling 
'spark.sql.parquet.recordFilter' still enables row group level filtering.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147573040
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java
 ---
@@ -73,6 +73,12 @@ public MemoryBlock allocate(long size) throws 
OutOfMemoryError {
 
   @Override
   public void free(MemoryBlock memory) {
+assert(memory instanceof ByteArrayMemoryBlock || memory instanceof 
IntArrayMemoryBlock ||
--- End diff --

Good catch, you are right.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19563#discussion_r147573635
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -389,9 +389,10 @@ abstract class HashExpression[E] extends Expression {
   input: String,
   result: String,
   fields: Array[StructField]): String = {
-fields.zipWithIndex.map { case (field, index) =>
+val hashes = fields.zipWithIndex.map { case (field, index) =>
   nullSafeElementHash(input, index.toString, field.nullable, 
field.dataType, result, ctx)
-}.mkString("\n")
+}
+ctx.splitExpressions(hashes, "apply", ("InternalRow", input) :: Nil)
--- End diff --

Good catch, done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19563#discussion_r147573624
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala
 ---
@@ -639,6 +639,63 @@ class HashExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 assert(hiveHashPlan(wideRow).getInt(0) == hiveHashEval)
   }
 
+  test("SPARK-22284: Compute hash for nested structs") {
+val M = 80
+val N = 10
+val L = M * N
+val O = 50
+val seed = 42
+
+val wideRow1 = new GenericInternalRow(Seq.tabulate(O)(j =>
+new GenericInternalRow(Seq.tabulate(L)(i =>
+  new GenericInternalRow(Array[Any](
+UTF8String.fromString((j * L + i).toString
+  .toArray[Any])).toArray[Any])
+var inner1 = new StructType()
--- End diff --

I see, done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83182/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19602
  
**[Test build #83182 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83182/testReport)**
 for PR 19602 at commit 
[`a3a978e`](https://github.com/apache/spark/commit/a3a978eb612cdc5944c05951e5a6eb9457ae6962).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19601
  
My feeling is that, we should change the cache format of array type to make 
it compatible with `ColumnVector`, then we don't need conversion from cached 
data to columnar batch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19601#discussion_r147578474
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
 ---
@@ -367,9 +551,13 @@ public Object get(int ordinal, DataType dataType) {
   /**
* Returns the array at rowid.
*/
-  public final ColumnVector.Array getArray(int rowId) {
-resultArray.length = getArrayLength(rowId);
-resultArray.offset = getArrayOffset(rowId);
+  public final ArrayData getArray(int rowId) {
--- End diff --

We should not change the return type. `ColumnVector` will be public 
eventually, and `ArrayData` is not a public type.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17899
  
**[Test build #83189 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83189/testReport)**
 for PR 17899 at commit 
[`54d88fa`](https://github.com/apache/spark/commit/54d88fabf2076dbe17f22a72526ad7879a4c8eb0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19567
  
**[Test build #83190 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83190/testReport)**
 for PR 19567 at commit 
[`588902d`](https://github.com/apache/spark/commit/588902d21fb12bf80169edc74097d7bda950668c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19601
  
I agree with you that we need to improve the write path. It will be 
addressed after improving the frequently-executed read path, as you suggested 
before. It will be addressed by the following PR.

For improving the read path, which approach is better? To add new 
`ColumnVector.Array` or to add new `WritableColumnVector`? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19567: [SPARK-22291][SQL] Conversion error when transfor...

2017-10-29 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19567#discussion_r147581064
  
--- Diff: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
 ---
@@ -134,11 +149,28 @@ class PostgresIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(schema(1).dataType == ShortType)
   }
 
-  test("SPARK-20557: column type TIMESTAMP with TIME ZONE and TIME with 
TIME ZONE should be recognized") {
+  test("SPARK-20557: column type TIMESTAMP with TIME ZONE and TIME with 
TIME ZONE " +
+"should be recognized") {
 val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new 
Properties)
 val rows = dfRead.collect()
 val types = rows(0).toSeq.map(x => x.getClass.toString)
 assert(types(1).equals("class java.sql.Timestamp"))
 assert(types(2).equals("class java.sql.Timestamp"))
   }
+
+  test("SPARK-22291: PostgreSQL UUID[] to StringType: Conversion Error") {
--- End diff --

Let's change this test title too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

2017-10-29 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19603#discussion_r147581519
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -591,18 +591,40 @@ case class MapObjects private(
   case _ => inputData.dataType
 }
 
-val (getLength, getLoopVar) = inputDataType match {
+val (getLength, prepareLoop, getLoopVar) = inputDataType match {
   case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
-s"${genInputData.value}.size()" -> 
s"${genInputData.value}.apply($loopIndex)"
+val it = ctx.freshName("it")
+(
+  s"${genInputData.value}.size()",
+  s"scala.collection.Iterator $it = 
${genInputData.value}.toIterator();",
--- End diff --

I think it is better to leave a comment for code readers.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

2017-10-29 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19603#discussion_r147581528
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -591,18 +591,40 @@ case class MapObjects private(
   case _ => inputData.dataType
 }
 
-val (getLength, getLoopVar) = inputDataType match {
+val (getLength, prepareLoop, getLoopVar) = inputDataType match {
   case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
--- End diff --

Oops, right, not reading this clearly.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

2017-10-29 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19603#discussion_r147581548
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -591,18 +591,40 @@ case class MapObjects private(
   case _ => inputData.dataType
 }
 
-val (getLength, getLoopVar) = inputDataType match {
+val (getLength, prepareLoop, getLoopVar) = inputDataType match {
   case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
-s"${genInputData.value}.size()" -> 
s"${genInputData.value}.apply($loopIndex)"
+val it = ctx.freshName("it")
--- End diff --

I agreed. Currently the branches are a bit complicated.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19603
  
**[Test build #83188 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83188/testReport)**
 for PR 19603 at commit 
[`6cb4fca`](https://github.com/apache/spark/commit/6cb4fca89e83172407114037d3a447ae6d941f0a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19603
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83188/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19603
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...

2017-10-29 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19597
  
@xynny, mind if I ask your JIRA id?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread jmchung

Github user jmchung commented on the issue:

https://github.com/apache/spark/pull/19567
  
gentle ping @cloud-fan and @viirya, there are some feedbacks about the 
behavior of obj to string.

``` scala
case StringType =>
  (array: Object) =>
array.asInstanceOf[Array[java.lang.Object]]
  .map(obj => UTF8String.fromString(obj.toString))
```
As @HyukjinKwon mentioned before, it may unsafe to covnert the object to 
string directly, we'll have the `java.lang.NullPointerException` in test cases. 
`StringType` is different from other types use `nullSafeConvert` method to deal 
with the null, it process within `UTF8String.fromString`.

The `UTF8String.fromString` returns null and UTF8String so that we cannot 
use `String.valueOf(obj)` to avoid the NPE, it will raise another errors 
(because `null != "null"`):

```
- Type mapping for various types *** FAILED ***
  Array("a", "null", "b") did not equal List("a", null, "b") 
(PostgresIntegrationSuite.scala:120)
 ``` 

IMHO, if we don't want to make separation of match cases, we may remain 
null to meet the original design, e.g.,

``` scala
case StringType =>
  (array: Object) =>
array.asInstanceOf[Array[java.lang.Object]]
  .map(obj => UTF8String.fromString(
if (obj == null) null else obj.toString))
```

Alternative is we keep the String and Object cases?

Would you mind give some comments? I can keep working on this :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147573486
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java ---
@@ -17,47 +17,168 @@
 
 package org.apache.spark.unsafe.memory;
 
-import javax.annotation.Nullable;
-
 import org.apache.spark.unsafe.Platform;
 
+import javax.annotation.Nullable;
+
 /**
- * A consecutive block of memory, starting at a {@link MemoryLocation} 
with a fixed size.
+ * A declaration of interfaces of MemoryBlock classes .
  */
-public class MemoryBlock extends MemoryLocation {
+public abstract class MemoryBlock {
+  @Nullable
+  protected final Object obj;
 
-  private final long length;
+  protected final long offset;
+
+  protected final long length;
 
   /**
* Optional page number; used when this MemoryBlock represents a page 
allocated by a
-   * TaskMemoryManager. This field is public so that it can be modified by 
the TaskMemoryManager,
-   * which lives in a different package.
+   * TaskMemoryManager. This field can be updated using setPageNumber 
method so that
+   * this can be modified by the TaskMemoryManager, which lives in a 
different package.
*/
-  public int pageNumber = -1;
+  private int pageNumber = -1;
 
   public MemoryBlock(@Nullable Object obj, long offset, long length) {
-super(obj, offset);
+this.obj = obj;
+this.offset = offset;
 this.length = length;
   }
 
+  public MemoryBlock() {
+this(null, 0, 0);
+  }
+
+  public final Object getBaseObject() {
+return obj;
+  }
+
+  public final long getBaseOffset() {
+return offset;
+  }
+
   /**
* Returns the size of the memory block.
*/
-  public long size() {
+  public final long size() {
 return length;
   }
 
-  /**
-   * Creates a memory block pointing to the memory used by the long array.
-   */
-  public static MemoryBlock fromLongArray(final long[] array) {
-return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length 
* 8L);
+  public final void setPageNumber(int pageNum) {
+pageNumber = pageNum;
+  }
+
+  public final int getPageNumber() {
+return pageNumber;
   }
 
   /**
* Fills the memory block with the specified byte value.
*/
-  public void fill(byte value) {
+  public final void fill(byte value) {
 Platform.setMemory(obj, offset, length, value);
   }
+
+  /**
+   * Instantiate the same type of MemoryBlock with new offset and size
+   */
+  public abstract MemoryBlock allocate(long offset, long size);
+
+
+  public abstract int getInt(long offset);
+
+  public abstract void putInt(long offset, int value);
+
+  public abstract boolean getBoolean(long offset);
+
+  public abstract void putBoolean(long offset, boolean value);
+
+  public abstract byte getByte(long offset);
+
+  public abstract void putByte(long offset, byte value);
+
+  public abstract short getShort(long offset);
+
+  public abstract void putShort(long offset, short value);
+
+  public abstract long getLong(long offset);
+
+  public abstract void putLong(long offset, long value);
+
+  public abstract float getFloat(long offset);
+
+  public abstract void putFloat(long offset, float value);
+
+  public abstract double getDouble(long offset);
+
+  public abstract void putDouble(long offset, double value);
+
+  public abstract Object getObjectVolatile(long offset);
+
+  public abstract void putObjectVolatile(long offset, Object value);
+
+  public static final void copyMemory(
--- End diff --

`copyMemory` has two kinds of arguments: `src` and `dst`.  
If we make them non-static method, it is not easy to understand whether 
`this` is `src` or `dst`. If we keep method name `MemoryBlock.copy`, I think 
that it is good to keep current form (it is common in `memcpy`).

Or, can we prepare two methods `MemoryBlock.copyFrom(srcOffset, dst...)` 
and `MemoryBlock.copyTo(src..., dstOffset)`.

@cloud-fan What do you think?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19600: Added more information to Imputer

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19600
  
**[Test build #3963 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3963/testReport)**
 for PR 19600 at commit 
[`2350fc1`](https://github.com/apache/spark/commit/2350fc1de3eaa0a3aafd7aba11c5d99b39d4c606).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19553: [SPARK-22330][CORE] Linear containsKey operation ...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19553#discussion_r147576141
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala ---
@@ -43,10 +43,15 @@ private[spark] object JavaUtils {
 
 override def size: Int = underlying.size
 
-override def get(key: AnyRef): B = try {
-  underlying.getOrElse(key.asInstanceOf[A], null.asInstanceOf[B])
-} catch {
-  case ex: ClassCastException => null.asInstanceOf[B]
+// Delegate to implementation because AbstractMap implementation 
iterates over whole key set
+override def containsKey(key: AnyRef): Boolean = key match {
+  case key: A => underlying.contains(key)
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18251: [SPARK-17788][SPARK-21033][SQL] fix the potential OOM in...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18251
  
**[Test build #83187 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83187/testReport)**
 for PR 18251 at commit 
[`f58173c`](https://github.com/apache/spark/commit/f58173c2497d5996633e5faff9d4a6bfbb70cbe1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19603: [SPARK-22385][SQL] MapObjects should not access list ele...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19603
  
**[Test build #83188 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83188/testReport)**
 for PR 19603 at commit 
[`6cb4fca`](https://github.com/apache/spark/commit/6cb4fca89e83172407114037d3a447ae6d941f0a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15049
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19601#discussion_r147578600
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
 ---
@@ -367,9 +551,13 @@ public Object get(int ordinal, DataType dataType) {
   /**
* Returns the array at rowid.
*/
-  public final ColumnVector.Array getArray(int rowId) {
-resultArray.length = getArrayLength(rowId);
-resultArray.offset = getArrayOffset(rowId);
+  public final ArrayData getArray(int rowId) {
--- End diff --

I see.
One question. `ColumnVector.Array` has some public fields such as `length`. 
I think that it would be good to use an accessor `length` or `getLength`. What 
do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15049
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83186/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19601
  
So for primitive types, we encode and compress them to binary. When reading 
cached data, they are decoded to primitive array and can be put in 
`OnHeadColumnVector` directly.

For primitive type array, we treat it as binary. So when decoding it, we 
get a byte[] and need more effort to convert it to primitive type and put in 
`OnHeadColumnVector`.

Can we change how we encode array type like Arrow did?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19603#discussion_r147580091
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -591,18 +591,40 @@ case class MapObjects private(
   case _ => inputData.dataType
 }
 
-val (getLength, getLoopVar) = inputDataType match {
+val (getLength, prepareLoop, getLoopVar) = inputDataType match {
   case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
-s"${genInputData.value}.size()" -> 
s"${genInputData.value}.apply($loopIndex)"
+val it = ctx.freshName("it")
+(
+  s"${genInputData.value}.size()",
+  s"scala.collection.Iterator $it = 
${genInputData.value}.toIterator();",
+  s"$it.next()"
+)
   case ObjectType(cls) if cls.isArray =>
-s"${genInputData.value}.length" -> 
s"${genInputData.value}[$loopIndex]"
+(
+  s"${genInputData.value}.length",
+  "",
+  s"${genInputData.value}[$loopIndex]"
+)
   case ObjectType(cls) if 
classOf[java.util.List[_]].isAssignableFrom(cls) =>
--- End diff --

The legal external typea for catalyst array are scala `Seq` and java 
`List`, so we don't need to match a more general type here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17899
  
**[Test build #83193 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83193/testReport)**
 for PR 17899 at commit 
[`f840c69`](https://github.com/apache/spark/commit/f840c69231025b53d057ec7e1032750abfa856b8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19567
  
LGTM too


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19567: [SPARK-22291][SQL] Conversion error when transfor...

2017-10-29 Thread jmchung

Github user jmchung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19567#discussion_r147581349
  
--- Diff: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
 ---
@@ -134,11 +149,28 @@ class PostgresIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(schema(1).dataType == ShortType)
   }
 
-  test("SPARK-20557: column type TIMESTAMP with TIME ZONE and TIME with 
TIME ZONE should be recognized") {
+  test("SPARK-20557: column type TIMESTAMP with TIME ZONE and TIME with 
TIME ZONE " +
+"should be recognized") {
 val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new 
Properties)
 val rows = dfRead.collect()
 val types = rows(0).toSeq.map(x => x.getClass.toString)
 assert(types(1).equals("class java.sql.Timestamp"))
 assert(types(2).equals("class java.sql.Timestamp"))
   }
+
+  test("SPARK-22291: PostgreSQL UUID[] to StringType: Conversion Error") {
--- End diff --

Oops, I forgot it, thanks!!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18251: [SPARK-17788][SPARK-21033][SQL] fix the potential OOM in...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18251
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83187/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18251: [SPARK-17788][SPARK-21033][SQL] fix the potential OOM in...

2017-10-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18251
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread jinxing64

Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19602
  
@gatorsmile 
Thanks a lot for your help :)

>Can we just evaluate the right side CAST(2017 as STRING), since it is 
foldable?

Do you mean to add a new rule ? -- cast the type of literal when partition 
key and the literal are different types.

In current code, `StringType` are promoted(e.g. casted to `IntegerType`) in 
`PromoteStrings` when it is BinaryComparison 
(https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L366)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19602
  
**[Test build #83182 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83182/testReport)**
 for PR 19602 at commit 
[`a3a978e`](https://github.com/apache/spark/commit/a3a978eb612cdc5944c05951e5a6eb9457ae6962).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19597: [SPARK-22375][TEST] Test script can fail if eggs are ins...

2017-10-29 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19597
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19602
  
**[Test build #83181 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83181/testReport)**
 for PR 19602 at commit 
[`26c7a7b`](https://github.com/apache/spark/commit/26c7a7be273265d8b25ed3fdf4d6fd3b55f225d8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-29 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r147578071
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java
 ---
@@ -19,28 +19,48 @@
 
 import org.apache.spark.unsafe.Platform;
 
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.nio.ByteBuffer;
+
+import sun.nio.ch.DirectBuffer;
+
 /**
  * A simple {@link MemoryAllocator} that uses {@code Unsafe} to allocate 
off-heap memory.
  */
 public class UnsafeMemoryAllocator implements MemoryAllocator {
 
   @Override
-  public MemoryBlock allocate(long size) throws OutOfMemoryError {
-long address = Platform.allocateMemory(size);
-MemoryBlock memory = new MemoryBlock(null, address, size);
-if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
-  memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
+  public OffHeapMemoryBlock allocate(long size) throws OutOfMemoryError {
+Object buffer = ByteBuffer.allocateDirect((int)size);
--- End diff --

This is easy to manage life cycle of native memory without calling `free`. 
This code was derived from #11494.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19601
  
Can we use `OffHeapColumnVector` for cached data?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19603: [SPARK-22385][SQL] MapObjects should not access l...

2017-10-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19603#discussion_r147580056
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -591,18 +591,40 @@ case class MapObjects private(
   case _ => inputData.dataType
 }
 
-val (getLength, getLoopVar) = inputDataType match {
+val (getLength, prepareLoop, getLoopVar) = inputDataType match {
   case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) =>
-s"${genInputData.value}.size()" -> 
s"${genInputData.value}.apply($loopIndex)"
+val it = ctx.freshName("it")
--- End diff --

That's not a big deal, we are traversing the collection from start to end, 
so either Iterator or accessing by index won't have much difference, may not 
worth to create 2 branches here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to transpose adj...

2017-10-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17899
  
**[Test build #83194 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83194/testReport)**
 for PR 17899 at commit 
[`e9f6928`](https://github.com/apache/spark/commit/e9f6928bb60e7c4de25324e5572f105a30d16cd5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19567: [SPARK-22291][SQL] Conversion error when transforming ar...

2017-10-29 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19567
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 >

1 - 100 of 218 matches

Mail list logo