[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1308/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20633
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20633
  
**[Test build #87996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87996/testReport)**
 for PR 20633 at commit 
[`166cdbb`](https://github.com/apache/spark/commit/166cdbb3e95315e0feb29fb26c6c98837747e22d).





[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20745
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87992/
Test PASSed.





[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20745
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20745
  
**[Test build #87992 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87992/testReport)**
 for PR 20745 at commit 
[`55aa8bc`](https://github.com/apache/spark/commit/55aa8bca96b112a33cabb352afb4168c2d8f355c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CatalogColumnStat(`
  * `case class LocalRelation(`
  * `case class StreamingDataSourceV2Relation(`





[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20433
  
Do you mean the HIVE JIRA? If so, no (I was about to check it now). Is there 
anything I should know?





[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-03-05 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20659
  
Yes, I'm doing it





[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20433
  
ok, I'll update based on the comments soon





[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20433
  
Could you create `interval.sql` by adding the test cases in 
https://issues.apache.org/jira/browse/HIVE-13557 ?





[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...

2018-03-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r172427740
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -790,6 +796,16 @@ ASC: 'ASC';
 DESC: 'DESC';
 FOR: 'FOR';
 INTERVAL: 'INTERVAL';
+YEAR: 'YEAR' | 'YEARS';
--- End diff --

Also update `TableIdentifierParserSuite`





[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...

2018-03-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r172427617
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -790,6 +796,16 @@ ASC: 'ASC';
 DESC: 'DESC';
 FOR: 'FOR';
 INTERVAL: 'INTERVAL';
+YEAR: 'YEAR' | 'YEARS';
+MONTH: 'MONTH' | 'MONTHS';
+WEEK: 'WEEK' | 'WEEKS';
+DAY: 'DAY' | 'DAYS';
+HOUR: 'HOUR' | 'HOURS';
+MINUTE: 'MINUTE' | 'MINUTES';
+SECOND: 'SECOND' | 'SECONDS';
+MILLISECOND: 'MILLISECOND' | 'MILLISECONDS';
+MICROSECOND: 'MICROSECOND' | 'MICROSECONDS';
+NANOSECOND: 'NANOSECOND' | 'NANOSECONDS';
--- End diff --

We do not support `nanosecond`. 
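
To illustrate the point, here is a minimal sketch of how the singular and plural unit keywords from the diff could be normalized while rejecting `NANOSECOND`. It is not the parser code from the PR: `IntervalUnitSketch` and `normalizeUnit` are hypothetical names, and the supported set is an assumption taken from the grammar lines above minus the unit this comment rules out.

```scala
object IntervalUnitSketch {
  // Units from the grammar lines in the diff, minus NANOSECOND,
  // which the review comment says is not supported (assumption of this sketch).
  private val supported: Set[String] = Set(
    "YEAR", "MONTH", "WEEK", "DAY", "HOUR", "MINUTE", "SECOND",
    "MILLISECOND", "MICROSECOND")

  // Map "YEAR" or "YEARS" (any case) to the singular unit name,
  // or return None when the unit is not in the supported set.
  def normalizeUnit(token: String): Option[String] = {
    val upper = token.trim.toUpperCase(java.util.Locale.ROOT)
    val singular = if (upper.endsWith("S")) upper.dropRight(1) else upper
    Some(singular).filter(supported.contains)
  }
}
```

With this shape, `normalizeUnit("years")` and `normalizeUnit("YEAR")` resolve to the same unit, while `normalizeUnit("nanoseconds")` yields `None`.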





[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...

2018-03-05 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r172427354
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -790,6 +796,16 @@ ASC: 'ASC';
 DESC: 'DESC';
 FOR: 'FOR';
 INTERVAL: 'INTERVAL';
+YEAR: 'YEAR' | 'YEARS';
+MONTH: 'MONTH' | 'MONTHS';
+WEEK: 'WEEK' | 'WEEKS';
+DAY: 'DAY' | 'DAYS';
+HOUR: 'HOUR' | 'HOURS';
+MINUTE: 'MINUTE' | 'MINUTES';
+SECOND: 'SECOND' | 'SECONDS';
+MILLISECOND: 'MILLISECOND' | 'MILLISECONDS';
--- End diff --

yea.





[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...

2018-03-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r172426790
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -790,6 +796,16 @@ ASC: 'ASC';
 DESC: 'DESC';
 FOR: 'FOR';
 INTERVAL: 'INTERVAL';
+YEAR: 'YEAR' | 'YEARS';
+MONTH: 'MONTH' | 'MONTHS';
+WEEK: 'WEEK' | 'WEEKS';
+DAY: 'DAY' | 'DAYS';
+HOUR: 'HOUR' | 'HOURS';
+MINUTE: 'MINUTE' | 'MINUTES';
+SECOND: 'SECOND' | 'SECONDS';
+MILLISECOND: 'MILLISECOND' | 'MILLISECONDS';
--- End diff --

nvm, it sounds like we already support them. 





[GitHub] spark pull request #20433: [SPARK-23264][SQL] Support interval values withou...

2018-03-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r172426643
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -790,6 +796,16 @@ ASC: 'ASC';
 DESC: 'DESC';
 FOR: 'FOR';
 INTERVAL: 'INTERVAL';
+YEAR: 'YEAR' | 'YEARS';
+MONTH: 'MONTH' | 'MONTHS';
+WEEK: 'WEEK' | 'WEEKS';
+DAY: 'DAY' | 'DAYS';
+HOUR: 'HOUR' | 'HOURS';
+MINUTE: 'MINUTE' | 'MINUTES';
+SECOND: 'SECOND' | 'SECONDS';
+MILLISECOND: 'MILLISECOND' | 'MILLISECONDS';
--- End diff --

I am wondering which systems support `MILLISECOND`, `MICROSECOND` and 
`NANOSECOND`?





[GitHub] spark issue #20746: [SPARK-23594][SQL] GetExternalRowField should support in...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20746
  
**[Test build #87995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87995/testReport)**
 for PR 20746 at commit 
[`62a9814`](https://github.com/apache/spark/commit/62a98147a7a9aeb43e4827e3095577d0be6dee47).





[GitHub] spark issue #20746: [SPARK-23594][SQL] GetExternalRowField should support in...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20746
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1307/
Test PASSed.





[GitHub] spark issue #20746: [SPARK-23594][SQL] GetExternalRowField should support in...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20746
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #20746: [SPARK-23594][SQL] GetExternalRowField should sup...

2018-03-05 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/20746

[SPARK-23594][SQL] GetExternalRowField should support interpreted execution

## What changes were proposed in this pull request?
This PR adds interpreted execution for `GetExternalRowField`.

## How was this patch tested?
Added tests in `ObjectExpressionsSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SPARK-23594

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20746.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20746


commit 62a98147a7a9aeb43e4827e3095577d0be6dee47
Author: Takeshi Yamamuro 
Date:   2018-03-06T07:04:30Z

GetExternalRowField should support interpreted execution







[GitHub] spark issue #20699: [SPARK-23544][SQL]Remove redundancy ShuffleExchange in t...

2018-03-05 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/20699
  
`EnsureRequirements` can eliminate unnecessary shuffles when the children have 
the same partitioning, or compatible partitionings with the same expression 
distribution.
However, when the children have different partitionings or different expression 
distributions, `EnsureRequirements` cannot eliminate the unnecessary shuffles.
This PR deals with the latter case. Thanks.





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87994 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87994/testReport)**
 for PR 19222 at commit 
[`a62770b`](https://github.com/apache/spark/commit/a62770bdcd2cd83dc19d5f39a55b5186201ddc34).





[GitHub] spark issue #20716: [SPARK-23566][Minor][Doc] Argument name mismatch fixed

2018-03-05 Thread animenon
Github user animenon commented on the issue:

https://github.com/apache/spark/pull/20716
  
@HyukjinKwon It's minor, so it may not be required. I had tagged Gator just for a 
check. 





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1306/
Test PASSed.





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-05 Thread kiszk
GitHub user kiszk reopened a pull request:

https://github.com/apache/spark/pull/19222

[SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks to choose several 
types of memory block

## What changes were proposed in this pull request?

This PR allows us to use one of several types of `MemoryBlock`, such as 
byte array, int array, long array, or `java.nio.DirectByteBuffer`. Using 
`java.nio.DirectByteBuffer` provides off-heap memory that is 
automatically deallocated by the JVM. The `MemoryBlock` class has primitive accessors 
like `Platform.getInt()`, `Platform.putInt()`, or `Platform.copyMemory()`. 
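
As a rough illustration of the idea (one accessor interface over several backing stores), here is a self-contained sketch. It is not the `MemoryBlock` API from this PR: `Block`, `HeapBlock`, `DirectBlock`, and `sumInts` are hypothetical names, and `java.nio.ByteBuffer` stands in for the `Platform` accessors.

```scala
import java.nio.ByteBuffer

object MemoryBlockSketch {
  // Hypothetical, simplified interface; not the MemoryBlock class from the PR.
  trait Block {
    def size: Int
    def getInt(offset: Int): Int
    def putInt(offset: Int, value: Int): Unit
  }

  // On-heap block backed by a byte array.
  final class HeapBlock(bytes: Array[Byte]) extends Block {
    private val buf = ByteBuffer.wrap(bytes)
    def size: Int = bytes.length
    def getInt(offset: Int): Int = buf.getInt(offset)
    def putInt(offset: Int, value: Int): Unit = { buf.putInt(offset, value); () }
  }

  // Off-heap block backed by a direct buffer; the JVM releases the native
  // memory after the buffer becomes unreachable.
  final class DirectBlock(val size: Int) extends Block {
    private val buf = ByteBuffer.allocateDirect(size)
    def getInt(offset: Int): Int = buf.getInt(offset)
    def putInt(offset: Int, value: Int): Unit = { buf.putInt(offset, value); () }
  }

  // Code written against Block works with either backing store.
  def sumInts(block: Block, count: Int): Long =
    (0 until count).map(i => block.getInt(i * 4).toLong).sum
}
```

Callers such as `sumInts` do not need to know whether the memory is on-heap or off-heap, which is the property the description above relies on.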

This PR uses `MemoryBlock` for `OffHeapColumnVector`, `UTF8String`, and 
other places. This PR can improve performance of operations involving memory 
accesses (e.g. `UTF8String.trim`) by 1.8x.

For now, this PR does not use `MemoryBlock` for `BufferHolder` based on 
@cloud-fan's 
[suggestion](https://github.com/apache/spark/pull/11494#issuecomment-309694290).

Since this PR is a successor of #11494, close #11494. Much of the code was 
ported from #11494, and a lot of effort went into it. **I think this PR should be 
credited to @yzotov.**


This PR can achieve **1.1-1.4x performance improvements** for operations 
in `UTF8String` or `Murmur3_x86_32`. Other operations show almost comparable 
performance.

Without this PR
```
OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
Hash byte arrays with length 268435487:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
Murmur3_x86_32                                  526 /  536          0.0   131399881.5       1.0X

UTF8String benchmark:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
hashCode                                        525 /  552       1022.6           1.0       1.0X
substring                                       414 /  423       1298.0           0.8       1.3X
```

With this PR
```
OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
Hash byte arrays with length 268435487:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
Murmur3_x86_32                                  474 /  488          0.0   118552232.0       1.0X

UTF8String benchmark:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
hashCode                                        476 /  480       1127.3           0.9       1.0X
substring                                       287 /  291       1869.9           0.5       1.7X
```

Benchmark program
```
test("benchmark Murmur3_x86_32") {
  val length = 8192 * 32768 + 31
  val seed = 42L
  val iters = 1 << 2
  val random = new Random(seed)
  val arrays = Array.fill[MemoryBlock](numArrays) {
    val bytes = new Array[Byte](length)
    random.nextBytes(bytes)
    new ByteArrayMemoryBlock(bytes, Platform.BYTE_ARRAY_OFFSET, length)
  }

  val benchmark = new Benchmark("Hash byte arrays with length " + length,
    iters * numArrays, minNumIters = 20)
  benchmark.addCase("HiveHasher") { _: Int =>
    var sum = 0L
    for (_ <- 0L until iters) {
      sum += HiveHasher.hashUnsafeBytesBlock(
        arrays(i), Platform.BYTE_ARRAY_OFFSET, length)
    }
  }
  benchmark.run()
}

test("benchmark UTF8String") {
  val N = 512 * 1024 * 1024
  val iters = 2
  val benchmark = new Benchmark("UTF8String benchmark", N, minNumIters = 20)
  val str0 = new java.io.StringWriter() { { for (i <- 0 until N) { write(" ") } } }.toString
  val s0 = UTF8String.fromString(str0)
  benchmark.addCase("hashCode") { _: Int =>
    var h: Int = 0
    for (_ <- 0L until iters) { h += s0.hashCode }
  }
  benchmark.addCase("substring") { _: Int =>
    var s: UTF8String = null
    for (_ <- 0L until iters) { s = s0.substring(N / 2 - 5, N / 2 + 5) }
  }
  benchmark.run()
}
```

  
  
I run [this benchmark 
program](https://gist.github.com/kiszk/94f75b506c93a663bbbc372ffe8f05de) using 
[the 

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-05 Thread kiszk
Github user kiszk closed the pull request at:

https://github.com/apache/spark/pull/19222





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19222
  
retest this please





[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-05 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172421399
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
 ---
@@ -57,20 +59,20 @@
 
   // The data stored in these two allocations need to maintain binary compatible. We can
   // directly pass this buffer to external components.
-  private long nulls;
--- End diff --

I see. `Platform.reallocateMemory` does not exist.
`MemoryAllocator.UNSAFE.reallocate()` returns `MemoryBlock`.





[GitHub] spark pull request #20633: [SPARK-23455][ML] Default Params in ML should be ...

2018-03-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20633#discussion_r172420910
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -351,17 +359,21 @@ private[ml] object DefaultParamsReader {
   timestamp: Long,
   sparkVersion: String,
   params: JValue,
+  defaultParams: JValue,
   metadata: JValue,
   metadataJson: String) {
 
 /**
  * Get the JSON value of the [[org.apache.spark.ml.param.Param]] of the given name.
  * This can be useful for getting a Param value before an instance of `Params`
  * is available.
+ *
+ * @param isDefaultParam Whether the given param name is a default param. Default is false.
  */
-def getParamValue(paramName: String): JValue = {
+def getParamValue(paramName: String, isDefaultParam: Boolean = false): JValue = {
--- End diff --

Sounds good. I will change this.
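
For readers following the diff, a minimal sketch of what the two-argument lookup could look like is given below. It only mirrors the signature shown in the diff: the reviewer's suggested change is not quoted here, so this is not the final code. `Metadata` is a hypothetical stand-in, and plain `Map[String, String]` values replace the json4s `JValue` fields so the sketch runs without dependencies.

```scala
object ParamLookupSketch {
  // Simplified metadata holder (assumption): user-set params and default
  // params are kept in separate maps, as the new `defaultParams` field suggests.
  final case class Metadata(
      params: Map[String, String],
      defaultParams: Map[String, String]) {

    // Look the name up among default params when isDefaultParam is true,
    // otherwise among the user-set params.
    def getParamValue(paramName: String, isDefaultParam: Boolean = false): String = {
      val source = if (isDefaultParam) defaultParams else params
      source.getOrElse(paramName,
        throw new NoSuchElementException(s"Cannot find param with name $paramName."))
    }
  }
}
```

A caller would write `metadata.getParamValue("tol", isDefaultParam = true)` to read a default value, and omit the flag for a user-set value.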





[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-03-05 Thread lucio-yz
Github user lucio-yz commented on the issue:

https://github.com/apache/spark/pull/20472
  
@srowen Any other problems?





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87989/
Test FAILed.





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87989 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87989/testReport)**
 for PR 19222 at commit 
[`a62770b`](https://github.com/apache/spark/commit/a62770bdcd2cd83dc19d5f39a55b5186201ddc34).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

2018-03-05 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20633
  
@WeichenXu123 I've added a unit test in 
`DefaultReadWriteSuite/DefaultReadWriteTest` to check that this can read old 
metadata back.

Sounds like the backward compatibility test you suggested should be checked 
manually. I will test it. Thanks!





[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...

2018-03-05 Thread advancedxy
Github user advancedxy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20449#discussion_r172416019
  
--- Diff: 
core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ---
@@ -104,9 +104,16 @@ private[spark] class BlockStoreShuffleReader[K, C](
 
 context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
 context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
 context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+// Use completion callback to stop sorter if task was finished/cancelled.
+context.addTaskCompletionListener(_ => {
+  sorter.stop()
+})
 CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
   case None =>
 aggregatedIter
 }
+// Use another interruptible iterator here to support task cancellation as aggregator or(and)
+// sorter may have consumed previous interruptible iterator.
+new InterruptibleIterator[Product2[K, C]](context, resultIter)
--- End diff --

Will do 





[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-03-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20345
  
ping


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r172415192
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/NormalizerSuite.scala ---
@@ -17,94 +17,72 @@
 
 package org.apache.spark.ml.feature
 
-import org.apache.spark.SparkFunSuite
 import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}
-import org.apache.spark.ml.util.DefaultReadWriteTest
+import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest}
 import org.apache.spark.ml.util.TestingUtils._
-import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
 
 
-class NormalizerSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+class NormalizerSuite extends MLTest with DefaultReadWriteTest {
 
   import testImplicits._
 
-  @transient var data: Array[Vector] = _
-  @transient var dataFrame: DataFrame = _
-  @transient var normalizer: Normalizer = _
-  @transient var l1Normalized: Array[Vector] = _
-  @transient var l2Normalized: Array[Vector] = _
+  @transient val data: Seq[Vector] = Seq(
+Vectors.sparse(3, Seq((0, -2.0), (1, 2.3))),
+Vectors.dense(0.0, 0.0, 0.0),
+Vectors.dense(0.6, -1.1, -3.0),
+Vectors.sparse(3, Seq((1, 0.91), (2, 3.2))),
+Vectors.sparse(3, Seq((0, 5.7), (1, 0.72), (2, 2.7))),
+Vectors.sparse(3, Seq()))
--- End diff --

OK, it's a minor issue; let's ignore it.





[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20433
  
ping





[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...

2018-03-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20449#discussion_r172414393
  
--- Diff: 
core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ---
@@ -104,9 +104,16 @@ private[spark] class BlockStoreShuffleReader[K, C](
 
 context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
 context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
 context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
+// Use completion callback to stop sorter if task was finished/cancelled.
+context.addTaskCompletionListener(_ => {
+  sorter.stop()
+})
 CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
   case None =>
 aggregatedIter
 }
+// Use another interruptible iterator here to support task cancellation as aggregator or(and)
+// sorter may have consumed previous interruptible iterator.
+new InterruptibleIterator[Product2[K, C]](context, resultIter)
--- End diff --

There is a chance that `resultIter` is already an `InterruptibleIterator`, 
and we should not double wrap it. Can you send a follow-up PR to fix this? Then 
we can backport them to 2.3 together.
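
One way to avoid the double wrap is a type check before wrapping. The sketch below is only an illustration under stated assumptions, not the follow-up patch: `FakeTaskContext`, `Interruptible`, and `wrapOnce` are hypothetical stand-ins for Spark's `TaskContext` and `InterruptibleIterator`, written so the example runs without Spark on the classpath.

```scala
object WrapOnceSketch {
  // Hypothetical stand-in for a task context that can be marked as cancelled.
  final class FakeTaskContext {
    @volatile var interrupted: Boolean = false
  }

  // Hypothetical stand-in for an interruptible iterator: it checks the
  // cancellation flag before every hasNext call.
  final class Interruptible[T](val context: FakeTaskContext, val delegate: Iterator[T])
      extends Iterator[T] {
    override def hasNext: Boolean = {
      if (context.interrupted) throw new InterruptedException("task was cancelled")
      delegate.hasNext
    }
    override def next(): T = delegate.next()
  }

  // Wrap only when the iterator is not already interruptible.
  def wrapOnce[T](context: FakeTaskContext, iter: Iterator[T]): Iterator[T] = iter match {
    case _: Interruptible[_] => iter
    case other => new Interruptible[T](context, other)
  }
}
```

Calling `wrapOnce` on an already wrapped iterator returns the same instance, so the cancellation check runs once per element rather than twice.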





[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20742
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2018-03-05 Thread JohnHBrock
Github user JohnHBrock commented on the issue:

https://github.com/apache/spark/pull/18610
  
What else needs to be done before this can be merged?





[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20464
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20742
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87987/
Test PASSed.





[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20464
  
**[Test build #87993 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87993/testReport)**
 for PR 20464 at commit 
[`0ebdf74`](https://github.com/apache/spark/commit/0ebdf74942e0894bfaf6cbede4c03fd3f5d26411).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20464
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87993/
Test PASSed.





[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20742
  
**[Test build #87987 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87987/testReport)**
 for PR 20742 at commit 
[`c867373`](https://github.com/apache/spark/commit/c867373867b88cce4eed8a69bdf05585f7142dc1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172413460
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
 ---
@@ -57,20 +59,20 @@
 
   // The data stored in these two allocations need to maintain binary 
compatible. We can
   // directly pass this buffer to external components.
-  private long nulls;
--- End diff --

Removing `Platform.reallocateMemory` is not a strong reason to migrate 
`OffHeapColumnVector` to memory block; we can do it later, and update 
`OnHeapColumnVector` too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19381
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87991/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-05 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172412911
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
 ---
@@ -57,20 +59,20 @@
 
   // The data stored in these two allocations need to maintain binary 
compatible. We can
   // directly pass this buffer to external components.
-  private long nulls;
--- End diff --

Ah, you want to allocate memory for `OffHeapColumnVector` by using 
`Platform` instead of `MemoryBlock`?
Could you please explain why `OffHeapColumnVector` should allocate 
memory with `Platform`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19381
  
**[Test build #87991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87991/testReport)**
 for PR 19381 at commit 
[`1420867`](https://github.com/apache/spark/commit/1420867e43e32f46e18dccf61720228a5b8f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-03-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20659
  
Nice try! Could you fix the remaining failure?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-03-05 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/20449
  
@cloud-fan is it possible to also merge this into branch-2.3, so that this 
fix can be released in Spark 2.3.1?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20464
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1305/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20464
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172410077
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
 ---
@@ -57,20 +59,20 @@
 
   // The data stored in these two allocations need to maintain binary 
compatible. We can
   // directly pass this buffer to external components.
-  private long nulls;
--- End diff --

what if we don't remove `Platform.reallocateMemory`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20464
  
**[Test build #87993 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87993/testReport)**
 for PR 20464 at commit 
[`0ebdf74`](https://github.com/apache/spark/commit/0ebdf74942e0894bfaf6cbede4c03fd3f5d26411).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...

2018-03-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20464#discussion_r172409837
  
--- Diff: R/pkg/R/column.R ---
@@ -169,7 +169,7 @@ setMethod("alias",
 #' @note substr since 1.4.0
 setMethod("substr", signature(x = "Column"),
   function(x, start, stop) {
-jc <- callJMethod(x@jc, "substr", as.integer(start - 1), 
as.integer(stop - start + 1))
+jc <- callJMethod(x@jc, "substr", as.integer(start), 
as.integer(stop - start + 1))
--- End diff --

Added to the func doc.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20742: [SPARK-23572][docs] Bring "security.md" up to dat...

2018-03-05 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20742#discussion_r172409490
  
--- Diff: R/pkg/DESCRIPTION ---
@@ -57,6 +57,6 @@ Collate:
 'types.R'
 'utils.R'
 'window.R'
-RoxygenNote: 5.0.1
+RoxygenNote: 6.0.1
--- End diff --

pls revert this


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20699: [SPARK-23544][SQL]Remove redundancy ShuffleExchange in t...

2018-03-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20699
  
Sorry, I should make the question more specific: `EnsureRequirements#apply` 
has a hack to eliminate unnecessary shuffles; do we still need that?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r172408255
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -313,13 +306,14 @@ class RFormulaSuite extends MLTest with 
DefaultReadWriteTest {
   Seq(("male", "foo", 4), ("female", "bar", 4), ("female", "bar", 5), 
("male", "baz", 5))
 .toDF("id", "a", "b")
 val model = formula.fit(original)
+val attr = NominalAttribute.defaultAttr
 val expected = Seq(
 ("male", "foo", 4, Vectors.dense(0.0, 1.0, 4.0), 1.0),
 ("female", "bar", 4, Vectors.dense(1.0, 0.0, 4.0), 0.0),
 ("female", "bar", 5, Vectors.dense(1.0, 0.0, 5.0), 0.0),
 ("male", "baz", 5, Vectors.dense(0.0, 0.0, 5.0), 1.0)
 ).toDF("id", "a", "b", "features", "label")
-// assert(result.schema.toString == resultSchema.toString)
+  .select($"id", $"a", $"b", $"features", $"label".as("label", 
attr.toMetadata()))
--- End diff --

I am also confused about the alignment rule. @jkbradley what do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-05 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r172408009
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala 
---
@@ -324,19 +352,46 @@ class QuantileDiscretizerSuite
   .setStages(Array(discretizerForCol1, discretizerForCol2, 
discretizerForCol3))
   .fit(df)
 
-val resultForMultiCols = plForMultiCols.transform(df)
-  .select("result1", "result2", "result3")
-  .collect()
-
-val resultForSingleCol = plForSingleCol.transform(df)
-  .select("result1", "result2", "result3")
-  .collect()
+val expected = Seq(
+  (0.0, 0.0, 0.0),
+  (0.0, 0.0, 1.0),
+  (0.0, 0.0, 1.0),
+  (0.0, 1.0, 2.0),
+  (0.0, 1.0, 2.0),
+  (0.0, 1.0, 2.0),
+  (0.0, 1.0, 3.0),
+  (0.0, 2.0, 4.0),
+  (0.0, 2.0, 4.0),
+  (1.0, 2.0, 5.0),
+  (1.0, 2.0, 5.0),
+  (1.0, 2.0, 5.0),
+  (1.0, 3.0, 6.0),
+  (1.0, 3.0, 6.0),
+  (1.0, 3.0, 7.0),
+  (1.0, 4.0, 8.0),
+  (1.0, 4.0, 8.0),
+  (1.0, 4.0, 9.0),
+  (1.0, 4.0, 9.0),
+  (1.0, 4.0, 9.0)
+  ).toDF("result1", "result2", "result3")
+.collect().toSeq
--- End diff --

But I would prefer to avoid hardcoding a big literal array so that the code is 
easier to maintain. I think the following code is enough:
```
val expected = plForSingleCol.transform(df)
  .select("result1", "result2", "result3")
  .collect().toSeq
testTransformerByGlobalCheckFunc[(Double, Double, Double)](
  df, plForSingleCol, "result1", "result2", "result3") { rows =>
  assert(rows == expected)
}
```
There is a similar case here 
https://github.com/apache/spark/pull/20121#discussion_r172288890


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

2018-03-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20647


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-03-05 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20295
  
@icexelloss Could you annotate `[SQL][PYTHON]` in the pr title please?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-03-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20647
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2018-03-05 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/16006
  
@omuravskiy can you comment on
https://github.com/apache/spark/pull/19431
since it appears to be based on your PR


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20745
  
**[Test build #87992 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87992/testReport)**
 for PR 20745 at commit 
[`55aa8bc`](https://github.com/apache/spark/commit/55aa8bca96b112a33cabb352afb4168c2d8f355c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20464: [SPARK-23291][SQL][R] R's substr should not reduc...

2018-03-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20464#discussion_r172406939
  
--- Diff: R/pkg/R/column.R ---
@@ -169,7 +169,7 @@ setMethod("alias",
 #' @note substr since 1.4.0
 setMethod("substr", signature(x = "Column"),
   function(x, start, stop) {
-jc <- callJMethod(x@jc, "substr", as.integer(start - 1), 
as.integer(stop - start + 1))
+jc <- callJMethod(x@jc, "substr", as.integer(start), 
as.integer(stop - start + 1))
--- End diff --

I think you mean 1-based.
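
The index arithmetic in the diff above can be sketched in isolation. This is a minimal illustration, assuming the JVM-side `substr` takes a 1-based start position and a length, as the "1-based" remark suggests. The class and helper names (`SubstrSketch`, `substrPosLen`, `rSubstr`, `rSubstrOld`) are hypothetical and not part of Spark, and negative positions are not modelled.

```java
// Hypothetical helper names; only the index arithmetic mirrors the diff above.
final class SubstrSketch {
    // Stand-in for a substring call that takes a 1-based position and a length.
    static String substrPosLen(String s, int pos, int len) {
        int begin = Math.max(pos - 1, 0);                        // 1-based pos to 0-based offset
        int end = Math.min(begin + Math.max(len, 0), s.length());
        return begin >= end ? "" : s.substring(begin, end);
    }

    // R-style substr(x, start, stop): both bounds are 1-based and inclusive.
    static String rSubstr(String s, int start, int stop) {
        return substrPosLen(s, start, stop - start + 1);
    }

    // The pre-patch arithmetic, which shifted the start down by one.
    static String rSubstrOld(String s, int start, int stop) {
        return substrPosLen(s, start - 1, stop - start + 1);
    }

    public static void main(String[] args) {
        System.out.println(rSubstr("abcdef", 2, 4));    // bcd
        System.out.println(rSubstrOld("abcdef", 2, 4)); // abc
    }
}
```

With a 1-based callee, subtracting one from `start` returns a window that begins one character too early, which is the behaviour the patch removes.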


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20745
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-05 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request:

https://github.com/apache/spark/pull/20745

[SPARK-23288][SS] Fix output metrics with parquet sink

## What changes were proposed in this pull request?

Output metrics were not filled when the parquet sink was used.

This PR fixes this problem by passing a `BasicWriteJobStatsTracker` in 
`FileStreamSink`.
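
The idea of the fix can be sketched without Spark. This is a rough illustration only, assuming the write path reports statistics solely to the trackers handed to it. Every name here (`StatsTrackerSketch`, `WriteStatsTracker`, `CountingTracker`, `write`) is hypothetical and does not reflect the actual `BasicWriteJobStatsTracker` or `FileStreamSink` APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class StatsTrackerSketch {
    // Hypothetical callback interface; not Spark's actual tracker API.
    interface WriteStatsTracker {
        void onRowsWritten(long rows);
    }

    // Accumulates a row count, standing in for the output metrics.
    static final class CountingTracker implements WriteStatsTracker {
        long rows = 0;

        @Override
        public void onRowsWritten(long n) {
            rows += n;
        }
    }

    // Hypothetical write routine: it reports only to the trackers it was given.
    static void write(List<String> records, List<WriteStatsTracker> trackers) {
        for (WriteStatsTracker t : trackers) {
            t.onRowsWritten(records.size());
        }
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("a", "b", "c");
        CountingTracker tracker = new CountingTracker();

        // No tracker passed: the write happens but nothing is recorded.
        write(batch, Collections.<WriteStatsTracker>emptyList());
        System.out.println(tracker.rows); // 0

        // Tracker passed: the metrics are filled.
        write(batch, Collections.<WriteStatsTracker>singletonList(tracker));
        System.out.println(tracker.rows); // 3
    }
}
```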

## How was this patch tested?

Additional unit test added.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaborgsomogyi/spark SPARK-23288

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20745


commit 22e6ca1576bdeee2092afc8bc82a743e0700a959
Author: Gabor Somogyi 
Date:   2018-02-19T23:43:46Z

[SPARK-23288][SS] Fix output metrics with parquet sink

commit 55aa8bca96b112a33cabb352afb4168c2d8f355c
Author: Gabor Somogyi 
Date:   2018-02-28T22:50:47Z

Merge branch 'master' into SPARK-23288




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20639: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-05 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/20639
  
God, it seems to be stuck somehow. I'll re-create the PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20639: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-05 Thread gaborgsomogyi
Github user gaborgsomogyi closed the pull request at:

https://github.com/apache/spark/pull/20639


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19381
  
**[Test build #87991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87991/testReport)**
 for PR 19381 at commit 
[`1420867`](https://github.com/apache/spark/commit/1420867e43e32f46e18dccf61720228a5b8f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19381
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1304/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...

2018-03-05 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20295
  
LGTM except for @BryanCutler's suggestion 
(https://github.com/apache/spark/pull/20295#discussion_r172374978). Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20743: [SPARK-23020][CORE][branch-2.3] Fix another race in the ...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20743
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20743: [SPARK-23020][CORE][branch-2.3] Fix another race in the ...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20743
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87985/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20743: [SPARK-23020][CORE][branch-2.3] Fix another race in the ...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20743
  
**[Test build #87985 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87985/testReport)**
 for PR 20743 at commit 
[`06aa292`](https://github.com/apache/spark/commit/06aa292c15e61170b91f622dbce54a8149c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1303/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19381
  
**[Test build #87990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87990/testReport)**
 for PR 19381 at commit 
[`ab68214`](https://github.com/apache/spark/commit/ab68214028979b431de4fe605a843e4a0cb013db).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19381
  
Build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20706
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87984/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20706
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2018-03-05 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/16006
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20706
  
**[Test build #87984 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87984/testReport)**
 for PR 20706 at commit 
[`427a977`](https://github.com/apache/spark/commit/427a977b33c6c3f2e436b43ac9f9263c64f835bb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20726: [SPARK-23574][CORE] Report SinglePartition in DataSource...

2018-03-05 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20726
  
Btw, I think the title should be `[SQL]` instead of `[CORE]`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20726: [SPARK-23574][CORE] Report SinglePartition in DataSource...

2018-03-05 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20726
  
LGTM with one trivial doc point.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20726: [SPARK-23574][CORE] Report SinglePartition in Dat...

2018-03-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20726#discussion_r172403479
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsReportPartitioning.java
 ---
@@ -23,6 +23,10 @@
 /**
  * A mix in interface for {@link DataSourceReader}. Data source readers 
can implement this
  * interface to report data partitioning and try to avoid shuffle at Spark 
side.
+ *
+ * Note that Spark will always infer a
+ * {@link org.apache.spark.sql.catalyst.plans.physical.SinglePartition} 
partitioning when the
+ * reader creates exactly 1 {@link DataReaderFactory}.
--- End diff --

nit: regardless of whether the reader implements this interface or not.
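
The rule described in the Javadoc, together with this nit, can be sketched as a small decision function. This is an illustration under stated assumptions, not Spark's planner code: the `PartitioningSketch` class, the `Partitioning` enum, and the `infer` method are all hypothetical names.

```java
import java.util.Optional;

final class PartitioningSketch {
    // Hypothetical stand-ins for the planner's partitioning descriptions.
    enum Partitioning { SINGLE, REPORTED, UNKNOWN }

    // factoryCount: number of reader factories the reader created.
    // reported: the partitioning the reader reported, or empty if it does
    // not implement the reporting mix-in.
    static Partitioning infer(int factoryCount, Optional<Partitioning> reported) {
        if (factoryCount == 1) {
            // Applies whether or not the reader reported a partitioning.
            return Partitioning.SINGLE;
        }
        return reported.orElse(Partitioning.UNKNOWN);
    }

    public static void main(String[] args) {
        System.out.println(infer(1, Optional.empty()));                    // SINGLE
        System.out.println(infer(1, Optional.of(Partitioning.REPORTED))); // SINGLE
        System.out.println(infer(4, Optional.of(Partitioning.REPORTED))); // REPORTED
        System.out.println(infer(4, Optional.empty()));                    // UNKNOWN
    }
}
```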


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...

2018-03-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20649
  
I usually leave it open for a few more days in case other reviewers have 
more review comments.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...

2018-03-05 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/20702
  
Jenkins, ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87983/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20657
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20657
  
**[Test build #87983 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87983/testReport)**
 for PR 20657 at commit 
[`3294596`](https://github.com/apache/spark/commit/329459652fb40eb82b81ef66ad93cec05b9dd016).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87989 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87989/testReport)**
 for PR 19222 at commit 
[`a62770b`](https://github.com/apache/spark/commit/a62770bdcd2cd83dc19d5f39a55b5186201ddc34).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1302/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-05 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r172395871
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java
 ---
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a long array on Java heap.
+ */
+public final class OnHeapMemoryBlock extends MemoryBlock {
--- End diff --

👍 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16770: [SPARK-15009][PYTHON][ML] Construct CountVectorizerModel...

2018-03-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16770
  
**[Test build #87988 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87988/testReport)**
 for PR 16770 at commit 
[`8860641`](https://github.com/apache/spark/commit/8860641487411d23cd86e932f0c50d06ecee626c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16770: [SPARK-15009][PYTHON][ML] Construct CountVectorizerModel...

2018-03-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16770
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87988/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


