[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21860 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22118: Branch 2.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22118 Can one of the admins verify this patch?
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20725 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94838/ Test PASSed.
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20725 Merged build finished. Test PASSed.
[GitHub] spark pull request #22118: Branch 2.2
GitHub user speful opened a pull request: https://github.com/apache/spark/pull/22118 Branch 2.2 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22118.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22118 commit 86609a95af4b700e83638b7416c7e3706c2d64c6 Author: Liang-Chi Hsieh Date: 2017-08-08T08:12:41Z [SPARK-21567][SQL] Dataset should work with type alias If we create a type alias for a type workable with Dataset, the type alias doesn't work with Dataset. A reproducible case looks like: object C { type TwoInt = (Int, Int) def tupleTypeAlias: TwoInt = (1, 1) } Seq(1).toDS().map(_ => ("", C.tupleTypeAlias)) It throws an exception like: type T1 is not a class scala.ScalaReflectionException: type T1 is not a class at scala.reflect.api.Symbols$SymbolApi$class.asClass(Symbols.scala:275) ... This patch accesses the dealias of type in many places in `ScalaReflection` to fix it. Added test case. Author: Liang-Chi Hsieh Closes #18813 from viirya/SPARK-21567. (cherry picked from commit ee1304199bcd9c1d5fc94f5b06fdd5f6fe7336a1) Signed-off-by: Wenchen Fan commit e87ffcaa3e5b75f8d313dc995e4801063b60cd5c Author: Wenchen Fan Date: 2017-08-08T08:32:49Z Revert "[SPARK-21567][SQL] Dataset should work with type alias" This reverts commit 86609a95af4b700e83638b7416c7e3706c2d64c6. 
commit d0233145208eb6afcd9fe0c1c3a9dbbd35d7727e Author: pgandhi Date: 2017-08-09T05:46:06Z [SPARK-21503][UI] Spark UI shows incorrect task status for a killed Executor Process The executor tab on the Spark UI page shows a task as completed when the executor process running that task is killed using the kill command. Added the case ExecutorLostFailure, which was previously not there; thus, the default case would be executed, in which case the task would be marked as completed. This case will cover all those cases where the executor's connection to the Spark driver was lost due to killing the executor process, a network failure, etc. ## How was this patch tested? Manually tested the fix by observing the UI change before and after. Before: https://user-images.githubusercontent.com/8190/28482929-571c9cea-6e30-11e7-93dd-728de5cdea95.png After: https://user-images.githubusercontent.com/8190/28482964-8649f5ee-6e30-11e7-91bd-2eb2089c61cc.png Please review http://spark.apache.org/contributing.html before opening a pull request. Author: pgandhi Author: pgandhi999 Closes #18707 from pgandhi999/master. (cherry picked from commit f016f5c8f6c6aae674e9905a5c0b0bede09163a4) Signed-off-by: Wenchen Fan commit 7446be3328ea75a5197b2587e3a8e2ca7977726b Author: WeichenXu Date: 2017-08-09T06:44:10Z [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search ## What changes were proposed in this pull request? Update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search https://github.com/scalanlp/breeze/pull/651 ## How was this patch tested? N/A Author: WeichenXu Closes #18797 from WeichenXu123/update-breeze. (cherry picked from commit b35660dd0e930f4b484a079d9e2516b0a7dacf1d) Signed-off-by: Yanbo Liang commit f6d56d2f1c377000921effea2b1faae15f9cae82 Author: Shixiong Zhu Date: 2017-08-09T06:49:33Z [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value Same PR as #18799 but for branch 2.2. Main discussion is on the other PR.
When I was investigating a flaky test, I realized that many places don't check the return value of `HDFSMetadataLog.get(batchId: Long): Option[T]`. When a batch is supposed to be there, the caller just ignores None rather than throwing an error. If some bug causes a query not to generate a batch metadata file, this behavior will hide it, allow the query to continue running, and finally delete metadata logs, making the issue hard to debug. This PR ensures that places calling HDFSMetadataLog.get always check the
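The pattern this PR enforces can be sketched in plain Java. Everything here is illustrative — the class, its Map-backed store, and the `getRequired` helper are hypothetical stand-ins for Spark's `HDFSMetadataLog`, not its actual API. The point is the contract: when a batch is supposed to exist, fail fast instead of silently ignoring an empty result.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a metadata log keyed by batch id.
class MetadataLog {
    private final Map<Long, String> store;

    MetadataLog(Map<Long, String> store) { this.store = store; }

    // Mirrors get(batchId): Option[T] -- easy for callers to ignore a miss.
    Optional<String> get(long batchId) {
        return Optional.ofNullable(store.get(batchId));
    }

    // The pattern the PR enforces: raise a clear error when a batch that
    // should exist is missing, instead of letting the query limp on.
    String getRequired(long batchId) {
        return get(batchId).orElseThrow(() ->
            new IllegalStateException("batch " + batchId + " doesn't exist"));
    }
}

public class MetadataLogDemo {
    public static void main(String[] args) {
        MetadataLog log = new MetadataLog(Map.of(0L, "offsets-0", 1L, "offsets-1"));
        System.out.println(log.getRequired(1L));  // present: returned directly
        try {
            log.getRequired(99L);                 // absent: fails loudly
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The failure surfaces at the first missing batch rather than later, after the metadata logs have been cleaned up.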
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20725 **[Test build #94838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94838/testReport)** for PR 20725 at commit [`461c326`](https://github.com/apache/spark/commit/461c326f00d68a350a1b5c0f7b644f2871ee0a85). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94834/ Test FAILed.
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Merged build finished. Test FAILed.
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22117 **[Test build #94834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94834/testReport)** for PR 22117 at commit [`3cad78f`](https://github.com/apache/spark/commit/3cad78f8bb9bc0dc841cd0c31e0b0d52f8e7c764). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20725 This is working now, and BinaryType support is conditional on pyarrow 0.10.0 or higher being used. @HyukjinKwon @cloud-fan what are your thoughts on getting this in for Spark 2.4? I think it would be very useful to have since images in Spark use the BinaryType, and it will be good to have when integrating Spark with DL frameworks.
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20725 **[Test build #94838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94838/testReport)** for PR 20725 at commit [`461c326`](https://github.com/apache/spark/commit/461c326f00d68a350a1b5c0f7b644f2871ee0a85).
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20725 Merged build finished. Test PASSed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21537 We are fully swamped by the hotfixes and regressions of the 2.3 release and the new features targeting 2.4. We should have posted comments on this PR earlier. Designing an IR for our codegen is the right thing to do. [If you do not agree on this, we can discuss it.] How to design an IR is a challenging task. The whole community is welcome to submit designs and PRs. Everyone can share their ideas. The best idea will win. @HyukjinKwon If you have bandwidth, please also give it a try
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20725 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2236/ Test PASSed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Merged build finished. Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94835/ Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #94835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94835/testReport)** for PR 21860 at commit [`6ff46d9`](https://github.com/apache/spark/commit/6ff46d941a6ddb29345ea0c563aa68b77f540139). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21912 cc @ueshin
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94833/ Test PASSed.
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Merged build finished. Test PASSed.
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22031 **[Test build #94833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94833/testReport)** for PR 22031 at commit [`0342ed9`](https://github.com/apache/spark/commit/0342ed934e65c13c43081f464503800118383a44). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user raofu commented on a diff in the pull request: https://github.com/apache/spark/pull/22113#discussion_r210473687 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends Logging { hdfsPath.getFileSystem(conf) } -listOrcFiles(basePath, conf).iterator.map { path => +listOrcFiles(basePath, conf).view.map { path => --- End diff -- My bad. I misread the code. Sorry about the noise.
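For context on why this PR's title says "lazily create Reader": Scala's `.iterator` and `.view`, like Java streams, defer the mapping step until elements are actually consumed. A small illustrative Java-stream analogue (not the Spark code itself — the "Reader" string stands in for an expensive ORC reader construction):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class LazyMapDemo {
    public static void main(String[] args) {
        List<String> opened = new ArrayList<>();
        List<String> paths = List.of("f1.orc", "f2.orc", "f3.orc");

        // The map stage simulates an expensive Reader creation; because
        // stream intermediate operations are lazy, it runs only for the
        // elements a downstream consumer actually pulls.
        Optional<String> first = paths.stream()
            .map(p -> { opened.add(p); return "Reader(" + p + ")"; })
            .findFirst();

        System.out.println(first.get());  // Reader(f1.orc)
        System.out.println(opened);       // [f1.orc] -- only one was "opened"
    }
}
```

With an eager collection, all three "readers" would be created up front even if only the first is needed.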
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user raofu closed the pull request at: https://github.com/apache/spark/pull/22113
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22115 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94831/ Test PASSed.
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22115 Merged build finished. Test PASSed.
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22115 **[Test build #94831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94831/testReport)** for PR 22115 at commit [`089c31f`](https://github.com/apache/spark/commit/089c31fcff1a5b84634f5de78c1bd440f738b2f4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
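For readers wondering why `expm1()` merits its own javadoc: for x near zero, computing `exp(x) - 1` directly loses almost all significant digits to cancellation, which is exactly what `expm1` avoids. A small sketch using the JDK's `Math.expm1` (the function SQL `expm1` implementations are typically built on — an assumption here, not a claim about Spark's internals):

```java
public class Expm1Demo {
    public static void main(String[] args) {
        double x = 1e-10;
        // Naive form: exp(x) rounds to a double extremely close to 1.0,
        // and the subtraction then cancels most significant digits.
        double naive = Math.exp(x) - 1.0;
        // expm1 computes e^x - 1 without the intermediate rounding near 1.
        double precise = Math.expm1(x);
        System.out.printf("naive = %.18e%n", naive);
        System.out.printf("expm1 = %.18e%n", precise);
        // True value is x + x*x/2 + ...; expm1 lands much closer to it.
        System.out.println(Math.abs(precise - x) < Math.abs(naive - x));
    }
}
```

This is the standard motivation for documenting `expm1` separately from `exp`: the two are mathematically redundant but numerically very different near zero.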
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210469510 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -497,6 +497,53 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } +/** + * Returns a map that applies the function to each value of the map. + */ +@ExpressionDescription( +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", +examples = """ +Examples: + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1); +map(array(1, 2, 3), array(2, 3, 4)) + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v); +map(array(1, 2, 3), array(2, 4, 6)) + """, +since = "2.4.0") --- End diff -- ditto.
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210469494 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -497,6 +497,53 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } +/** + * Returns a map that applies the function to each value of the map. + */ +@ExpressionDescription( +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", +examples = """ --- End diff -- ditto.
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210471011 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -2302,6 +2302,177 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map")) } + test("transform values function - test primitive data types") { +val dfExample1 = Seq( + Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7) +).toDF("i") + +val dfExample2 = Seq( + Map[Boolean, String](false -> "abc", true -> "def") +).toDF("x") + +val dfExample3 = Seq( + Map[String, Int]("a" -> 1, "b" -> 2, "c" -> 3) +).toDF("y") + +val dfExample4 = Seq( + Map[Int, Double](1 -> 1.0, 2 -> 1.40, 3 -> 1.70) +).toDF("z") + +val dfExample5 = Seq( + Map[Int, Array[Int]](1 -> Array(1, 2)) +).toDF("c") + +def testMapOfPrimitiveTypesCombination(): Unit = { + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> k + v)"), +Seq(Row(Map(1 -> 2, 9 -> 18, 8 -> 16, 7 -> 14 + + checkAnswer(dfExample2.selectExpr( +"transform_values(x, (k, v) -> if(k, v, CAST(k AS String)))"), +Seq(Row(Map(false -> "false", true -> "def" + + checkAnswer(dfExample2.selectExpr("transform_values(x, (k, v) -> NOT k AND v = 'abc')"), +Seq(Row(Map(false -> true, true -> false + + checkAnswer(dfExample3.selectExpr("transform_values(y, (k, v) -> v * v)"), +Seq(Row(Map("a" -> 1, "b" -> 4, "c" -> 9 + + checkAnswer(dfExample3.selectExpr( +"transform_values(y, (k, v) -> k || ':' || CAST(v as String))"), +Seq(Row(Map("a" -> "a:1", "b" -> "b:2", "c" -> "c:3" + + checkAnswer( +dfExample3.selectExpr("transform_values(y, (k, v) -> concat(k, cast(v as String)))"), +Seq(Row(Map("a" -> "a1", "b" -> "b2", "c" -> "c3" + + checkAnswer( +dfExample4.selectExpr( + "transform_values(" + +"z,(k, v) -> map_from_arrays(ARRAY(1, 2, 3), " + +"ARRAY('one', 'two', 'three'))[k] || '_' || CAST(v AS String))"), +Seq(Row(Map(1 -> 
"one_1.0", 2 -> "two_1.4", 3 ->"three_1.7" + + checkAnswer( +dfExample4.selectExpr("transform_values(z, (k, v) -> k-v)"), +Seq(Row(Map(1 -> 0.0, 2 -> 0.6001, 3 -> 1.3 + + checkAnswer( +dfExample5.selectExpr("transform_values(c, (k, v) -> k + cardinality(v))"), +Seq(Row(Map(1 -> 3 +} + +// Test with local relation, the Project will be evaluated without codegen +testMapOfPrimitiveTypesCombination() +dfExample1.cache() +dfExample2.cache() +dfExample3.cache() +dfExample4.cache() +dfExample5.cache() +// Test with cached relation, the Project will be evaluated with codegen +testMapOfPrimitiveTypesCombination() + } + + test("transform values function - test empty") { +val dfExample1 = Seq( + Map.empty[Integer, Integer] +).toDF("i") + +val dfExample2 = Seq( + Map.empty[BigInt, String] +).toDF("j") + +def testEmpty(): Unit = { + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> NULL)"), +Seq(Row(Map.empty[Integer, Integer]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> k)"), +Seq(Row(Map.empty[Integer, Integer]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> v)"), +Seq(Row(Map.empty[Integer, Integer]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> 0)"), +Seq(Row(Map.empty[Integer, Integer]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> 'value')"), +Seq(Row(Map.empty[Integer, String]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> true)"), +Seq(Row(Map.empty[Integer, Boolean]))) + + checkAnswer(dfExample2.selectExpr("transform_values(j, (k, v) -> k + cast(v as BIGINT))"), +Seq(Row(Map.empty[BigInt, BigInt]))) +} + +testEmpty() +dfExample1.cache() +dfExample2.cache() +testEmpty() + } + + test("transform values function - test null values") { +val dfExample1 = Seq( + Map[Int, Integer](1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4) +).toDF("a") + +val dfExample2 = Seq( + Map[Int, String](1 -> "a", 2 -> "b", 3 -> null)
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210469472 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -497,6 +497,53 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } +/** + * Returns a map that applies the function to each value of the map. + */ +@ExpressionDescription( +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", --- End diff -- nit: indent
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210470513 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -497,6 +497,53 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } +/** + * Returns a map that applies the function to each value of the map. + */ +@ExpressionDescription( +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", +examples = """ +Examples: + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1); +map(array(1, 2, 3), array(2, 3, 4)) + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v); +map(array(1, 2, 3), array(2, 4, 6)) + """, +since = "2.4.0") +case class TransformValues( +argument: Expression, +function: Expression) + extends MapBasedSimpleHigherOrderFunction with CodegenFallback { + + override def nullable: Boolean = argument.nullable + + @transient lazy val MapType(keyType, valueType, valueContainsNull) = argument.dataType + + override def dataType: DataType = MapType(keyType, function.dataType, valueContainsNull) + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction) + : TransformValues = { +copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil)) + } + + @transient lazy val LambdaFunction( + _, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function --- End diff -- nit: indent
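The `transform_values` semantics being reviewed above can be modeled in a few lines of plain Java (an illustrative model, not the Catalyst implementation): every key is preserved, and the lambda receives both key and value to produce the new value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

public class TransformValuesDemo {
    // Model of transform_values(map, (k, v) -> ...): keys are unchanged,
    // each value is replaced by f(key, value).
    static <K, V, W> Map<K, W> transformValues(Map<K, V> m, BiFunction<K, V, W> f) {
        Map<K, W> out = new LinkedHashMap<>();
        m.forEach((k, v) -> out.put(k, f.apply(k, v)));
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> m = Map.of(1, 1, 2, 2, 3, 3);
        System.out.println(transformValues(m, (k, v) -> k + v)); // each value becomes k + v
        System.out.println(transformValues(m, (k, v) -> v + 1)); // each value incremented
    }
}
```

This mirrors the `(k, v) -> k + v` example in the quoted `@ExpressionDescription`: the result keeps the original keys while the values are recomputed.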
[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...
Github user deshanxiao commented on the issue: https://github.com/apache/spark/pull/22109 @vanzin Sorry.. SPARK-22850 has fixed the problem. Maybe I will track the executor loss problem next. Thank you! @vanzin @squito
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210467354 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -442,3 +442,91 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x)); + array(('a', 1), ('b', 3), ('c', 5)) + > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y)); + array(4, 6) + > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y)); + array('ad', 'be', 'cf') + """, + since = "2.4.0") +// scalastyle:on line.size.limit +case class ArraysZipWith( +left: Expression, +right: Expression, +function: Expression) + extends HigherOrderFunction with CodegenFallback with ExpectsInputTypes { + + override def inputs: Seq[Expression] = List(left, right) + + override def functions: Seq[Expression] = List(function) + + def expectingFunctionType: AbstractDataType = AnyDataType + @transient lazy val functionForEval: Expression = functionsForEval.head + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType, expectingFunctionType) + + override def nullable: Boolean = inputs.exists(_.nullable) + + override def dataType: ArrayType = ArrayType(function.dataType, function.nullable) + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ArraysZipWith = { +val (leftElementType, leftContainsNull) = left.dataType match { + case ArrayType(elementType, containsNull) => (elementType, containsNull) + case _ => +val ArrayType(elementType, containsNull) = 
ArrayType.defaultConcreteType +(elementType, containsNull) +} +val (rightElementType, rightContainsNull) = right.dataType match { + case ArrayType(elementType, containsNull) => (elementType, containsNull) + case _ => +val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType +(elementType, containsNull) +} +copy(function = f(function, + (leftElementType, leftContainsNull) :: (rightElementType, rightContainsNull) :: Nil)) --- End diff -- If we append `null`s to the shorter array, both of the arguments might be `null`, so we should use `true` for nullabilities of the arguments as @mn-mikke suggested.
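The null-padding semantics ueshin describes — the shorter array is extended with nulls to the length of the longer one before the function is applied — can be modeled in plain Java, with `Optional.empty()` standing in for SQL NULL (an illustrative sketch, not the Catalyst code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

public class ZipWithDemo {
    // Model of zip_with(left, right, f): the shorter array is padded with
    // "nulls" (empty Optionals) so the result matches the longer length.
    static <A, B, C> List<C> zipWith(List<A> left, List<B> right,
                                     BiFunction<Optional<A>, Optional<B>, C> f) {
        int n = Math.max(left.size(), right.size());
        List<C> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            Optional<A> a = i < left.size() ? Optional.of(left.get(i)) : Optional.empty();
            Optional<B> b = i < right.size() ? Optional.of(right.get(i)) : Optional.empty();
            out.add(f.apply(a, b));
        }
        return out;
    }

    public static void main(String[] args) {
        // SQL's x + y is NULL if either side is NULL; flatMap models that,
        // which is why the review asks for nullable lambda arguments.
        List<Optional<Integer>> sums = zipWith(List.of(1, 2, 3), List.of(10, 20),
            (x, y) -> x.flatMap(a -> y.map(b -> a + b)));
        System.out.println(sums);  // last entry is empty: the short side was padded
    }
}
```

This also makes the review comment concrete: because padding can introduce nulls on either side, the lambda's two arguments must both be treated as nullable regardless of the input arrays' own nullability.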
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210468640 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -2302,6 +2302,76 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map")) } + test("arrays zip_with function - for primitive types") { +val df1 = Seq[(Seq[Integer], Seq[Integer])]( + (Seq(9001, 9002, 9003), Seq(4, 5, 6)), + (Seq(1, 2), Seq(3, 4)), + (Seq.empty, Seq.empty), + (null, null) +).toDF("val1", "val2") +val df2 = Seq[(Seq[Integer], Seq[Long])]( + (Seq(1, null, 3), Seq(1L, 2L)), + (Seq(1, 2, 3), Seq(4L, 11L)) +).toDF("val1", "val2") + +val expectedValue1 = Seq( + Row(Seq(9005, 9007, 9009)), + Row(Seq(4, 6)), + Row(Seq.empty), + Row(null)) +checkAnswer(df1.selectExpr("zip_with(val1, val2, (x, y) -> x + y)"), expectedValue1) + +val expectedValue2 = Seq( + Row(Seq(Row(1.0, 1), Row(2.0, null), Row(null, 3))), --- End diff -- Why `1.0` or `2.0` instead of `1L` or `2L`?
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210467721 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala --- @@ -396,4 +396,52 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper map_zip_with(mbb0, mbbn, concat), null) } + + test("ZipWith") { +def zip_with( +left: Expression, +right: Expression, +f: (Expression, Expression) => Expression): Expression = { + val ArrayType(leftT, leftContainsNull) = left.dataType.asInstanceOf[ArrayType] + val ArrayType(rightT, rightContainsNull) = right.dataType.asInstanceOf[ArrayType] + ZipWith(left, right, createLambda(leftT, leftContainsNull, rightT, rightContainsNull, f)) +} + +val ai0 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false)) +val ai1 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false)) --- End diff -- What's the difference between `ai0` and `ai1`?
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210467959 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala --- @@ -396,4 +396,52 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper map_zip_with(mbb0, mbbn, concat), null) } + + test("ZipWith") { +def zip_with( +left: Expression, +right: Expression, +f: (Expression, Expression) => Expression): Expression = { + val ArrayType(leftT, leftContainsNull) = left.dataType.asInstanceOf[ArrayType] + val ArrayType(rightT, rightContainsNull) = right.dataType.asInstanceOf[ArrayType] + ZipWith(left, right, createLambda(leftT, leftContainsNull, rightT, rightContainsNull, f)) +} + +val ai0 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false)) +val ai1 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false)) +val ai2 = Literal.create(Seq[Integer](1, null, 3), ArrayType(IntegerType, containsNull = true)) +val ai3 = Literal.create(Seq[Integer](1, null), ArrayType(IntegerType, containsNull = true)) +val ain = Literal.create(null, ArrayType(IntegerType, containsNull = false)) + +val add: (Expression, Expression) => Expression = (x, y) => x + y +val plusOne: Expression => Expression = x => x + 1 + +checkEvaluation(zip_with(ai0, ai1, add), Seq(2, 4, 6)) +checkEvaluation(zip_with(ai3, ai2, add), Seq(2, null, null)) +checkEvaluation(zip_with(ai2, ai3, add), Seq(2, null, null)) +checkEvaluation(zip_with(ain, ain, add), null) +checkEvaluation(zip_with(ai1, ain, add), null) +checkEvaluation(zip_with(ain, ai1, add), null) + +val as0 = Literal.create(Seq("a", "b", "c"), ArrayType(StringType, containsNull = false)) +val as1 = Literal.create(Seq("a", null, "c"), ArrayType(StringType, containsNull = true)) +val as2 = Literal.create(Seq("a"), ArrayType(StringType, containsNull = true)) +val asn = Literal.create(null, ArrayType(StringType, 
containsNull = false)) + +val concat: (Expression, Expression) => Expression = (x, y) => Concat(Seq(x, y)) + +checkEvaluation(zip_with(as0, as1, concat), Seq("aa", null, "cc")) +checkEvaluation(zip_with(as0, as2, concat), Seq("aa", null, null)) + +val aai1 = Literal.create(Seq(Seq(1, 2, 3), null, Seq(4, 5)), + ArrayType(ArrayType(IntegerType, containsNull = false), containsNull = true)) +val aai2 = Literal.create(Seq(Seq(1, 2, 3)), + ArrayType(ArrayType(IntegerType, containsNull = false), containsNull = true)) +checkEvaluation( + zip_with(aai1, aai2, (a1, a2) => + Cast(zip_with(transform(a1, plusOne), transform(a2, plusOne), add), StringType)), --- End diff -- nit: indent --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210468814 --- Diff: sql/core/src/test/resources/sql-tests/inputs/higher-order-functions.sql --- @@ -51,3 +51,12 @@ select exists(ys, y -> y > 30) as v from nested; -- Check for element existence in a null array select exists(cast(null as array), y -> y > 30) as v; + +-- Zip with array +select zip_with(ys, zs, (a, b) -> a + size(b)) as v from nested; + +-- Zip with array with concat +select zip_with(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y)) as v; + +-- Zip with array coalesce +select zip_with(array('a'), array('d', null, 'f'), (x, y) -> coalesce(x, y)) as v; --- End diff -- Can you add a line break at the end of the file? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
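The padding behavior these queries exercise (the shorter array is extended with nulls before the lambda is applied) can be sketched in plain Scala; `zipWithPad` below is a hypothetical helper for illustration, not Spark's implementation:

```scala
// Sketch of SQL zip_with semantics: pad the shorter input up to the longer
// length (a missing element is modeled as None), then apply f pairwise.
def zipWithPad[A, B, C](left: Seq[A], right: Seq[B])
                       (f: (Option[A], Option[B]) => C): Seq[C] = {
  val n = math.max(left.length, right.length)
  (0 until n).map(i => f(left.lift(i), right.lift(i)))
}

// Mirrors: select zip_with(array('a'), array('d', 'e', 'f'), (x, y) -> coalesce(x, y))
val coalesced = zipWithPad(Seq("a"), Seq("d", "e", "f")) { (x, y) => x.orElse(y).orNull }
println(coalesced) // Vector(a, e, f)
```

The same helper reproduces the `a + size(b)` and `concat(x, y)` queries by swapping in the corresponding lambda.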
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210466854 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -687,3 +687,89 @@ case class MapZipWith(left: Expression, right: Expression, function: Expression) override def prettyName: String = "map_zip_with" } + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x)); + array(('a', 1), ('b', 2), ('c', 3)) + > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y); + array(4, 6) + > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y)); + array('ad', 'be', 'cf') + """, + since = "2.4.0") +// scalastyle:on line.size.limit +case class ZipWith(left: Expression, right: Expression, function: Expression) + extends HigherOrderFunction with CodegenFallback { + + def functionForEval: Expression = functionsForEval.head + + override def arguments: Seq[Expression] = left :: right :: Nil + + override def argumentTypes: Seq[AbstractDataType] = ArrayType :: ArrayType :: Nil + + override def functions: Seq[Expression] = List(function) + + override def functionTypes: Seq[AbstractDataType] = AnyDataType :: Nil + + override def nullable: Boolean = left.nullable || right.nullable + + override def dataType: ArrayType = ArrayType(function.dataType, function.nullable) + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ZipWith = { +val (leftElementType, leftContainsNull) = left.dataType match { + case ArrayType(elementType, containsNull) => (elementType, containsNull) + case _ => +val ArrayType(elementType, 
containsNull) = ArrayType.defaultConcreteType +(elementType, containsNull) +} +val (rightElementType, rightContainsNull) = right.dataType match { + case ArrayType(elementType, containsNull) => (elementType, containsNull) + case _ => +val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType +(elementType, containsNull) +} --- End diff -- Now we can do: ```scala val ArrayType(leftElementType, leftContainsNull) = left.dataType val ArrayType(rightElementType, rightContainsNull) = right.dataType ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
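The one-line form suggested here relies on Scala's pattern-matching `val` definitions: a constructor pattern on the left-hand side destructures the value, throwing a `MatchError` at runtime if the runtime type does not match. A standalone sketch with toy stand-ins for Spark's types (not the real classes):

```scala
// Toy stand-ins for Spark's DataType hierarchy, for illustration only.
sealed trait DataType
case object IntegerType extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

val dt: DataType = ArrayType(IntegerType, containsNull = true)

// Extractor pattern in a val definition: no asInstanceOf needed,
// but a non-ArrayType value would throw MatchError on this line.
val ArrayType(elementType, containsNull) = dt
println(s"$elementType, $containsNull") // IntegerType, true
```

This is why the review can drop both the explicit `match` block and the `.asInstanceOf[ArrayType]` casts: the argument-type checks upstream already guarantee the inputs are arrays.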
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210467535 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala --- @@ -396,4 +396,52 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper map_zip_with(mbb0, mbbn, concat), null) } + + test("ZipWith") { +def zip_with( +left: Expression, +right: Expression, +f: (Expression, Expression) => Expression): Expression = { + val ArrayType(leftT, leftContainsNull) = left.dataType.asInstanceOf[ArrayType] + val ArrayType(rightT, rightContainsNull) = right.dataType.asInstanceOf[ArrayType] --- End diff -- nit: we don't need `.asInstanceOf[ArrayType]`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/21561#discussion_r210468884 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -246,6 +245,16 @@ class BisectingKMeans private ( new BisectingKMeansModel(root, this.distanceMeasure) } + /** + * Runs the bisecting k-means algorithm. + * @param input RDD of vectors + * @return model for the bisecting kmeans + */ + @Since("1.6.0") --- End diff -- Oh right I get it now, this isn't a new method, it's 'replacing' the definition above. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: [DOCS]Update configuration.md
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22116 Huh OK I thought I looked and this had been fixed. Good catch. Also there's an instance in `cloud-integration.md`, worth fixing too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21561#discussion_r210468639 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -246,6 +245,16 @@ class BisectingKMeans private ( new BisectingKMeansModel(root, this.distanceMeasure) } + /** + * Runs the bisecting k-means algorithm. + * @param input RDD of vectors + * @return model for the bisecting kmeans + */ + @Since("1.6.0") --- End diff -- `def run(input: RDD[Vector]): BisectingKMeansModel` has been a public API since 1.6, and users can call it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/21561#discussion_r210468107 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -246,6 +245,16 @@ class BisectingKMeans private ( new BisectingKMeansModel(root, this.distanceMeasure) } + /** + * Runs the bisecting k-means algorithm. + * @param input RDD of vectors + * @return model for the bisecting kmeans + */ + @Since("1.6.0") --- End diff -- You couldn't call `BisectingKMeans.run(...)` before this, right? It wasn't in a superclass or anything. In that sense, I think this method needs to be marked as new as of 2.4.0, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 @tdas Kind reminder. @zsxwing Could you take a quick look at this and share your thoughts? I think the patch is ready to merge, but it is blocked on a slight difference of views, so more voices would be better. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21561#discussion_r210467653 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -246,6 +245,16 @@ class BisectingKMeans private ( new BisectingKMeansModel(root, this.distanceMeasure) } + /** + * Runs the bisecting k-means algorithm. + * @param input RDD of vectors + * @return model for the bisecting kmeans + */ + @Since("1.6.0") --- End diff -- This API has existed since 1.6.0, so should we keep the `@Since` annotation? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21733 @tdas Kind reminder. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/22115 I have already done the global search. That is the only place that needs changing. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22013 LGTM. @mn-mikke @mgaido91 Do you have any other comments on this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94830/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22031 **[Test build #94830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94830/testReport)** for PR 22031 at commit [`92cb34a`](https://github.com/apache/spark/commit/92cb34af9c1e5742d9fa21f677645daea029bfd6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ZipWith(left: Expression, right: Expression, function: Expression)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user KraFusion commented on the issue: https://github.com/apache/spark/pull/22116 @HyukjinKwon the same instance did exist in the spark website repo; that PR has been merged. I'm not sure what to change the title to, since the PR instructions don't cover simple typo fixes in documentation that don't have an associated JIRA. Should I prefix the current title with [DOCS]? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22116 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22116 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94836/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22116 **[Test build #94836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94836/testReport)** for PR 22116 at commit [`2b2a61c`](https://github.com/apache/spark/commit/2b2a61c849ddad680819126f8a6fdc28cbbad721). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22115 Mind fixing Python / R / SQL ones while we are here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2235/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22116 **[Test build #94836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94836/testReport)** for PR 22116 at commit [`2b2a61c`](https://github.com/apache/spark/commit/2b2a61c849ddad680819126f8a6fdc28cbbad721). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22113#discussion_r210462023 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends Logging { hdfsPath.getFileSystem(conf) } -listOrcFiles(basePath, conf).iterator.map { path => +listOrcFiles(basePath, conf).view.map { path => --- End diff -- Do you mean `collectFirst` actually traverses the `iterator` entirely? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22009 **[Test build #94837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94837/testReport)** for PR 22009 at commit [`0318b4b`](https://github.com/apache/spark/commit/0318b4b1dcbfde0024945308578cedf8d4a09168). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22113#discussion_r210461983 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends Logging { hdfsPath.getFileSystem(conf) } -listOrcFiles(basePath, conf).iterator.map { path => +listOrcFiles(basePath, conf).view.map { path => --- End diff -- Do you mean that with 'iterator', `collectFirst` actually traverses it entirely? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
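Whether `collectFirst` traverses everything depends on the receiver: mapping a strict `List` materializes every element before `collectFirst` looks at anything, while an `Iterator` or a `view` maps lazily and stops at the first match. A quick sketch of the difference being discussed (plain Scala, not the Spark code under review):

```scala
var strictCalls = 0
var lazyCalls = 0

val xs = (1 to 10).toList

// Strict: map builds the whole intermediate list before collectFirst runs.
xs.map { x => strictCalls += 1; x * 2 }.collectFirst { case y if y > 4 => y }

// Lazy: the view maps elements one at a time, stopping at the first match.
xs.view.map { x => lazyCalls += 1; x * 2 }.collectFirst { case y if y > 4 => y }

println(s"strict=$strictCalls lazy=$lazyCalls") // strict=10 lazy=3
```

An `iterator` behaves like the view here, with the caveat that it can be consumed only once, which is presumably why `view` was proposed in the diff.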
[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22109 **[Test build #4263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4263/testReport)** for PR 22109 at commit [`26ca9c2`](https://github.com/apache/spark/commit/26ca9c2c08c62961183e6461183c2963b6a00474). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22116 @KraFusion, mind double-checking whether the same instance exists elsewhere and fixing the PR title to reflect the change? It would also be good to read https://spark.apache.org/contributing.html, even though it's a minor change. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22116 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...
Github user deshanxiao commented on the issue: https://github.com/apache/spark/pull/22109 @squito @vanzin Thanks. The first time I saw this in our cluster was on Spark 2.1. Spark 2.1 has the `setupAndStartListenerBus` method too, but it still looks wrong there. I only noticed the executor-loss symptom yesterday. Maybe we should fix them together. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21537 Yea, I understand. I wasn't trying to say that the impact of introducing AnalysisBarrier was trivial. True, I understand it is not an easy job. Thank you Reynold for that. I also don't mean to say we should just go ahead without sufficient discussion; I wanted to point out that there were positive aspects to the effort and the attempt with AnalysisBarrier too. It wasn't all bad, in a way. > The reason why we did not merge this PR is that we are doubting this is the right thing to do. @rednaxelafx If that's true, the concerns should be mentioned and discussed here. Was there a discussion about it in the community that I missed? I would appreciate it if we could talk here. > Instead of reinventing a compiler, how about letting the compiler internal expert (in our community, we have @kiszk) to lead the effort and offer a design for this. If there is a design concern and a better suggestion, let's file a JIRA. I want to see the problem, the concerns, and the possible suggestions as well. Yup, I get that it might be better for someone with expertise in that area to lead, but I think this should be based purely on community work; in that light, it seemed reasonable for @viirya to go ahead, since it's basically his work. If not, I don't see why any particular person should be preferred. I just wanted to point out that the baseline is open to anyone, not reserved for specific people. Anyone who is willing to do this is welcome to go ahead, so participation should be voluntary, without other factors. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4264/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4266/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #94835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94835/testReport)** for PR 21860 at commit [`6ff46d9`](https://github.com/apache/spark/commit/6ff46d941a6ddb29345ea0c563aa68b77f540139). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2234/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22117: [SPARK-23654][BUILD] remove jets3t as a dependenc...
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/22117 [SPARK-23654][BUILD] remove jets3t as a dependency of spark # What changes were proposed in this pull request? Remove jets3t dependency, and bouncy castle which it brings in; update licenses and deps Note this is just #22081 with merge conflict resolved; submitting to see what jenkins says. # How was this patch tested? Existing tests on a JVM with unlimited Java Crypto Extensions You can merge this pull request into a Git repository by running: $ git pull https://github.com/steveloughran/spark incoming/PR-22081-jets3t Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22117.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22117 commit 3cad78f8bb9bc0dc841cd0c31e0b0d52f8e7c764 Author: Sean Owen Date: 2018-08-11T21:41:38Z Remove jets3t dependency, and bouncy castle which it brings in; update licenses and deps --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22117 **[Test build #94834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94834/testReport)** for PR 22117 at commit [`3cad78f`](https://github.com/apache/spark/commit/3cad78f8bb9bc0dc841cd0c31e0b0d52f8e7c764). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 Hi @HyukjinKwon, I moved the change to the master branch just now. Please help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210456342 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -401,12 +399,41 @@ case class FileSourceScanExec( fsRelation: HadoopFsRelation): RDD[InternalRow] = { val defaultMaxSplitBytes = fsRelation.sparkSession.sessionState.conf.filesMaxPartitionBytes -val openCostInBytes = fsRelation.sparkSession.sessionState.conf.filesOpenCostInBytes +var openCostInBytes = fsRelation.sparkSession.sessionState.conf.filesOpenCostInBytes val defaultParallelism = fsRelation.sparkSession.sparkContext.defaultParallelism val totalBytes = selectedPartitions.flatMap(_.files.map(_.getLen + openCostInBytes)).sum val bytesPerCore = totalBytes / defaultParallelism -val maxSplitBytes = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore)) +var maxSplitBytes = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore)) +if(fsRelation.fileFormat.isInstanceOf[ParquetSource] && + fsRelation.sparkSession.sessionState.conf.isParquetSizeAdaptiveEnabled) { + if (relation.dataSchema.map(_.dataType).forall(dataType => +dataType.isInstanceOf[CalendarIntervalType] || dataType.isInstanceOf[StructType] + || dataType.isInstanceOf[MapType] || dataType.isInstanceOf[NullType] + || dataType.isInstanceOf[AtomicType] || dataType.isInstanceOf[ArrayType])) { + +def getTypeLength (dataType : DataType) : Int = { + if (dataType.isInstanceOf[StructType]) { + fsRelation.sparkSession.sessionState.conf.parquetStructTypeLength + } else if (dataType.isInstanceOf[ArrayType]) { + fsRelation.sparkSession.sessionState.conf.parquetArrayTypeLength + } else if (dataType.isInstanceOf[MapType]) { +fsRelation.sparkSession.sessionState.conf.parquetMapTypeLength + } else { +dataType.defaultSize + } +} + +val selectedColumnSize = requiredSchema.map(_.dataType).map(getTypeLength(_)) + .reduceOption(_ + _).getOrElse(StringType.defaultSize) +val 
totalColumnSize = relation.dataSchema.map(_.dataType).map(getTypeLength(_)) + .reduceOption(_ + _).getOrElse(StringType.defaultSize) +val multiplier = totalColumnSize / selectedColumnSize --- End diff -- @viirya Now it also supports ORC. Please help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
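The computation under review scales a base split size by the ratio of the full row width to the width of the columns actually read. A simplified, self-contained sketch of that arithmetic (the names and numeric values below are illustrative, not Spark's configuration defaults):

```scala
// Simplified model of the split-size computation in FileSourceScanExec.
def maxSplit(defaultMaxSplitBytes: Long, openCostInBytes: Long,
             totalBytes: Long, defaultParallelism: Int): Long = {
  val bytesPerCore = totalBytes / defaultParallelism
  math.min(defaultMaxSplitBytes, math.max(openCostInBytes, bytesPerCore))
}

// With column pruning, the effective split is enlarged by the ratio of
// total column width to the width of the selected columns.
def adaptiveMaxSplit(base: Long, totalColumnSize: Int, selectedColumnSize: Int): Long =
  base * (totalColumnSize / selectedColumnSize)

val base = maxSplit(128L << 20, 4L << 20, 10L << 30, 16) // the 128 MB cap wins here
println(base)                                            // 134217728
println(adaptiveMaxSplit(base, totalColumnSize = 40, selectedColumnSize = 8)) // 671088640
```

The idea is that reading 8 of 40 bytes per row from a columnar file touches roughly a fifth of the data, so splits five times larger yield similarly sized tasks.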
[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22110#discussion_r210455974 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala --- @@ -73,4 +73,14 @@ object TypeUtils { } x.length - y.length } + + /** + * Returns true if elements of the data type could be used as items of a hash set or as keys + * of a hash map. + */ + def typeCanBeHashed(dataType: DataType): Boolean = dataType match { --- End diff -- hey, this is a weird name, `byte[]` can also be hashed. I'd rather call it `typeWithProperEquals`, and document it as @mgaido91 proposed. I don't think we need to consider `hashCode` here, it's a rule in java world that equals and hashCode should be defined in a coherent way. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
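The `byte[]` caveat is easy to demonstrate: JVM arrays hash and compare by identity, so they "can be hashed" yet make poor hash-map keys, which is why the predicate is really about proper `equals` rather than hashability. A quick standalone sketch (plain Scala, not Spark code):

```scala
val a = Array[Byte](1, 2, 3)
val b = Array[Byte](1, 2, 3)

// Arrays are hashable like any JVM object, but equals and hashCode are
// identity-based, so equal contents do not behave as equal keys.
println(a.equals(b))     // false: reference equality
val m = Map(a -> "v")
println(m.contains(b))   // false: byte[] lacks value-based equals
println(java.util.Arrays.equals(a, b)) // true: content comparison must be explicit
```

This also illustrates the point about coherence: because `equals` is identity-based, an identity-based `hashCode` is consistent with it, so checking `equals` alone is sufficient for the proposed `typeWithProperEquals`.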
[GitHub] spark issue #22116: Update configuration.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22116 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22116: Update configuration.md
GitHub user KraFusion opened a pull request: https://github.com/apache/spark/pull/22116 Update configuration.md Changed $SPARK_HOME/conf/spark-default.conf to $SPARK_HOME/conf/spark-defaults.conf. No testing necessary, as this was a change to documentation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/KraFusion/spark-1 patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22116.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22116 commit 2b2a61c849ddad680819126f8a6fdc28cbbad721 Author: Joey Krabacher Date: 2018-08-16T01:24:08Z Update configuration.md changed $SPARK_HOME/conf/spark-default.conf to $SPARK_HOME/conf/spark-defaults.conf --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
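For context, the corrected filename is the file Spark reads default properties from at startup; a minimal `$SPARK_HOME/conf/spark-defaults.conf` looks like the following (the property values are examples only):

```
spark.master            spark://master:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
```

Each line is a property name followed by whitespace and a value, and these defaults are overridden by flags passed to spark-submit.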
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320
>> Hello, we've been using your patch at Stripe and we've found something that looks like a new bug:
>
> Thank you for sharing this, @xinxin-stripe. This is very helpful. I will investigate and report back.

I have not been able to reproduce this issue with this branch at commit 0e5594b6ac1dcb94f3f0166e66a7d4e7eae3d00c. However, I'm seeing the same failure scenario as yours on VideoAmp's internal 2.1, 2.2 and 2.3 backports of this branch. I think the reason for this difference is that our internal branches (and probably yours) incorporate rules to support pruning for aggregations. That functionality was removed from this PR. I will fix this and share the fix with you. It would help if you could send me a scenario where you can reproduce this failure with a Spark SQL query. Query plans for datasets built from SQL queries tend to be much more readable. Consider e-mailing me directly on this issue because it does not appear to be strictly related to this PR. My e-mail address is m...@allman.ms. Thanks again!
[GitHub] spark issue #21123: [SPARK-24045][SQL]Create base class for file data source...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21123 @rdblue I really appreciate your work adding these new logical plans, and they are indeed incremental: when you added `AppendData`, you changed `DataFrameWriter` to create `AppendData` only when `SaveMode` is "append"; the other modes still use the old `WriteToDataSourceV2`. That said, every PR we merged for data source v2 made it better and kept it usable. I don't want to change this policy in #22009. I understand your concern about keeping the bad `SaveMode` API in data source v2; I hate it too. We should definitely revisit it before marking data source v2 as stable, but I don't think we need to rush to a decision in #22009, which doesn't mark the v2 API stable. What do you think?
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21537 To Spark users, introducing AnalysisBarrier was a disaster. However, to developers of Spark internals, this is just a bug. If you served customers who are heavily using Spark, you would understand what I am talking about. It is especially hard to debug when the Spark jobs are very complex. Normally, we never commit/merge any PR that is useless, especially when the PR changes are not tiny. Reverting these PRs is also very painful. That is why Reynold took a few days to finish it. It was not a fun job for him to rewrite it. Based on the current work, I expect hundreds of PRs will be submitted for changing the codegen templates and polishing the current code. The reason we did not merge this PR is that we doubt this is the right thing to do. @rednaxelafx I am not saying @viirya and @mgaido91 did a bad job submitting many PRs to improve the existing one. However, we need to think about the fundamental problems we are solving in the codegen. Instead of reinventing a compiler, how about letting a compiler-internals expert (in our community, we have @kiszk) lead the effort and offer a design for this? Coding and designing are different issues. If possible, we need to find the best person to drive it. If @viirya and @mgaido91 think they are familiar with compiler internals, I am also glad to see their designs.
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210452329
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -442,3 +442,91 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x));
+       array(('a', 1), ('b', 2), ('c', 3))
+      > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y);
+       array(4, 6)
+      > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y));
+       array('ad', 'be', 'cf')
+  """,
+  since = "2.4.0")
+// scalastyle:on line.size.limit
+case class ArraysZipWith(
+    left: Expression,
+    right: Expression,
+    function: Expression)
+  extends HigherOrderFunction with CodegenFallback with ExpectsInputTypes {
+
+  override def inputs: Seq[Expression] = List(left, right)
+
+  override def functions: Seq[Expression] = List(function)
+
+  def expectingFunctionType: AbstractDataType = AnyDataType
+
+  @transient lazy val functionForEval: Expression = functionsForEval.head
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType, expectingFunctionType)
+
+  override def nullable: Boolean = inputs.exists(_.nullable)
+
+  override def dataType: ArrayType = ArrayType(function.dataType, function.nullable)
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ArraysZipWith = {
+    val (leftElementType, leftContainsNull) = left.dataType match {
+      case ArrayType(elementType, containsNull) => (elementType, containsNull)
+      case _ =>
+        val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType
+        (elementType, containsNull)
+    }
+    val (rightElementType, rightContainsNull) = right.dataType match {
+      case ArrayType(elementType, containsNull) => (elementType, containsNull)
+      case _ =>
+        val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType
+        (elementType, containsNull)
+    }
+    copy(function = f(function,
+      (leftElementType, leftContainsNull) :: (rightElementType, rightContainsNull) :: Nil))
--- End diff --
@mn-mikke @ueshin "both arrays must be the same length" was how zip_with in Presto used to work; they've since moved to appending nulls and processing regardless.
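The null-padding semantics discussed in the diff above (pad the shorter array with nulls so both match the longer length, then apply the function element-wise) can be sketched outside of Catalyst. This is an illustrative helper under that assumption, not Spark's actual implementation; the names `ZipWithSketch` and `zipWith` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class ZipWithSketch {
    // Iterates to the length of the longer list, substituting null for
    // missing elements of the shorter one before applying f element-wise.
    static <A, B, R> List<R> zipWith(List<A> xs, List<B> ys, BiFunction<A, B, R> f) {
        int n = Math.max(xs.size(), ys.size());
        List<R> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            A x = i < xs.size() ? xs.get(i) : null;
            B y = i < ys.size() ? ys.get(i) : null;
            out.add(f.apply(x, y));
        }
        return out;
    }

    public static void main(String[] args) {
        // Equal lengths: plain element-wise combination.
        System.out.println(zipWith(List.of(1, 2), List.of(3, 4), (x, y) -> x + y)); // [4, 6]
        // Unequal lengths: the shorter side is padded with nulls.
        System.out.println(zipWith(List.of("a", "b", "c"), List.of("d"),
                (x, y) -> x + "/" + y)); // [a/d, b/null, c/null]
    }
}
```

Note the function must tolerate nulls for the padded positions; Presto's change means the burden of handling the mismatch moves from the caller to the lambda.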
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22031 **[Test build #94833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94833/testReport)** for PR 22031 at commit [`0342ed9`](https://github.com/apache/spark/commit/0342ed934e65c13c43081f464503800118383a44).
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2233/ Test PASSed.
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Merged build finished. Test PASSed.
[GitHub] spark pull request #22095: [SPARK-23984][K8S] Changed Python Version config ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22095
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #94832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94832/testReport)** for PR 21860 at commit [`768c914`](https://github.com/apache/spark/commit/768c9147c82e3a160bbd6cb29f30da87549518de).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94832/ Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Merged build finished. Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #94832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94832/testReport)** for PR 21860 at commit [`768c914`](https://github.com/apache/spark/commit/768c9147c82e3a160bbd6cb29f30da87549518de).
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4276/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4272/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4271/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4269/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22101: [SPARK-25114][Core] Fix RecordBinaryComparator when subt...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/22101 ping @gatorsmile @mridulm @squito