date:20180603

[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...

2018-06-03 Thread eyalfa

Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/21369#discussion_r192631230
  
--- Diff: 
core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala
 ---
@@ -414,7 +415,106 @@ class ExternalAppendOnlyMapSuite extends 
SparkFunSuite with LocalSparkContext {
 sc.stop()
   }
 
-  test("external aggregation updates peak execution memory") {
+  test("SPARK-22713 spill during iteration leaks internal map") {
+val size = 1000
+val conf = createSparkConf(loadDefaults = true)
+sc = new SparkContext("local-cluster[1,1,1024]", "test", conf)
+val map = createExternalMap[Int]
--- End diff --

@cloud-fan , can we move on with this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3789/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #91437 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91437/testReport)**
 for PR 13599 at commit 
[`9f32a2f`](https://github.com/apache/spark/commit/9f32a2f3c1277652856e59997af83a2bbe91ce74).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21156
  
**[Test build #91436 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91436/testReport)**
 for PR 21156 at commit 
[`946688a`](https://github.com/apache/spark/commit/946688aee3d03d37a57270e654e00bb9236f21c4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21156
  
**[Test build #91435 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91435/testReport)**
 for PR 21156 at commit 
[`fa76a78`](https://github.com/apache/spark/commit/fa76a7823baf4e6eb05f33bc746ade7f65f44372).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15297: [SPARK-9862]Handling data skew

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15297
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192627641
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the 
elements of column.
+#' @param nullReplacement a character string that is used to replace the 
Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column", delimiter = "character"),
+ function(x, delimiter, nullReplacement = NULL) {
+   jc <- if (is.null(nullReplacement)) {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter)
+   } else {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter,
+ nullReplacement)
--- End diff --

`as.character(nullReplacement)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192627948
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter a character string that is used to concatenate the 
elements of column.
+#' @param nullReplacement a character string that is used to replace the 
Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column", delimiter = "character"),
+ function(x, delimiter, nullReplacement = NULL) {
+   jc <- if (is.null(nullReplacement)) {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter)
+   } else {
+ callJStatic("org.apache.spark.sql.functions", "array_join", 
x@jc, delimiter,
+ nullReplacement)
--- End diff --

re https://github.com/apache/spark/pull/21313#discussion_r192578750
 so what's the behavior if delimiter is more than one character?
like `array_join(df$a, "#Foo", "Beautiful")`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-06-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20894


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-06-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20894
  
LGTM

Thanks! Merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21487: [SPARK-24369][SQL] Correct handling for multiple ...

2018-06-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21487


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21487
  
Thanks! Merged to master and 2.3


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21156
  
**[Test build #91434 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91434/testReport)**
 for PR 21156 at commit 
[`4e026e5`](https://github.com/apache/spark/commit/4e026e5e437dc7f578434244b55bb1ebe189bace).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21486: [SPARK-24387][Core] Heartbeat-timeout executor is...

2018-06-03 Thread lirui-apache

Github user lirui-apache commented on a diff in the pull request:

https://github.com/apache/spark/pull/21486#discussion_r192614888
  
--- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
@@ -197,14 +197,14 @@ private[spark] class HeartbeatReceiver(sc: 
SparkContext, clock: Clock)
   if (now - lastSeenMs > executorTimeoutMs) {
 logWarning(s"Removing executor $executorId with no recent 
heartbeats: " +
   s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " 
+
-  s"timed out after ${now - lastSeenMs} ms"))
   // Asynchronously kill the executor to avoid blocking the 
current thread
 killExecutorThread.submit(new Runnable {
   override def run(): Unit = Utils.tryLogNonFatalError {
 // Note: we want to get an executor back after expiring this 
one,
 // so do not simply call `sc.killExecutor` here (SPARK-8119)
 sc.killAndReplaceExecutor(executorId)
--- End diff --

Yes:
```
  private[spark] def killAndReplaceExecutor(executorId: String): Boolean = {
schedulerBackend match {
  case b: ExecutorAllocationClient =>
b.killExecutors(Seq(executorId), adjustTargetNumExecutors = false, 
countFailures = true,
  force = true).nonEmpty
  case _ =>
logWarning("Killing executors is not supported by current 
scheduler.")
false
}
  }
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21486: [SPARK-24387][Core] Heartbeat-timeout executor is...

2018-06-03 Thread lirui-apache

Github user lirui-apache commented on a diff in the pull request:

https://github.com/apache/spark/pull/21486#discussion_r192614874
  
--- Diff: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala 
---
@@ -207,6 +210,55 @@ class HeartbeatReceiverSuite
 assert(fakeClusterManager.getExecutorIdsToKill === Set(executorId1, 
executorId2))
   }
 
+  test("expired host should not be offered again") {
+scheduler = spy(new TaskSchedulerImpl(sc))
+scheduler.setDAGScheduler(sc.dagScheduler)
+when(sc.taskScheduler).thenReturn(scheduler)
+doReturn(true).when(scheduler).executorHeartbeatReceived(any(), any(), 
any())
+
+// Set up a fake backend and cluster manager to simulate killing 
executors
+val rpcEnv = sc.env.rpcEnv
+val fakeClusterManager = new FakeClusterManager(rpcEnv)
+val fakeClusterManagerRef = rpcEnv.setupEndpoint("fake-cm", 
fakeClusterManager)
+val fakeSchedulerBackend = new FakeSchedulerBackend(scheduler, rpcEnv, 
fakeClusterManagerRef)
+when(sc.schedulerBackend).thenReturn(fakeSchedulerBackend)
+
+fakeSchedulerBackend.start()
+val dummyExecutorEndpoint1 = new FakeExecutorEndpoint(rpcEnv)
+val dummyExecutorEndpointRef1 = 
rpcEnv.setupEndpoint("fake-executor-1", dummyExecutorEndpoint1)
+fakeSchedulerBackend.driverEndpoint.askSync[Boolean](
+  RegisterExecutor(executorId1, dummyExecutorEndpointRef1, "1.2.3.4", 
2, Map.empty))
+heartbeatReceiverRef.askSync[Boolean](TaskSchedulerIsSet)
+addExecutorAndVerify(executorId1)
+triggerHeartbeat(executorId1, executorShouldReregister = false)
+
+scheduler.initialize(fakeSchedulerBackend)
+sc.requestTotalExecutors(0, 0, Map.empty)
--- End diff --

Just thought we don't need a replacement executor for this test case. Will 
update total num to 1 to be closer to real use case.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21486: [SPARK-24387][Core] Heartbeat-timeout executor is...

2018-06-03 Thread lirui-apache

Github user lirui-apache commented on a diff in the pull request:

https://github.com/apache/spark/pull/21486#discussion_r192614878
  
--- Diff: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala 
---
@@ -207,6 +210,55 @@ class HeartbeatReceiverSuite
 assert(fakeClusterManager.getExecutorIdsToKill === Set(executorId1, 
executorId2))
   }
 
+  test("expired host should not be offered again") {
--- End diff --

Should be `executor`. Thanks for catching, will update


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21467
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91433/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21467
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21467
  
**[Test build #91433 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91433/testReport)**
 for PR 21467 at commit 
[`8505de2`](https://github.com/apache/spark/commit/8505de28231e99b63371d0798545b693692cbce4).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21467
  
**[Test build #91433 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91433/testReport)**
 for PR 21467 at commit 
[`8505de2`](https://github.com/apache/spark/commit/8505de28231e99b63371d0798545b693692cbce4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21467
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610559
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
+
+html = "\n"
+# generate table head
+html += "".join(map(lambda x: cgi.escape(x), head)) + 
"\n"
+# generate table rows
+for row in row_data:
+data = "" + "".join(map(lambda x: 
cgi.escape(x), row)) + \
+"\n"
--- End diff --

ditto:

```
"%s\n" % "".join(map(lambda x: cgi.escape(x), 
row))
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610390
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
--- End diff --

tiny nit: `row_data[:max_num_rows]`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610512
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
+
+html = "\n"
+# generate table head
+html += "".join(map(lambda x: cgi.escape(x), head)) + 
"\n"
--- End diff --

maybe:

```
"%s\n" % "".join(map(lambda x: cgi.escape(x), 
head))
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610308
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
--- End diff --

`css` seems not used.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610839
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3040,6 +3040,36 @@ def test_csv_sampling_ratio(self):
 .csv(rdd, samplingRatio=0.5).schema
 self.assertEquals(schema, StructType([StructField("_c0", 
IntegerType(), True)]))
 
+def test_repr_html(self):
+import re
+pattern = re.compile(r'^ *\|', re.MULTILINE)
+df = self.spark.createDataFrame([(1, "1"), (2, "2")], 
("key", "value"))
+self.assertEquals(None, df._repr_html_())
+self.spark.conf.set("spark.sql.repl.eagerEval.enabled", "true")
--- End diff --

Can we use `with self.sql_conf(...)`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610620
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
+
+html = "\n"
+# generate table head
+html += "".join(map(lambda x: cgi.escape(x), head)) + 
"\n"
+# generate table rows
+for row in row_data:
+data = "" + "".join(map(lambda x: 
cgi.escape(x), row)) + \
+"\n"
+html += data
+html += "\n"
+if has_more_data:
+html += "only showing top %d %s\n" % (
--- End diff --

I'd just way `row(s)`. Don't have to be super clever on this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl...

2018-06-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21485


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl commen...

2018-06-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21485
  
Merged to master and branch-2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21487
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3788/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21488
  
**[Test build #91432 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91432/testReport)**
 for PR 21488 at commit 
[`062c6d0`](https://github.com/apache/spark/commit/062c6d00d9a7625a97d85f677ea03cc695d9c0dc).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91432/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21488
  
**[Test build #91432 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91432/testReport)**
 for PR 21488 at commit 
[`062c6d0`](https://github.com/apache/spark/commit/062c6d00d9a7625a97d85f677ea03cc695d9c0dc).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21313
  
**[Test build #91430 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91430/testReport)**
 for PR 21313 at commit 
[`901ff32`](https://github.com/apache/spark/commit/901ff32a03c6ec0c16a0ff7c625781ccf2355a54).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21313
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21313
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91430/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3787/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21488
  
**[Test build #91431 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91431/testReport)**
 for PR 21488 at commit 
[`f76da89`](https://github.com/apache/spark/commit/f76da891e6eb004f92b0c423ba371ec971da7584).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91431/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21488
  
**[Test build #91431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91431/testReport)**
 for PR 21488 at commit 
[`f76da89`](https://github.com/apache/spark/commit/f76da891e6eb004f92b0c423ba371ec971da7584).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21313
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3786/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21313
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl commen...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21485
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91428/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl commen...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21485
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21313
  
**[Test build #91430 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91430/testReport)**
 for PR 21313 at commit 
[`901ff32`](https://github.com/apache/spark/commit/901ff32a03c6ec0c16a0ff7c625781ccf2355a54).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl commen...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21485
  
**[Test build #91428 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91428/testReport)**
 for PR 21485 at commit 
[`9ca9e4b`](https://github.com/apache/spark/commit/9ca9e4bf55c94ed1a2281b0beb5b1e4df648f48f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21488: SPARK-18057 Update structured streaming kafka fro...

2018-06-03 Thread ijuma

Github user ijuma commented on a diff in the pull request:

https://github.com/apache/spark/pull/21488#discussion_r192602632
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -96,10 +101,13 @@ class KafkaTestUtils(withBrokerProps: Map[String, 
Object] = Map.empty) extends L
   // Set up the Embedded Zookeeper server and get the proper Zookeeper port
   private def setupEmbeddedZookeeper(): Unit = {
 // Zookeeper server startup
-zookeeper = new EmbeddedZookeeper(s"$zkHost:$zkPort")
+val zkSvr = s"$zkHost:$zkPort";
+zookeeper = new EmbeddedZookeeper(zkSvr)
 // Get the actual zookeeper binding port
 zkPort = zookeeper.actualPort
-zkUtils = ZkUtils(s"$zkHost:$zkPort", zkSessionTimeout, 
zkConnectionTimeout, false)
+zkUtils = ZkUtils(zkSvr, zkSessionTimeout, zkConnectionTimeout, false)
+zkClient = KafkaZkClient(zkSvr, false, 6000, 1, Int.MaxValue, 
Time.SYSTEM)
+adminZkClient = new AdminZkClient(zkClient)
--- End diff --

AdminClient.create gives you a concrete instance. createPartitions is the 
method you're looking for.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21488: SPARK-18057 Update structured streaming kafka fro...

2018-06-03 Thread tedyu

Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21488#discussion_r192601997
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -96,10 +101,13 @@ class KafkaTestUtils(withBrokerProps: Map[String, 
Object] = Map.empty) extends L
   // Set up the Embedded Zookeeper server and get the proper Zookeeper port
   private def setupEmbeddedZookeeper(): Unit = {
 // Zookeeper server startup
-zookeeper = new EmbeddedZookeeper(s"$zkHost:$zkPort")
+val zkSvr = s"$zkHost:$zkPort";
+zookeeper = new EmbeddedZookeeper(zkSvr)
 // Get the actual zookeeper binding port
 zkPort = zookeeper.actualPort
-zkUtils = ZkUtils(s"$zkHost:$zkPort", zkSessionTimeout, 
zkConnectionTimeout, false)
+zkUtils = ZkUtils(zkSvr, zkSessionTimeout, zkConnectionTimeout, false)
+zkClient = KafkaZkClient(zkSvr, false, 6000, 1, Int.MaxValue, 
Time.SYSTEM)
+adminZkClient = new AdminZkClient(zkClient)
--- End diff --

AdminClient is abstract.
KafkaAdminClient doesn't provide addPartitions.

Mind giving some pointer ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21488: SPARK-18057 Update structured streaming kafka fro...

2018-06-03 Thread ijuma

Github user ijuma commented on a diff in the pull request:

https://github.com/apache/spark/pull/21488#discussion_r192601219
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -96,10 +101,13 @@ class KafkaTestUtils(withBrokerProps: Map[String, 
Object] = Map.empty) extends L
   // Set up the Embedded Zookeeper server and get the proper Zookeeper port
   private def setupEmbeddedZookeeper(): Unit = {
 // Zookeeper server startup
-zookeeper = new EmbeddedZookeeper(s"$zkHost:$zkPort")
+val zkSvr = s"$zkHost:$zkPort";
+zookeeper = new EmbeddedZookeeper(zkSvr)
 // Get the actual zookeeper binding port
 zkPort = zookeeper.actualPort
-zkUtils = ZkUtils(s"$zkHost:$zkPort", zkSessionTimeout, 
zkConnectionTimeout, false)
+zkUtils = ZkUtils(zkSvr, zkSessionTimeout, zkConnectionTimeout, false)
+zkClient = KafkaZkClient(zkSvr, false, 6000, 1, Int.MaxValue, 
Time.SYSTEM)
+adminZkClient = new AdminZkClient(zkClient)
--- End diff --

Can we use the Java AdminClient instead of these internal classes?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3785/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21488
  
**[Test build #91429 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91429/testReport)**
 for PR 21488 at commit 
[`0a22686`](https://github.com/apache/spark/commit/0a22686d9a388a21d5dd38513854341d3f37f738).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21488
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91429/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21488
  
**[Test build #91429 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91429/testReport)**
 for PR 21488 at commit 
[`0a22686`](https://github.com/apache/spark/commit/0a22686d9a388a21d5dd38513854341d3f37f738).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21488: SPARK-18057 Update structured streaming kafka fro...

2018-06-03 Thread tedyu

GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/21488

SPARK-18057 Update structured streaming kafka from 0.10.0.1 to 2.0.0

## What changes were proposed in this pull request?

This PR upgrades to the Kafka 2.0.0 release where KIP-266 is integrated.

## How was this patch tested?

This PR uses existing Kafka related unit tests

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21488.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21488


commit 0a22686d9a388a21d5dd38513854341d3f37f738
Author: tedyu 
Date:   2018-06-03T19:54:22Z

SPARK-18057 Update structured streaming kafka from 0.10.0.1 to 2.0.0




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

2018-06-03 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/21398
  
@vanzin should we merge this or add a comment in the docs?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl commen...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21485
  
**[Test build #91428 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91428/testReport)**
 for PR 21485 at commit 
[`9ca9e4b`](https://github.com/apache/spark/commit/9ca9e4bf55c94ed1a2281b0beb5b1e4df648f48f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl commen...

2018-06-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21485
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl commen...

2018-06-03 Thread xueyumusic

Github user xueyumusic commented on the issue:

https://github.com/apache/spark/pull/21485
  
I took another look and find some typos, please review them, @HyukjinKwon , 
thank you for reminding


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21486: [SPARK-24387][Core] Heartbeat-timeout executor is...

2018-06-03 Thread Ngone51

Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21486#discussion_r192592142
  
--- Diff: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala 
---
@@ -207,6 +210,55 @@ class HeartbeatReceiverSuite
 assert(fakeClusterManager.getExecutorIdsToKill === Set(executorId1, 
executorId2))
   }
 
+  test("expired host should not be offered again") {
+scheduler = spy(new TaskSchedulerImpl(sc))
+scheduler.setDAGScheduler(sc.dagScheduler)
+when(sc.taskScheduler).thenReturn(scheduler)
+doReturn(true).when(scheduler).executorHeartbeatReceived(any(), any(), 
any())
+
+// Set up a fake backend and cluster manager to simulate killing 
executors
+val rpcEnv = sc.env.rpcEnv
+val fakeClusterManager = new FakeClusterManager(rpcEnv)
+val fakeClusterManagerRef = rpcEnv.setupEndpoint("fake-cm", 
fakeClusterManager)
+val fakeSchedulerBackend = new FakeSchedulerBackend(scheduler, rpcEnv, 
fakeClusterManagerRef)
+when(sc.schedulerBackend).thenReturn(fakeSchedulerBackend)
+
+fakeSchedulerBackend.start()
+val dummyExecutorEndpoint1 = new FakeExecutorEndpoint(rpcEnv)
+val dummyExecutorEndpointRef1 = 
rpcEnv.setupEndpoint("fake-executor-1", dummyExecutorEndpoint1)
+fakeSchedulerBackend.driverEndpoint.askSync[Boolean](
+  RegisterExecutor(executorId1, dummyExecutorEndpointRef1, "1.2.3.4", 
2, Map.empty))
+heartbeatReceiverRef.askSync[Boolean](TaskSchedulerIsSet)
+addExecutorAndVerify(executorId1)
+triggerHeartbeat(executorId1, executorShouldReregister = false)
+
+scheduler.initialize(fakeSchedulerBackend)
+sc.requestTotalExecutors(0, 0, Map.empty)
--- End diff --

why request 0 ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21486: [SPARK-24387][Core] Heartbeat-timeout executor is...

2018-06-03 Thread Ngone51

Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21486#discussion_r192592170
  
--- Diff: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala 
---
@@ -207,6 +210,55 @@ class HeartbeatReceiverSuite
 assert(fakeClusterManager.getExecutorIdsToKill === Set(executorId1, 
executorId2))
   }
 
+  test("expired host should not be offered again") {
--- End diff --

Also, better to attach JIRA number.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21486: [SPARK-24387][Core] Heartbeat-timeout executor is...

2018-06-03 Thread Ngone51

Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21486#discussion_r192591845
  
--- Diff: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala 
---
@@ -207,6 +210,55 @@ class HeartbeatReceiverSuite
 assert(fakeClusterManager.getExecutorIdsToKill === Set(executorId1, 
executorId2))
   }
 
+  test("expired host should not be offered again") {
--- End diff --

`host` or `executor` ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-03 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21449
  
This will definitely not go into 2.3.1, so we have plenty of time. I'll 
think deeper into it after the spark summit.

IMO `df.join(df, df("id") >= df("id"))` is ambiguous, especially when it's 
not an inner join.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21487
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91427/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21487
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21487
  
**[Test build #91427 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91427/testReport)**
 for PR 21487 at commit 
[`8386b42`](https://github.com/apache/spark/commit/8386b4250d90eb369c85f02de7bbabe7a2ebbdaa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-06-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21231
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21487
  
LGTM (I checked the related tests passed in my local).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192584018
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -747,8 +748,13 @@ case class StructsToJson(
 
 object JsonExprUtils {
 
-  def validateSchemaLiteral(exp: Expression): StructType = exp match {
-case Literal(s, StringType) => 
CatalystSqlParser.parseTableSchema(s.toString)
+  def validateSchemaLiteral(exp: Expression): DataType = exp match {
+case Literal(s, StringType) =>
+  try {
+DataType.fromJson(s.toString)
--- End diff --

If possible, I like @HyukjinKwon 's approach. I remember correctly we just 
keep json schema formats for back-compatibility. In future major releases, I 
think we possibly drop the support.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...

2018-06-03 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21481
  
Good questions.

For 2, at first I found one of these issues when I looked at a file. Then, 
I ran `grep` command with `long .*=.*\*` and `long .*=.*\+` in `.java` file. 
Then, I picked them up manually. It looks labor-intensive.

For 3, here is my thought.
[`SpotBugs`](https://spotbugs.github.io/) may be a good candidate to check 
it.
SpotBug is a successor of  [`findBugs`](https://findbugs.sourceforge.net/). 
When I ran `FindBugs` before, I found some problems regarding possible overflow 
and then made a PR that was integrated. On the other hand, these issues may not 
be detected at that time.

I will look at SpotBugs after my presentation at SAIS will be finished :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21472: [SPARK-24445][SQL] Schema in json format for from...

2018-06-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21472#discussion_r192583925
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/json-functions.sql 
---
@@ -25,6 +25,10 @@ select from_json('{"a":1}', 'a InvalidType');
 select from_json('{"a":1}', 'a INT', named_struct('mode', 'PERMISSIVE'));
 select from_json('{"a":1}', 'a INT', map('mode', 1));
 select from_json();
+-- from_json - schema in json format
+select from_json('{"a":1}', 
'{"type":"struct","fields":[{"name":"a","type":"integer", "nullable":true}]}');
+select from_json('{"a":1}', '{"type":"map", "keyType":"string", 
"valueType":"integer","valueContainsNull":false}');
+
--- End diff --

To make the output file changes smaller, can you add new tests in the end 
of file?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-06-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21439
  
> What kind of tests would you expect in json-functions.sql. Probably you 
would expect tests
> that are different from added to JsonExpressionsSuite.scala.
IIUC there is no strict rule there and we tend to put SQL related tests in 
`SQLQueryTestSuite`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-03 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
I see what you mean. Honestly I have not thought of a full design for this 
problem (so I can't state what we should support and what not), but focusing on 
this specific case I think that:

 - at the moment we do support self-joins (at least in the case 
`df.join(df, df("id") >= df("id"))`) so considering this invalid would cause a 
big behavior change (potentially causing user workflows to break).
 - even though we might consider acceptable such a change in a major 
release, I think that we should support with the Dataframe API what we support 
in the SQL API, and SQL standard supports self joins (using aliases for the 
tables). So I do believe we should support this use case.
 - the case presented by @daniel-shields in 
https://github.com/apache/spark/pull/21449#issuecomment-392947474, I think is a 
valid one without any doubt. As of now we are not supporting it, though.

So I think that in the holistic approach we shouldn't change the current 
behavior/approach which is present now and will be (IMHO) improved by this 
patch.

What I do think we have to discuss in order not to have to change it - once 
we want to solve the more generic issue - is the way to track the dataset an 
attribute is coming from. Here I decided to use the metadata, since I thought 
this is the cleanest approach. Another approach might be to introduce a new 
`Option` in the `AttributeReference` a reference to the dataset it is coming 
from.
For the generic solution, this might have the advantage that having a 
reference to the provenance dataset, where we might want to store some kind of 
DAG of the datasets this one is coming from in order to take more complex 
decision about the validity of the syntax and/or the resolution of the 
attribute.

What do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21487
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3784/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21487
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21487
  
**[Test build #91427 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91427/testReport)**
 for PR 21487 at commit 
[`8386b42`](https://github.com/apache/spark/commit/8386b4250d90eb369c85f02de7bbabe7a2ebbdaa).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21487
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21487
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21487
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91425/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21487: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21487
  
**[Test build #91425 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91425/testReport)**
 for PR 21487 at commit 
[`8386b42`](https://github.com/apache/spark/commit/8386b4250d90eb369c85f02de7bbabe7a2ebbdaa).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread huaxingao

Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192578796
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column"),
--- End diff --

@felixcheung 
I will add the type so it will be
```
signature(x = "Column", delimiter = "character"),
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread huaxingao

Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192578774
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
+#' @param nullReplacement character(s) to use to replace the Null values.
+#' @rdname column_collection_functions
+#' @aliases array_join array_join,Column-method
+#' @note array_join since 2.4.0
+setMethod("array_join",
+ signature(x = "Column"),
+ function(x, delimiter, nullReplacement = NA) {
--- End diff --

@felixcheung 
I will change the default to NULL.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21313: [SPARK-24187][R][SQL]Add array_join function to S...

2018-06-03 Thread huaxingao

Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21313#discussion_r192578750
  
--- Diff: R/pkg/R/functions.R ---
@@ -3006,6 +3008,27 @@ setMethod("array_contains",
 column(jc)
   })
 
+#' @details
+#' \code{array_join}: Concatenates the elements of column using the 
delimiter.
+#' Null values are replaced with nullReplacement if set, otherwise they 
are ignored.
+#'
+#' @param delimiter character(s) to use to concatenate the elements of 
column.
--- End diff --

@felixcheung scala doesn't have a doc for param delimiter. I added this 
myself. What I am trying to say is "one or more characters". I will change to 
"a character string" so it will be
```
@param delimiter a character string to use to concatenate the elements of 
column.
```
Does this look ok to you?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91426/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-06-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19602
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

2018-06-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19602
  
**[Test build #91426 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91426/testReport)**
 for PR 19602 at commit 
[`e4c6e1f`](https://github.com/apache/spark/commit/e4c6e1ff713a7033b0a60dabaca5071b480d7600).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

93 matches

Mail list logo