[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/22236 just FYI about another related PR: https://github.com/apache/spark/pull/17280 and maybe I should close it? @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213187429 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -91,6 +91,13 @@ private[spark] class Client( private val executorMemoryOverhead = sparkConf.get(EXECUTOR_MEMORY_OVERHEAD).getOrElse( math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toLong, MEMORY_OVERHEAD_MIN)).toInt + private val isPython = sparkConf.get(IS_PYTHON_APP) --- End diff -- Sure, one of them is https://github.com/sparklingpandas/sparklingml
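The overhead arithmetic in the diff above is easy to misread, so here is a plain-Python sketch of it. The 0.10 factor and 384 MiB floor are assumed to be the YARN defaults Spark shipped at the time; the function name is made up for illustration.

```python
def executor_memory_overhead(executor_memory_mb, configured_overhead_mb=None):
    """Illustrative sketch of the YARN client's overhead computation.

    Spark uses the explicitly configured overhead if set; otherwise it takes
    max(factor * executorMemory, minimum), truncating the product to an
    integer number of MiB first (mirroring Scala's .toLong).
    """
    MEMORY_OVERHEAD_FACTOR = 0.10   # assumed default factor
    MEMORY_OVERHEAD_MIN = 384       # assumed default floor, in MiB
    if configured_overhead_mb is not None:
        return configured_overhead_mb
    return max(int(MEMORY_OVERHEAD_FACTOR * executor_memory_mb),
               MEMORY_OVERHEAD_MIN)
```

For example, a 4 GiB executor gets a 409 MiB overhead (10%), while a 1 GiB executor is bumped up to the 384 MiB floor.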
[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213186832 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala --- @@ -161,6 +162,11 @@ abstract class BaseYarnClusterSuite } extraJars.foreach(launcher.addJar) +if (outFile.isDefined) { --- End diff -- I think the pattern match would be better than the get.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22149 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95323/ Test PASSed.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22149 Merged build finished. Test PASSed.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22149 **[Test build #95323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95323/testReport)** for PR 22149 at commit [`412497f`](https://github.com/apache/spark/commit/412497f2ad615e5aeecb91e7fd5053864a00be37). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22247#discussion_r213181568 --- Diff: python/pyspark/worker.py --- @@ -364,8 +364,5 @@ def process(): # Read information about how to connect back to the JVM from the environment. java_port = int(os.environ["PYTHON_WORKER_FACTORY_PORT"]) auth_secret = os.environ["PYTHON_WORKER_FACTORY_SECRET"] -sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) -sock.connect(("127.0.0.1", java_port)) -sock_file = sock.makefile("rwb", 65536) --- End diff -- I quickly tested and it seems to work fine. Please ignore this comment.
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22238 **[Test build #95326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95326/testReport)** for PR 22238 at commit [`e2ee43d`](https://github.com/apache/spark/commit/e2ee43da2f9bf4fb95c938764ee3584bbae06c1b).
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213181165 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,6 +2812,19 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information +**Notes** + +- There're couple of configurations which are not modifiable once you run the query. If you really want to make changes for these configurations, you have to discard checkpoint and start a new query. + - `spark.sql.shuffle.partitions` +- This is due to the physical partitioning of state: state is partitioned via applying hash function to key, hence the number of partitions for state should be unchanged. +- If you want to run less tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning. + - e.g. `df.groupBy("time").count().coalesce(10)` reduces the number of tasks by 10, whereas `spark.sql.shuffle.partitions` may be bigger. + - After `coalesce`, the number of (reduced) tasks will be kept unless another shuffle happens. + - `spark.sql.streaming.stateStore.providerClass` --- End diff -- Ah, okay, so there are more instances to describe here. If so, I'm okay.
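The restriction the doc diff above describes, that `spark.sql.shuffle.partitions` cannot change once a stateful query has run, follows from hash partitioning of state, and can be seen with a toy partitioner (the function and values below are illustrative, not Spark's actual code):

```python
def partition_for(key_hash, num_partitions):
    # State for a grouping key lives in partition hash(key) % numPartitions;
    # key_hash stands in for the deterministic hash of a grouping key.
    return key_hash % num_partitions

# With 5 shuffle partitions, a key whose hash is 7 owns state in partition 2.
# Restart the same query with 4 partitions and that key now maps to
# partition 3, whose task would look for state that was written elsewhere.
```

This is why changing the setting requires discarding the checkpoint and starting a new query.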
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213181040 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,6 +2812,19 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information +**Notes** --- End diff -- I was thinking of adding this information somewhere in the API or configuration docs only. For instance, notes like https://github.com/apache/spark/pull/19617. > lots of wondering around SO and user mailing list, I don't object to noting this stuff, but usually the site has only key points for some features or configurations. If there are more instances to describe specifically for structured streaming (where the same SQL configurations could lead to some confusion), I am fine with adding this. If not, or it is less sure for now, I would add them to the API's doc or the configuration's doc.
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213180977 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- I was thinking of adding this information somewhere in the API or configuration docs only. For instance, notes like https://github.com/apache/spark/pull/19617. > lots of wondering around SO and user mailing list, I don't object to noting this stuff, but usually the site has only key points for some features or configurations. If there are more instances to describe specifically for structured streaming (where the same SQL configurations could lead to some confusion), I am fine with adding this. If not, or it is less sure for now, I would add them to the API's doc or the configuration's doc.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95322/ Test PASSed.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Merged build finished. Test PASSed.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22198 **[Test build #95322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95322/testReport)** for PR 22198 at commit [`83387f6`](https://github.com/apache/spark/commit/83387f6f3b86532a79e83e8483c5e4683ff8beac). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213179259 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- I can revert adding a new section if you meant adding `##` on it. Since "gotcha" reads as too informal, I will change it to `**Notes**`. The rationale for adding this to the doc is that this restriction has caused a lot of confusion on SO and the user mailing list, and even prompted a patch to fix it. So it would be good for all end users of structured streaming to see it at least once, even if they only skim the doc, so that they remember it and can revisit the doc once they get stuck on this.
[GitHub] spark pull request #22243: [MINOR] Avoid code duplication for nullable in Hi...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22243#discussion_r213178417 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -155,6 +155,8 @@ trait HigherOrderFunction extends Expression with ExpectsInputTypes { */ trait SimpleHigherOrderFunction extends HigherOrderFunction { + override def nullable: Boolean = argument.nullable --- End diff -- Yea, let's go ahead then if the change is small, straightforward and gives more deduplication
[GitHub] spark issue #22210: [SPARK-25218][Core]Fix potential resource leaks in Trans...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22210 Seems okay to me too
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22238 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95321/ Test PASSed.
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213178020 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- What I am worried about is adding a new section, which is quite unusual. Usually we go for it when multiple instances are detected later. Are there more instances to describe here specifically for structured streaming?
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22238 Merged build finished. Test PASSed.
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22238 **[Test build #95321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95321/testReport)** for PR 22238 at commit [`138cc63`](https://github.com/apache/spark/commit/138cc63e639b60fb7e803097654816ad6c19c95f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r213177782 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -613,8 +613,7 @@ case class JsonToStructs( } /** - * Converts a [[StructType]], [[ArrayType]] of [[StructType]]s, [[MapType]] - * or [[ArrayType]] of [[MapType]]s to a json output string. + * Converts a [[StructType]], [[ArrayType]] or [[MapType]] to a json output string. --- End diff -- not a big deal but `JSON` while we are here
[GitHub] spark issue #22226: [SPARK-25252][SQL] Support arrays of any types by to_jso...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22226 Seems okay, but I or someone else should take a closer look before getting this in.
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r213177790 --- Diff: python/pyspark/sql/functions.py --- @@ -2289,12 +2289,10 @@ def from_json(col, schema, options={}): @since(2.1) def to_json(col, options={}): """ -Converts a column containing a :class:`StructType`, :class:`ArrayType` of -:class:`StructType`\\s, a :class:`MapType` or :class:`ArrayType` of :class:`MapType`\\s +Converts a column containing a :class:`StructType`, :class:`ArrayType` or a :class:`MapType` into a JSON string. Throws an exception, in the case of an unsupported type. -:param col: name of column containing the struct, array of the structs, the map or -array of the maps. +:param col: name of column containing a struct, an array or a map. :param options: options to control converting. accepts the same options as the json datasource --- End diff -- ditto
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213177623 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- IMHO, if something goes wrong with structured streaming, end users would try to review the structured streaming guide doc rather than the SQL programming guide doc. Could we wait to hear more voices on this?
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Merged build finished. Test FAILed.
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95311/ Test FAILed.
[GitHub] spark pull request #21976: [SPARK-24909][core] Always unregister pending par...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21976#discussion_r213176636 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -2474,19 +2478,21 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi runEvent(makeCompletionEvent( taskSets(3).tasks(0), Success, makeMapStatus("hostB", 2))) -// There should be no new attempt of stage submitted, -// because task(stageId=1, stageAttempt=1, partitionId=1) is still running in -// the current attempt (and hasn't completed successfully in any earlier attempts). -assert(taskSets.size === 4) +// At this point there should be no active task set for stageId=1 and we need +// to resubmit because the output from (stageId=1, stageAttemptId=0, partitionId=1) +// was ignored due to executor failure +assert(taskSets.size === 5) +assert(taskSets(4).stageId === 1 && taskSets(4).stageAttemptId === 2 + && taskSets(4).tasks.size === 1) -// Complete task(stageId=1, stageAttempt=1, partitionId=1) successfully. +// Complete task(stageId=1, stageAttempt=2, partitionId=1) successfully. runEvent(makeCompletionEvent( - taskSets(3).tasks(1), Success, makeMapStatus("hostB", 2))) + taskSets(4).tasks(0), Success, makeMapStatus("hostB", 2))) --- End diff -- Yea, thanks for the explanation. BTW, what's the JIRA number of the ongoing scheduler integration test?
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213176616 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- Can't we just use `expr` instead?
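For context, `parse_url` extracts a named part (HOST, PATH, QUERY, PROTOCOL, ...) from a URL string. The decomposition it performs is analogous to the Python standard library's `urlparse`, shown here only to illustrate what "a part" means (this is not Spark code):

```python
from urllib.parse import urlparse

url = urlparse("https://spark.apache.org/docs/latest/api.html?lang=scala")

host = url.netloc    # "spark.apache.org"   -> parse_url's HOST part
path = url.path      # "/docs/latest/api.html" -> PATH
query = url.query    # "lang=scala"         -> QUERY
scheme = url.scheme  # "https"              -> PROTOCOL
```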
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22227 **[Test build #95311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95311/testReport)** for PR 22227 at commit [`4e10733`](https://github.com/apache/spark/commit/4e107337a47ce590c703b757b0a44d60d6b862e1). * This patch **fails from timeout after a configured wait of `400m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r213176423 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,49 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress * Splits str around pat (pattern is a regular expression). */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." + +"The `limit` parameter controls the number of times the pattern is applied. If the limit " + +"n is greater than zero then the pattern will be applied at most n - 1 times, " + +"the array's length will be no greater than n, and the array's last entry " + +"will contain all input beyond the last matched delimiter. If n is " + +"less than 0, then the pattern will be applied as many times as " + +"possible and the array can have any length. If n is zero then the " + +"pattern will be applied as many times as possible, the array can " + +"have any length, and trailing empty strings will be discarded.", --- End diff -- +1 for https://github.com/apache/spark/pull/22227#discussion_r212815685. The doc should be concise. Can we just move the `limit`-specific description into the arguments at `limit - a..`? This looks a bit messy.
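The limit semantics quoted in the diff above follow `java.util.regex.Pattern.split`. The three regimes can be emulated in plain Python (a hypothetical helper for illustration, not Spark's implementation):

```python
import re

def split_with_limit(s, pattern, limit):
    """Emulate the documented limit semantics of split(str, regex, limit)."""
    if limit > 0:
        if limit == 1:
            return [s]                      # pattern applied zero times
        # pattern applied at most limit - 1 times; last entry keeps the rest
        return re.split(pattern, s, maxsplit=limit - 1)
    parts = re.split(pattern, s)            # apply as many times as possible
    if limit == 0:
        while parts and parts[-1] == "":    # discard trailing empty strings
            parts.pop()
    return parts
```

For example, `split_with_limit("a,b,c", ",", 2)` gives `["a", "b,c"]`, while limit 0 on `"a,b,,"` drops the trailing empty strings and a negative limit keeps them.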
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213176029 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- We do have it in `sql-programming-guide.md`. Shall we add some info there for now?
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22247#discussion_r213175296 --- Diff: python/pyspark/worker.py --- @@ -364,8 +364,5 @@ def process(): # Read information about how to connect back to the JVM from the environment. java_port = int(os.environ["PYTHON_WORKER_FACTORY_PORT"]) auth_secret = os.environ["PYTHON_WORKER_FACTORY_SECRET"] -sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) -sock.connect(("127.0.0.1", java_port)) -sock_file = sock.makefile("rwb", 65536) --- End diff -- @vanzin, BTW, did you test this on Windows too?
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22247#discussion_r213174542 --- Diff: python/pyspark/java_gateway.py --- @@ -147,6 +147,39 @@ def do_server_auth(conn, auth_secret): raise Exception("Unexpected reply from iterator server.") +def local_connect_and_auth(sock_info): +""" +Connect to local host, authenticate with it, and return a (sockfile,sock) for that connection. +Handles IPV4 & IPV6, does some error handling. +:param sock_info: a tuple of (port, auth_secret) for connecting +:return: a tuple with (sockfile, sock) +""" +port, auth_secret = sock_info +sock = None +errors = [] +# Support for both IPv4 and IPv6. +# On most of IPv6-ready systems, IPv6 will take precedence. +for res in socket.getaddrinfo("127.0.0.1", port, socket.AF_UNSPEC, socket.SOCK_STREAM): +af, socktype, proto, canonname, sa = res --- End diff -- nit: `af, socktype, proto, canonname, sa = res` -> `af, socktype, proto, _, sa = res`
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22247#discussion_r213173845 --- Diff: python/pyspark/java_gateway.py --- @@ -147,6 +147,39 @@ def do_server_auth(conn, auth_secret): raise Exception("Unexpected reply from iterator server.") +def local_connect_and_auth(sock_info): --- End diff -- @squito, not a big deal but how about `local_connect_and_auth(port, auth_secret)` and ..

```python
(sockfile, sock) = local_connect_and_auth(port, auth_secret)
```

```python
(sock_file, _) = local_connect_and_auth(java_port, auth_secret)
```

```python
port, auth_secret = sock_info
(sockfile, sock) = local_connect_and_auth(port, auth_secret)
```

or

```python
(sockfile, sock) = local_connect_and_auth(*sock_info)
```
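The `getaddrinfo` connection loop under review can be sketched with the standard library alone. This is an illustrative sketch of the pattern, assuming a made-up `local_connect` name; the error handling and return shape are not the PR's exact code:

```python
import socket

def local_connect(port):
    """Connect to localhost over whichever address family resolves first."""
    errors = []
    for af, socktype, proto, _, sa in socket.getaddrinfo(
            "127.0.0.1", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(af, socktype, proto)
        try:
            sock.connect(sa)
            # Return a buffered file object over the socket alongside the
            # socket itself, mirroring the (sockfile, sock) tuple discussed.
            return sock.makefile("rwb", 65536), sock
        except OSError as exc:
            sock.close()
            errors.append(str(exc))
    raise OSError("could not connect to port %d: %s" % (port, errors))
```

Iterating over all `getaddrinfo` results and keeping the per-family errors is what lets the same code work on IPv4-only and IPv6-ready hosts.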
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22238 Also adding @tdas @zsxwing @jose-torres to cc.
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22246 > @viirya The reflection trick we use in scala.reflect.internal.util.ScalaClassLoader doesn't work when the REPL is called from test. Do you have any idea about it? Thanks. Yeah, it seems to be due to the classloader. After changing to the Spark classloader, the tests passed locally. Let's see if the Jenkins tests pass too.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Build finished. Test PASSed.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95312/ Test PASSed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #95325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95325/testReport)** for PR 21546 at commit [`2fe46f8`](https://github.com/apache/spark/commit/2fe46f82dc38af972bc0974aca1fd846bcb483e5).
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2599/ Test PASSed.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22104 **[Test build #95312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95312/testReport)** for PR 22104 at commit [`3f0a97a`](https://github.com/apache/spark/commit/3f0a97a89b39d2ad57c587e49bb07203a670faba). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21546 retest this please
[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22104
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22104 LGTM
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22244 closing in favor of https://github.com/apache/spark/pull/22104
[GitHub] spark pull request #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract pyth...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/22244
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22104 thanks, merging to master!
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Merged build finished. Test FAILed.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95318/ Test FAILed.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17280 **[Test build #95318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95318/testReport)** for PR 17280 at commit [`733c7ff`](https://github.com/apache/spark/commit/733c7ff70c46f0c54cdf520b44645544b810e04e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22246 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2598/ Test PASSed.
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22246 Merged build finished. Test PASSed.
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22246 **[Test build #95324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95324/testReport)** for PR 22246 at commit [`e0d424d`](https://github.com/apache/spark/commit/e0d424d645010108a497c057fa4ad1e198f1e3d0).
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95314/ Test PASSed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Merged build finished. Test PASSed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22236 **[Test build #95314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95314/testReport)** for PR 22236 at commit [`88eb571`](https://github.com/apache/spark/commit/88eb571b732d42138b029ead106f4c8718e1e220). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95317/ Test PASSed.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Merged build finished. Test PASSed.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22104 **[Test build #95317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95317/testReport)** for PR 22104 at commit [`2325a4f`](https://github.com/apache/spark/commit/2325a4f18a2bc6cc95d96bc5ac6790749b3e927e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22164: [SPARK-23679][YARN] Setting RM_HA_URLS for AmIpFi...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/22164#discussion_r213168025 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala --- @@ -126,4 +136,21 @@ private[spark] class YarnRMClient extends Logging { } } + private def getUrlByRmId(conf: Configuration, rmId: String): String = { --- End diff -- For Spark's usage, I don't think `AmFilterInitializer` would be very useful: we need to pass the filter parameters to the driver either via RPC (client mode) or via configuration (cluster mode), and in either case we have to know how to set each parameter ourselves.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22149 ``` Is it possible to add a test case? ``` Thanks for your reply Xiao. We ran into some difficulties with the test case, because it needs to mock speculative execution behavior. We will keep looking into this.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22149 **[Test build #95323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95323/testReport)** for PR 22149 at commit [`412497f`](https://github.com/apache/spark/commit/412497f2ad615e5aeecb91e7fd5053864a00be37).
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22149 retest this please.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22208 Merged build finished. Test PASSed.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95315/ Test PASSed.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22208 **[Test build #95315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95315/testReport)** for PR 22208 at commit [`a8a5976`](https://github.com/apache/spark/commit/a8a59760228d4fac54175caeffdfe07faf26a184). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code i...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22246#discussion_r213164929 --- Diff: repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala --- @@ -124,6 +141,26 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) super.replay() } + /** + * TODO: Remove `runClosure` when the support of Scala 2.11 is ended + * In Scala 2.12, we don't need to use `savingContextLoader` to execute the `body`. + * See `SI-8521 No blind save of context class loader` for detail. + */ + private def runClosure(body: () => Boolean): Boolean = { +if (isScala2_11) { + val loader = Utils.classForName("scala.reflect.internal.util.ScalaClassLoader$") +.getDeclaredField("MODULE$") +.get(null) --- End diff -- Although it is a static method in Scala, it is compiled to a non-static method in the Java class. That class has a static member (`MODULE$`) of the same type, and all of the "static" methods are accessed through that member.
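To illustrate the point being made: a top-level Scala `object` compiles to a JVM class whose singleton instance lives in a static `MODULE$` field, and the object's methods become instance methods invoked on that singleton. A minimal self-contained sketch of the same reflection pattern the diff uses (the `Greeter` object here is hypothetical, not part of Spark):

```scala
// Hypothetical object used only to demonstrate the MODULE$ pattern.
object Greeter {
  def greet(name: String): String = s"Hello, $name"
}

object ReflectDemo {
  def main(args: Array[String]): Unit = {
    // The compiled class "Greeter$" holds the singleton in a static MODULE$ field.
    val clazz = Class.forName("Greeter$")
    val singleton = clazz.getDeclaredField("MODULE$").get(null)
    // greet is a non-static method on the singleton, invoked reflectively,
    // just as SparkILoop reflects into ScalaClassLoader$ above.
    val greet = clazz.getMethod("greet", classOf[String])
    println(greet.invoke(singleton, "world")) // prints "Hello, world"
  }
}
```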
[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/22202 Thanks for the ping~ It seems that `ShuffleMapTask0.1` is a speculative task, so please update the description. The change looks fine to me, but given https://github.com/apache/spark/pull/21019, the issue in the description is already solved; I think this change is a refinement of https://github.com/apache/spark/pull/21019. Fine with me, but we should always be careful when touching such core logic.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22149 Merged build finished. Test FAILed.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22149 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95320/ Test FAILed.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22149 **[Test build #95320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95320/testReport)** for PR 22149 at commit [`412497f`](https://github.com/apache/spark/commit/412497f2ad615e5aeecb91e7fd5053864a00be37). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21977 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95313/ Test FAILed.
[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21977 Merged build finished. Test FAILed.
[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21977 **[Test build #95313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95313/testReport)** for PR 21977 at commit [`0b275cf`](https://github.com/apache/spark/commit/0b275cfea7d83cdf61802da30c4a7604be8900e4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22112: [SPARK-23243][Core] Fix RDD.repartition() data co...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22112#discussion_r213160753 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1865,6 +1871,62 @@ abstract class RDD[T: ClassTag]( // RDD chain. @transient protected lazy val isBarrier_ : Boolean = dependencies.filter(!_.isInstanceOf[ShuffleDependency[_, _, _]]).exists(_.rdd.isBarrier()) + + /** + * Returns the random level of this RDD's output. Please refer to [[RandomLevel]] for the + * definition. + * + * By default, an reliably checkpointed RDD, or RDD without parents(root RDD) is IDEMPOTENT. For + * RDDs with parents, we will generate a random level candidate per parent according to the + * dependency. The random level of the current RDD is the random level candidate that is random + * most. Please override [[getOutputRandomLevel]] to provide custom logic of calculating output + * random level. + */ + // TODO: make it public so users can set random level to their custom RDDs. + // TODO: this can be per-partition. e.g. UnionRDD can have different random level for different + // partitions. + private[spark] final lazy val outputRandomLevel: RandomLevel.Value = { +if (checkpointData.exists(_.isInstanceOf[ReliableRDDCheckpointData[_]])) { --- End diff -- Ah good to know it. Then we can simplify the code here, and only check `checkpointRDD`. cc @mridulm
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r213160007 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2062,8 +2062,10 @@ private[spark] object Utils extends Logging { try { val properties = new Properties() properties.load(inReader) - properties.stringPropertyNames().asScala.map( -k => (k, properties.getProperty(k).trim)).toMap + properties.stringPropertyNames().asScala +.map(k => (k, properties.getProperty(k))) --- End diff -- >trim removes leading spaces as well that are totally legit. It is hard to say which solution is legit; the way you propose may be valid in your case, but it will be unexpected in other users' cases. I'm not talking about legit or not; what I'm trying to say is that your proposal will break the convention, and that's what I'm concerned about. By ASCII I mean you can pass in the ASCII number and translate it to the actual char in the code; that will mitigate the problem here.
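The behavior under discussion can be seen with plain `java.util.Properties`: `load` skips whitespace around the key and after the separator, but keeps trailing whitespace in the value, so the `.trim` in `Utils` is what drops it. A small standalone sketch (not Spark code; the key name is made up):

```scala
import java.io.StringReader
import java.util.Properties

object TrimDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Whitespace before the value (after '=') is skipped by Properties.load,
    // but the trailing space after "value" is kept in the loaded string.
    props.load(new StringReader("spark.some.key = value \n"))
    val raw = props.getProperty("spark.some.key")
    assert(raw == "value ")      // trailing whitespace preserved by load
    assert(raw.trim == "value")  // the trim in Utils removes it
    println("ok")
  }
}
```

This is why dropping the `trim` changes observed config values for any user whose file has incidental trailing spaces.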
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213158510 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 + */ + private def numberOfRowsToShow(): Int = { +this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt --- End diff -- How about `spark.sql.defaultNumRowsInShow`?
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213158056 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext { checkShowString(ds, expected) } + + test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") { +withSQLConf("spark.sql.show.defaultNumRows" -> "100") { + val ds = (1 to 1000).toDS().as[Int].show --- End diff -- I think it's ok to check the number of rows in the output of `show`.
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22162 ya, sure.
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213157406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 + */ + private def numberOfRowsToShow(): Int = { +this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt --- End diff -- +1
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22162 Should we wait for @AndrewKL for a few days?
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test FAILed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95310/ Test FAILed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #95310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95310/testReport)** for PR 21546 at commit [`2fe46f8`](https://github.com/apache/spark/commit/2fe46f82dc38af972bc0974aca1fd846bcb483e5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r213154483 --- Diff: sql/core/src/test/resources/sql-tests/inputs/string-functions.sql --- @@ -5,6 +5,10 @@ select format_string(); -- A pipe operator for string concatenation select 'a' || 'b' || 'c'; +-- split function +select split('aa1cc2ee', '[1-9]+', 2); +select split('aa1cc2ee', '[1-9]+'); + --- End diff -- Can you move these tests to the end of this file to reduce unnecessary changes in the golden file?
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22198 **[Test build #95322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95322/testReport)** for PR 22198 at commit [`83387f6`](https://github.com/apache/spark/commit/83387f6f3b86532a79e83e8483c5e4683ff8beac).
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Merged build finished. Test PASSed.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2597/ Test PASSed.
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22162 I have some bandwidth to take it, too. Is it ok to take it over? @mgaido91, are you not working on this now?
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21976 Merged build finished. Test FAILed.
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21976 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95307/ Test FAILed.
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r213150133 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -130,6 +130,16 @@ private[spark] class Executor( private val urlClassLoader = createClassLoader() private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader) + // One thread will handle loading all of the plugins on this executor --- End diff -- That does make sense. While I did say "aside from semantics", semantics is a good reason to include it. Especially since it'll be harder to get plugin writers to adopt an `init` function later. I'll make the other changes and make sure the tests still pass; if anyone does feel strongly (or even weakly) on one way over another, I don't think there's much harm in either approach.
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21976 **[Test build #95307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95307/testReport)** for PR 21976 at commit [`e384245`](https://github.com/apache/spark/commit/e384245f7b0c6c43e6e0e0f7b73528b5c355e2f1). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22238 **[Test build #95321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95321/testReport)** for PR 22238 at commit [`138cc63`](https://github.com/apache/spark/commit/138cc63e639b60fb7e803097654816ad6c19c95f).
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22149 **[Test build #95320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95320/testReport)** for PR 22149 at commit [`412497f`](https://github.com/apache/spark/commit/412497f2ad615e5aeecb91e7fd5053864a00be37).
[GitHub] spark issue #22210: [SPARK-25218][Core]Fix potential resource leaks in Trans...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/22210 LGTM! Good catches
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22149 ok to test
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22149 Is it possible to add a test case?