[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/22236 just FYI about another related PR: https://github.com/apache/spark/pull/17280 and maybe I should close it? @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213187429 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -91,6 +91,13 @@ private[spark] class Client( private val executorMemoryOverhead = sparkConf.get(EXECUTOR_MEMORY_OVERHEAD).getOrElse( math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toLong, MEMORY_OVERHEAD_MIN)).toInt + private val isPython = sparkConf.get(IS_PYTHON_APP) --- End diff -- Sure, one of them is https://github.com/sparklingpandas/sparklingml
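The overhead arithmetic in the diff above is easy to misread, so here is a plain-Python sketch of it. The 0.10 factor and 384 MiB floor are assumed to be the YARN defaults Spark shipped at the time; the function name is made up for illustration.

```python
def executor_memory_overhead(executor_memory_mb, configured_overhead_mb=None):
    """Illustrative sketch of the YARN client's overhead computation.

    Spark uses the explicitly configured overhead if set; otherwise it takes
    max(factor * executorMemory, minimum), truncating the product to an
    integer number of MiB first (mirroring Scala's .toLong).
    """
    MEMORY_OVERHEAD_FACTOR = 0.10   # assumed default factor
    MEMORY_OVERHEAD_MIN = 384       # assumed default floor, in MiB
    if configured_overhead_mb is not None:
        return configured_overhead_mb
    return max(int(MEMORY_OVERHEAD_FACTOR * executor_memory_mb),
               MEMORY_OVERHEAD_MIN)
```

For example, a 4 GiB executor gets a 409 MiB overhead (10%), while a 1 GiB executor is bumped up to the 384 MiB floor.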
[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213186832 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala --- @@ -161,6 +162,11 @@ abstract class BaseYarnClusterSuite } extraJars.foreach(launcher.addJar) +if (outFile.isDefined) { --- End diff -- I think the pattern match would be better than the get.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22149 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95323/ Test PASSed.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22149 Merged build finished. Test PASSed.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22149 **[Test build #95323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95323/testReport)** for PR 22149 at commit [`412497f`](https://github.com/apache/spark/commit/412497f2ad615e5aeecb91e7fd5053864a00be37). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22247#discussion_r213181568 --- Diff: python/pyspark/worker.py --- @@ -364,8 +364,5 @@ def process(): # Read information about how to connect back to the JVM from the environment. java_port = int(os.environ["PYTHON_WORKER_FACTORY_PORT"]) auth_secret = os.environ["PYTHON_WORKER_FACTORY_SECRET"] -sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) -sock.connect(("127.0.0.1", java_port)) -sock_file = sock.makefile("rwb", 65536) --- End diff -- I quickly tested and it seems to work fine. Please ignore this comment.
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22238 **[Test build #95326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95326/testReport)** for PR 22238 at commit [`e2ee43d`](https://github.com/apache/spark/commit/e2ee43da2f9bf4fb95c938764ee3584bbae06c1b).
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213181165 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,6 +2812,19 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information +**Notes** + +- There're couple of configurations which are not modifiable once you run the query. If you really want to make changes for these configurations, you have to discard checkpoint and start a new query. + - `spark.sql.shuffle.partitions` +- This is due to the physical partitioning of state: state is partitioned via applying hash function to key, hence the number of partitions for state should be unchanged. +- If you want to run less tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning. + - e.g. `df.groupBy("time").count().coalesce(10)` reduces the number of tasks by 10, whereas `spark.sql.shuffle.partitions` may be bigger. + - After `coalesce`, the number of (reduced) tasks will be kept unless another shuffle happens. + - `spark.sql.streaming.stateStore.providerClass` --- End diff -- Ah, okay, so there are more instances to describe here. If so, I'm okay.
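The restriction the doc diff above describes, that `spark.sql.shuffle.partitions` cannot change once a stateful query has run, follows from hash partitioning of state, and can be seen with a toy partitioner (the function and values below are illustrative, not Spark's actual code):

```python
def partition_for(key_hash, num_partitions):
    # State for a grouping key lives in partition hash(key) % numPartitions;
    # key_hash stands in for the deterministic hash of a grouping key.
    return key_hash % num_partitions

# With 5 shuffle partitions, a key whose hash is 7 owns state in partition 2.
# Restart the same query with 4 partitions and that key now maps to
# partition 3, whose task would look for state that was written elsewhere.
```

This is why changing the setting requires discarding the checkpoint and starting a new query.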
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213181040 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,6 +2812,19 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information +**Notes** --- End diff -- I was thinking of adding this information somewhere in the API or configuration docs only. For instance, notes like https://github.com/apache/spark/pull/19617. > lots of wondering around SO and user mailing list, I don't object to noting this stuff, but usually the site has only key points for some features or configurations. If there are more instances to describe specifically for structured streaming (where the same SQL configurations could lead to some confusion), I am fine with adding this. If not, or it is less sure for now, I would add them to the API's doc or the configuration's doc.
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213180977 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- I was thinking of adding this information somewhere in the API or configuration docs only. For instance, notes like https://github.com/apache/spark/pull/19617. > lots of wondering around SO and user mailing list, I don't object to noting this stuff, but usually the site has only key points for some features or configurations. If there are more instances to describe specifically for structured streaming (where the same SQL configurations could lead to some confusion), I am fine with adding this. If not, or it is less sure for now, I would add them to the API's doc or the configuration's doc.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95322/ Test PASSed.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Merged build finished. Test PASSed.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22198 **[Test build #95322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95322/testReport)** for PR 22198 at commit [`83387f6`](https://github.com/apache/spark/commit/83387f6f3b86532a79e83e8483c5e4683ff8beac). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213179259 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- I can revert adding a new section if you meant adding `##` on it. Since "gotcha" reads as too informal, I will change it to `**Notes**`. The rationale for adding this to the doc is that this restriction has caused a lot of confusion on SO and the user mailing list, and even prompted a patch to fix it. So it would be good for all end users of structured streaming to see it at least once, even if they only skim the doc, so that they remember it and can revisit the doc once they get stuck on this.
[GitHub] spark pull request #22243: [MINOR] Avoid code duplication for nullable in Hi...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22243#discussion_r213178417 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -155,6 +155,8 @@ trait HigherOrderFunction extends Expression with ExpectsInputTypes { */ trait SimpleHigherOrderFunction extends HigherOrderFunction { + override def nullable: Boolean = argument.nullable --- End diff -- Yea, let's go ahead then if the change is small, straightforward and gives more deduplication
[GitHub] spark issue #22210: [SPARK-25218][Core]Fix potential resource leaks in Trans...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22210 Seems okay to me too
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22238 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95321/ Test PASSed.
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213178020 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- What I am worried about is adding a new section, which is quite unusual. Usually we go for it when multiple instances are detected later. Are there more instances to describe here specifically for structured streaming?
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22238 Merged build finished. Test PASSed.
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22238 **[Test build #95321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95321/testReport)** for PR 22238 at commit [`138cc63`](https://github.com/apache/spark/commit/138cc63e639b60fb7e803097654816ad6c19c95f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r213177782 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -613,8 +613,7 @@ case class JsonToStructs( } /** - * Converts a [[StructType]], [[ArrayType]] of [[StructType]]s, [[MapType]] - * or [[ArrayType]] of [[MapType]]s to a json output string. + * Converts a [[StructType]], [[ArrayType]] or [[MapType]] to a json output string. --- End diff -- not a big deal but `JSON` while we are here
[GitHub] spark issue #22226: [SPARK-25252][SQL] Support arrays of any types by to_jso...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22226 Seems okay, but I or someone else should take a closer look before getting this in.
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r213177790 --- Diff: python/pyspark/sql/functions.py --- @@ -2289,12 +2289,10 @@ def from_json(col, schema, options={}): @since(2.1) def to_json(col, options={}): """ -Converts a column containing a :class:`StructType`, :class:`ArrayType` of -:class:`StructType`\\s, a :class:`MapType` or :class:`ArrayType` of :class:`MapType`\\s +Converts a column containing a :class:`StructType`, :class:`ArrayType` or a :class:`MapType` into a JSON string. Throws an exception, in the case of an unsupported type. -:param col: name of column containing the struct, array of the structs, the map or -array of the maps. +:param col: name of column containing a struct, an array or a map. :param options: options to control converting. accepts the same options as the json datasource --- End diff -- ditto
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213177623 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- IMHO, if something goes wrong with structured streaming, end users would try to review the structured streaming guide doc rather than the SQL programming guide doc. Could we wait to hear more voices on this?
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Merged build finished. Test FAILed.
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95311/ Test FAILed.
[GitHub] spark pull request #21976: [SPARK-24909][core] Always unregister pending par...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21976#discussion_r213176636 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -2474,19 +2478,21 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi runEvent(makeCompletionEvent( taskSets(3).tasks(0), Success, makeMapStatus("hostB", 2))) -// There should be no new attempt of stage submitted, -// because task(stageId=1, stageAttempt=1, partitionId=1) is still running in -// the current attempt (and hasn't completed successfully in any earlier attempts). -assert(taskSets.size === 4) +// At this point there should be no active task set for stageId=1 and we need +// to resubmit because the output from (stageId=1, stageAttemptId=0, partitionId=1) +// was ignored due to executor failure +assert(taskSets.size === 5) +assert(taskSets(4).stageId === 1 && taskSets(4).stageAttemptId === 2 + && taskSets(4).tasks.size === 1) -// Complete task(stageId=1, stageAttempt=1, partitionId=1) successfully. +// Complete task(stageId=1, stageAttempt=2, partitionId=1) successfully. runEvent(makeCompletionEvent( - taskSets(3).tasks(1), Success, makeMapStatus("hostB", 2))) + taskSets(4).tasks(0), Success, makeMapStatus("hostB", 2))) --- End diff -- Yea, thanks for the explanation. BTW, what's the JIRA number of the ongoing scheduler integration test?
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213176616 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- Can't we just use `expr` instead?
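For context, `parse_url` extracts a named part (HOST, PATH, QUERY, PROTOCOL, ...) from a URL string. The decomposition it performs is analogous to the Python standard library's `urlparse`, shown here only to illustrate what "a part" means (this is not Spark code):

```python
from urllib.parse import urlparse

url = urlparse("https://spark.apache.org/docs/latest/api.html?lang=scala")

host = url.netloc    # "spark.apache.org"   -> parse_url's HOST part
path = url.path      # "/docs/latest/api.html" -> PATH
query = url.query    # "lang=scala"         -> QUERY
scheme = url.scheme  # "https"              -> PROTOCOL
```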
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22227 **[Test build #95311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95311/testReport)** for PR 22227 at commit [`4e10733`](https://github.com/apache/spark/commit/4e107337a47ce590c703b757b0a44d60d6b862e1). * This patch **fails from timeout after a configured wait of `400m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r213176423 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,49 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress * Splits str around pat (pattern is a regular expression). */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." + +"The `limit` parameter controls the number of times the pattern is applied. If the limit " + +"n is greater than zero then the pattern will be applied at most n - 1 times, " + +"the array's length will be no greater than n, and the array's last entry " + +"will contain all input beyond the last matched delimiter. If n is " + +"less than 0, then the pattern will be applied as many times as " + +"possible and the array can have any length. If n is zero then the " + +"pattern will be applied as many times as possible, the array can " + +"have any length, and trailing empty strings will be discarded.", --- End diff -- +1 for https://github.com/apache/spark/pull/22227#discussion_r212815685. The doc should be concise. Can we just move the `limit`-specific description into the arguments at `limit - a..`? This looks a bit messy.
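The limit semantics quoted in the diff above follow `java.util.regex.Pattern.split`. The three regimes can be emulated in plain Python (a hypothetical helper for illustration, not Spark's implementation):

```python
import re

def split_with_limit(s, pattern, limit):
    """Emulate the documented limit semantics of split(str, regex, limit)."""
    if limit > 0:
        if limit == 1:
            return [s]                      # pattern applied zero times
        # pattern applied at most limit - 1 times; last entry keeps the rest
        return re.split(pattern, s, maxsplit=limit - 1)
    parts = re.split(pattern, s)            # apply as many times as possible
    if limit == 0:
        while parts and parts[-1] == "":    # discard trailing empty strings
            parts.pop()
    return parts
```

For example, `split_with_limit("a,b,c", ",", 2)` gives `["a", "b,c"]`, while limit 0 on `"a,b,,"` drops the trailing empty strings and a negative limit keeps them.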
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213176029 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information -**Further Reading** +## Configuration Options For Structured Streaming + +This section is for configurations which are only available for structured streaming, or they behave differently with batch query. + +- spark.sql.shuffle.partitions --- End diff -- We do have it in `sql-programming-guide.md`. Shall we add some info there for now?
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22247#discussion_r213175296 --- Diff: python/pyspark/worker.py --- @@ -364,8 +364,5 @@ def process(): # Read information about how to connect back to the JVM from the environment. java_port = int(os.environ["PYTHON_WORKER_FACTORY_PORT"]) auth_secret = os.environ["PYTHON_WORKER_FACTORY_SECRET"] -sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) -sock.connect(("127.0.0.1", java_port)) -sock_file = sock.makefile("rwb", 65536) --- End diff -- @vanzin, BTW, did you test this on Windows too?
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22247#discussion_r213174542 --- Diff: python/pyspark/java_gateway.py --- @@ -147,6 +147,39 @@ def do_server_auth(conn, auth_secret): raise Exception("Unexpected reply from iterator server.") +def local_connect_and_auth(sock_info): +""" +Connect to local host, authenticate with it, and return a (sockfile,sock) for that connection. +Handles IPV4 & IPV6, does some error handling. +:param sock_info: a tuple of (port, auth_secret) for connecting +:return: a tuple with (sockfile, sock) +""" +port, auth_secret = sock_info +sock = None +errors = [] +# Support for both IPv4 and IPv6. +# On most of IPv6-ready systems, IPv6 will take precedence. +for res in socket.getaddrinfo("127.0.0.1", port, socket.AF_UNSPEC, socket.SOCK_STREAM): +af, socktype, proto, canonname, sa = res --- End diff -- nit: `af, socktype, proto, canonname, sa = res` -> `af, socktype, proto, _, sa = res`
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22247#discussion_r213173845 --- Diff: python/pyspark/java_gateway.py --- @@ -147,6 +147,39 @@ def do_server_auth(conn, auth_secret): raise Exception("Unexpected reply from iterator server.") +def local_connect_and_auth(sock_info): --- End diff -- @squito, not a big deal but how about `local_connect_and_auth(port, auth_secret)` and ..

```python
(sockfile, sock) = local_connect_and_auth(port, auth_secret)
```

```python
(sock_file, _) = local_connect_and_auth(java_port, auth_secret)
```

```python
port, auth_secret = sock_info
(sockfile, sock) = local_connect_and_auth(port, auth_secret)
```

or

```python
(sockfile, sock) = local_connect_and_auth(*sock_info)
```
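The `getaddrinfo` connection loop under review can be sketched with the standard library alone. This is an illustrative sketch of the pattern, assuming a made-up `local_connect` name; the error handling and return shape are not the PR's exact code:

```python
import socket

def local_connect(port):
    """Connect to localhost over whichever address family resolves first."""
    errors = []
    for af, socktype, proto, _, sa in socket.getaddrinfo(
            "127.0.0.1", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(af, socktype, proto)
        try:
            sock.connect(sa)
            # Return a buffered file object over the socket alongside the
            # socket itself, mirroring the (sockfile, sock) tuple discussed.
            return sock.makefile("rwb", 65536), sock
        except OSError as exc:
            sock.close()
            errors.append(str(exc))
    raise OSError("could not connect to port %d: %s" % (port, errors))
```

Iterating over all `getaddrinfo` results and keeping the per-family errors is what lets the same code work on IPv4-only and IPv6-ready hosts.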
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22238 Also adding @tdas @zsxwing @jose-torres to cc.
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22246 > @viirya The reflection trick we use in scala.reflect.internal.util.ScalaClassLoader doesn't work when the REPL is called from test. Do you have any idea about it? Thanks. Yeah, it seems to be due to the classloader. After changing to the Spark classloader, the tests passed locally. Let's see if the Jenkins tests pass too.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Build finished. Test PASSed.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95312/ Test PASSed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #95325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95325/testReport)** for PR 21546 at commit [`2fe46f8`](https://github.com/apache/spark/commit/2fe46f82dc38af972bc0974aca1fd846bcb483e5).
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2599/ Test PASSed.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22104 **[Test build #95312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95312/testReport)** for PR 22104 at commit [`3f0a97a`](https://github.com/apache/spark/commit/3f0a97a89b39d2ad57c587e49bb07203a670faba). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21546 retest this please
[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22104
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22104 LGTM
[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22244 closing in favor of https://github.com/apache/spark/pull/22104
[GitHub] spark pull request #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract pyth...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/22244
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22104 thanks, merging to master!
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Merged build finished. Test FAILed.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95318/ Test FAILed.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17280 **[Test build #95318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95318/testReport)** for PR 17280 at commit [`733c7ff`](https://github.com/apache/spark/commit/733c7ff70c46f0c54cdf520b44645544b810e04e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22246 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2598/ Test PASSed.
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22246 Merged build finished. Test PASSed.
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22246 **[Test build #95324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95324/testReport)** for PR 22246 at commit [`e0d424d`](https://github.com/apache/spark/commit/e0d424d645010108a497c057fa4ad1e198f1e3d0).
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95314/ Test PASSed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Merged build finished. Test PASSed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22236 **[Test build #95314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95314/testReport)** for PR 22236 at commit [`88eb571`](https://github.com/apache/spark/commit/88eb571b732d42138b029ead106f4c8718e1e220). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95317/ Test PASSed.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Merged build finished. Test PASSed.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22104 **[Test build #95317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95317/testReport)** for PR 22104 at commit [`2325a4f`](https://github.com/apache/spark/commit/2325a4f18a2bc6cc95d96bc5ac6790749b3e927e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22164: [SPARK-23679][YARN] Setting RM_HA_URLS for AmIpFi...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/22164#discussion_r213168025 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala --- @@ -126,4 +136,21 @@ private[spark] class YarnRMClient extends Logging { } } + private def getUrlByRmId(conf: Configuration, rmId: String): String = { --- End diff -- For Spark's usage, I don't think `AmFilterInitializer` would be very useful: we need to pass the filter parameters to the driver either via RPC (client mode) or via configuration (cluster mode), and in either case we have to know how to set each parameter ourselves.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22149 ``` Is it possible to add a test case? ``` Thanks for your reply Xiao. We ran into some difficulties with the test case, because it needs to mock speculative execution behavior. We will keep looking into this.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22149 **[Test build #95323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95323/testReport)** for PR 22149 at commit [`412497f`](https://github.com/apache/spark/commit/412497f2ad615e5aeecb91e7fd5053864a00be37).
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22149 retest this please.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22208 Merged build finished. Test PASSed.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95315/ Test PASSed.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22208 **[Test build #95315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95315/testReport)** for PR 22208 at commit [`a8a5976`](https://github.com/apache/spark/commit/a8a59760228d4fac54175caeffdfe07faf26a184). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code i...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22246#discussion_r213164929 --- Diff: repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala --- @@ -124,6 +141,26 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) super.replay() } + /** + * TODO: Remove `runClosure` when the support of Scala 2.11 is ended + * In Scala 2.12, we don't need to use `savingContextLoader` to execute the `body`. + * See `SI-8521 No blind save of context class loader` for detail. + */ + private def runClosure(body: () => Boolean): Boolean = { +if (isScala2_11) { + val loader = Utils.classForName("scala.reflect.internal.util.ScalaClassLoader$") +.getDeclaredField("MODULE$") +.get(null) --- End diff -- Although it is a static method in Scala, it is compiled to a non-static method in the Java class. That class has a static member (`MODULE$`) of the same type, and all of the "static" methods are accessed through that member.
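To illustrate the point being made: a top-level Scala `object` compiles to a JVM class whose singleton instance lives in a static `MODULE$` field, and the object's methods become instance methods invoked on that singleton. A minimal self-contained sketch of the same reflection pattern the diff uses (the `Greeter` object here is hypothetical, not part of Spark):

```scala
// Hypothetical object used only to demonstrate the MODULE$ pattern.
object Greeter {
  def greet(name: String): String = s"Hello, $name"
}

object ReflectDemo {
  def main(args: Array[String]): Unit = {
    // The compiled class "Greeter$" holds the singleton in a static MODULE$ field.
    val clazz = Class.forName("Greeter$")
    val singleton = clazz.getDeclaredField("MODULE$").get(null)
    // greet is a non-static method on the singleton, invoked reflectively,
    // just as SparkILoop reflects into ScalaClassLoader$ above.
    val greet = clazz.getMethod("greet", classOf[String])
    println(greet.invoke(singleton, "world")) // prints "Hello, world"
  }
}
```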
[GitHub] spark issue #22202: [SPARK-25211][Core] speculation and fetch failed result ...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/22202 Thanks for the ping~ It seems that `ShuffleMapTask0.1` is a speculative task, so please update the description. The change looks fine to me, but given https://github.com/apache/spark/pull/21019, the issue in the description is already solved; I think this change is a refinement of https://github.com/apache/spark/pull/21019. Fine with me, but we should always be careful when touching such core logic.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22149 Merged build finished. Test FAILed.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22149 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95320/ Test FAILed.
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22149 **[Test build #95320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95320/testReport)** for PR 22149 at commit [`412497f`](https://github.com/apache/spark/commit/412497f2ad615e5aeecb91e7fd5053864a00be37). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21977 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95313/ Test FAILed.
[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21977 Merged build finished. Test FAILed.
[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21977 **[Test build #95313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95313/testReport)** for PR 21977 at commit [`0b275cf`](https://github.com/apache/spark/commit/0b275cfea7d83cdf61802da30c4a7604be8900e4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22112: [SPARK-23243][Core] Fix RDD.repartition() data co...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22112#discussion_r213160753 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1865,6 +1871,62 @@ abstract class RDD[T: ClassTag]( // RDD chain. @transient protected lazy val isBarrier_ : Boolean = dependencies.filter(!_.isInstanceOf[ShuffleDependency[_, _, _]]).exists(_.rdd.isBarrier()) + + /** + * Returns the random level of this RDD's output. Please refer to [[RandomLevel]] for the + * definition. + * + * By default, an reliably checkpointed RDD, or RDD without parents(root RDD) is IDEMPOTENT. For + * RDDs with parents, we will generate a random level candidate per parent according to the + * dependency. The random level of the current RDD is the random level candidate that is random + * most. Please override [[getOutputRandomLevel]] to provide custom logic of calculating output + * random level. + */ + // TODO: make it public so users can set random level to their custom RDDs. + // TODO: this can be per-partition. e.g. UnionRDD can have different random level for different + // partitions. + private[spark] final lazy val outputRandomLevel: RandomLevel.Value = { +if (checkpointData.exists(_.isInstanceOf[ReliableRDDCheckpointData[_]])) { --- End diff -- Ah good to know it. Then we can simplify the code here, and only check `checkpointRDD`. cc @mridulm
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r213160007 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2062,8 +2062,10 @@ private[spark] object Utils extends Logging { try { val properties = new Properties() properties.load(inReader) - properties.stringPropertyNames().asScala.map( -k => (k, properties.getProperty(k).trim)).toMap + properties.stringPropertyNames().asScala +.map(k => (k, properties.getProperty(k))) --- End diff -- >trim removes leading spaces as well that are totally legit. It is hard to say which solution is legit; the way you propose may be valid in your case, but it will be unexpected in other users' cases. I'm not talking about legit or not; what I'm trying to say is that your proposal will break the convention, and that's what I'm concerned about. By ASCII I mean you can pass in the ASCII number and translate it to the actual char in the code; that will mitigate the problem here.
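The behavior under discussion can be seen with plain `java.util.Properties`: `load` skips whitespace around the key and after the separator, but keeps trailing whitespace in the value, so the `.trim` in `Utils` is what drops it. A small standalone sketch (not Spark code; the key name is made up):

```scala
import java.io.StringReader
import java.util.Properties

object TrimDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Whitespace before the value (after '=') is skipped by Properties.load,
    // but the trailing space after "value" is kept in the loaded string.
    props.load(new StringReader("spark.some.key = value \n"))
    val raw = props.getProperty("spark.some.key")
    assert(raw == "value ")      // trailing whitespace preserved by load
    assert(raw.trim == "value")  // the trim in Utils removes it
    println("ok")
  }
}
```

This is why dropping the `trim` changes observed config values for any user whose file has incidental trailing spaces.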
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213158510 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 + */ + private def numberOfRowsToShow(): Int = { +this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt --- End diff -- How about `spark.sql.defaultNumRowsInShow`?
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213158056 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext { checkShowString(ds, expected) } + + test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") { +withSQLConf("spark.sql.show.defaultNumRows" -> "100") { + val ds = (1 to 1000).toDS().as[Int].show --- End diff -- I think it's ok to check the number of rows in the output of `show`.
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22162 ya, sure.
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213157406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 + */ + private def numberOfRowsToShow(): Int = { +this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt --- End diff -- +1
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22162 Should we wait for @AndrewKL for a few days?
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test FAILed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95310/ Test FAILed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #95310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95310/testReport)** for PR 21546 at commit [`2fe46f8`](https://github.com/apache/spark/commit/2fe46f82dc38af972bc0974aca1fd846bcb483e5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r213154483 --- Diff: sql/core/src/test/resources/sql-tests/inputs/string-functions.sql --- @@ -5,6 +5,10 @@ select format_string(); -- A pipe operator for string concatenation select 'a' || 'b' || 'c'; +-- split function +select split('aa1cc2ee', '[1-9]+', 2); +select split('aa1cc2ee', '[1-9]+'); + --- End diff -- Can you move these tests to the end of this file to reduce unnecessary changes in the golden file?
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22198 **[Test build #95322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95322/testReport)** for PR 22198 at commit [`83387f6`](https://github.com/apache/spark/commit/83387f6f3b86532a79e83e8483c5e4683ff8beac).
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Merged build finished. Test PASSed.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2597/ Test PASSed.
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22162 I have some bandwidth to take it, too. Is it ok to take it over? @mgaido91, are you not working on this now?
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21976 Merged build finished. Test FAILed.
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21976 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95307/ Test FAILed.
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r213150133 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -130,6 +130,16 @@ private[spark] class Executor( private val urlClassLoader = createClassLoader() private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader) + // One thread will handle loading all of the plugins on this executor --- End diff -- That does make sense. While I did say "aside from semantics", semantics is a good reason to include it. Especially since it'll be harder to get plugin writers to adopt an `init` function later. I'll make the other changes and make sure the tests still pass; if anyone does feel strongly (or even weakly) on one way over another, I don't think there's much harm in either approach.
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21976 **[Test build #95307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95307/testReport)** for PR 21976 at commit [`e384245`](https://github.com/apache/spark/commit/e384245f7b0c6c43e6e0e0f7b73528b5c355e2f1). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22238 **[Test build #95321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95321/testReport)** for PR 22238 at commit [`138cc63`](https://github.com/apache/spark/commit/138cc63e639b60fb7e803097654816ad6c19c95f).
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22149 **[Test build #95320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95320/testReport)** for PR 22149 at commit [`412497f`](https://github.com/apache/spark/commit/412497f2ad615e5aeecb91e7fd5053864a00be37).
[GitHub] spark issue #22210: [SPARK-25218][Core]Fix potential resource leaks in Trans...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/22210 LGTM! Good catches
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22149 ok to test
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22149 Is it possible to add a test case?