[GitHub] spark pull request #21985: [SPARK-24884][SQL] add regexp_extract_all support
GitHub user xueyumusic opened a pull request:

    https://github.com/apache/spark/pull/21985

    [SPARK-24884][SQL] add regexp_extract_all support

    ## What changes were proposed in this pull request?

    This PR adds regexp_extract_all support in Catalyst as RegExpExtractAll. It finds all occurrences of the regular expression pattern in a string and returns, for each occurrence, the text matched by the specified capturing group number.

    ## How was this patch tested?

    Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xueyumusic/spark RegExpExtractAll

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21985.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21985

----
commit 2a9623879d91a9b7f33e1f4d252b8633de2c9e8b
Author: xueyu <278006819@...>
Date:   2018-08-03T13:22:14Z

    RegExpExtractAll

----

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
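The behavior described in this pull request can be illustrated with a small plain-Python model. This is only a sketch of the idea (collect one capturing group from every match), not the Catalyst implementation; the function name, the argument order, and the default group index of 1 are assumptions made for the example.

```python
import re


def regexp_extract_all(text, pattern, group=1):
    """Return the text of capturing group `group` for every match of `pattern`.

    Plain-Python model for illustration only; the signature and the
    default group index are assumptions, not Spark's API.
    """
    return [m.group(group) for m in re.finditer(pattern, text)]


# Each match of the pattern contributes one element to the result.
print(regexp_extract_all("100-200, 300-400", r"(\d+)-(\d+)", 1))  # ['100', '300']
print(regexp_extract_all("100-200, 300-400", r"(\d+)-(\d+)", 2))  # ['200', '400']
```

A string with no match yields an empty list in this model.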
[GitHub] spark pull request #21624: [SPARK-24639][DOC] Add three config in the doc
Github user xueyumusic commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21624#discussion_r199145469

    --- Diff: docs/configuration.md ---
    @@ -456,6 +456,13 @@ Apart from these, the following properties are also available, and may be useful
         from JVM to Python worker for every task.
    +
    +  spark.python.task.killTimeout
    +  2s
    +
    +How long to wait before killing the python worker if a task cannot be interrupted.
    --- End diff --

    Updated and fixed the conflict; please take a look. Thanks, @zsxwing @jiangxb1987
[GitHub] spark pull request #21575: [SPARK-24566][CORE] spark.storage.blockManagerSla...
Github user xueyumusic commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21575#discussion_r199143954

    --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
    @@ -74,17 +75,17 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       // "spark.network.timeout" uses "seconds", while `spark.storage.blockManagerSlaveTimeoutMs` uses
       // "milliseconds"
    -  private val slaveTimeoutMs =
    -    sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs", "120s")
       private val executorTimeoutMs =
    -    sc.conf.getTimeAsSeconds("spark.network.timeout", s"${slaveTimeoutMs}ms") * 1000
    +    sc.conf.getTimeAsSeconds("spark.network.timeout",
    --- End diff --

    Updated; please take a look. Thank you, @zsxwing @jiangxb1987
[GitHub] spark pull request #21567: [SPARK-24560][CORE][MESOS] Fix some getTimeAsMs a...
Github user xueyumusic closed the pull request at: https://github.com/apache/spark/pull/21567
[GitHub] spark pull request #21624: [SPARK-24639][DOC] Add three config in the doc
GitHub user xueyumusic opened a pull request:

    https://github.com/apache/spark/pull/21624

    [SPARK-24639][DOC] Add three config in the doc

    ## What changes were proposed in this pull request?

    Adds three configs that were mentioned in PR #21567: `spark.python.task.killTimeout`, `spark.worker.driverTerminateTimeout`, and `spark.ui.consoleProgress.update.interval`.

    ## How was this patch tested?

    Doc build.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xueyumusic/spark addconfig1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21624

----
commit 16c256d8206123df9487c57c6779ad2b0e0211d0
Author: xueyu <278006819@...>
Date:   2018-06-23T13:49:37Z

    add some configs

----
[GitHub] spark pull request #21575: [SPARK-24566][CORE] spark.storage.blockManagerSla...
Github user xueyumusic commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21575#discussion_r197028516

    --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
    @@ -75,16 +76,18 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       // "spark.network.timeout" uses "seconds", while `spark.storage.blockManagerSlaveTimeoutMs` uses
       // "milliseconds"
       private val slaveTimeoutMs =
    -    sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs", "120s")
    +    sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs",
    --- End diff --

    I have removed the temporary val `slaveTimeout`. `timeoutIntervalMs` is the same case, so I removed it too. Thanks, @zsxwing @jiangxb1987
[GitHub] spark pull request #21575: [SPARK-24566][CORE] spark.storage.blockManagerSla...
Github user xueyumusic commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21575#discussion_r196635402

    --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
    @@ -75,16 +76,18 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       // "spark.network.timeout" uses "seconds", while `spark.storage.blockManagerSlaveTimeoutMs` uses
       // "milliseconds"
       private val slaveTimeoutMs =
    -    sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs", "120s")
    +    sc.conf.getTimeAsMs("spark.storage.blockManagerSlaveTimeoutMs",
    --- End diff --

    I looked at this carefully, and I think you are right, thanks @jiangxb1987. One case that is not related to this PR: if spark.storage.blockManagerSlaveTimeoutMs is set to 900ms and spark.network.timeout is not configured, then `executorTimeoutMs` will be 0, because getTimeAsSeconds loses sub-second precision. Such a configuration may not be reasonable. If it needs fixing, how about adding a check that the value is > 0, or making the minimum value of executorTimeoutMs 1? @jiangxb1987 @zsxwing
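The precision loss raised in this comment is plain integer arithmetic, which the following Python sketch models. It assumes the fallback path converts the millisecond value to whole seconds (truncating) and then multiplies by 1000, as the quoted diff suggests. Both function names are hypothetical, and the guard shows only one possible reading of the "ensure > 0" suggestion.

```python
def executor_timeout_ms(slave_timeout_ms):
    """Model of the fallback path: truncate to whole seconds, then scale to ms."""
    whole_seconds = slave_timeout_ms // 1000  # sub-second part is dropped here
    return whole_seconds * 1000


def checked_executor_timeout_ms(slave_timeout_ms):
    """Hypothetical guard: reject a timeout that truncates to zero."""
    timeout = executor_timeout_ms(slave_timeout_ms)
    if timeout <= 0:
        raise ValueError(
            f"timeout of {slave_timeout_ms}ms truncates to {timeout}ms; "
            "configure at least 1000ms"
        )
    return timeout


print(executor_timeout_ms(900))     # 0
print(executor_timeout_ms(1500))    # 1000
print(executor_timeout_ms(120000))  # 120000
```

Failing fast on a zero timeout makes the misconfiguration visible; clamping to a minimum value would instead hide it, which is the trade-off between the two options mentioned in the comment.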
[GitHub] spark issue #21567: [SPARK-24560][CORE][MESOS] Fix some getTimeAsMs as getTi...
Github user xueyumusic commented on the issue:

    https://github.com/apache/spark/pull/21567

    I see, thanks for your review and guidance, @jiangxb1987 @maropu. I will try to add the related configs to the docs and close this PR. Thank you.
[GitHub] spark issue #21575: [SPARK-24566][CORE] spark.storage.blockManagerSlaveTimeo...
Github user xueyumusic commented on the issue: https://github.com/apache/spark/pull/21575 I added the tests, thanks @maropu
[GitHub] spark issue #21575: [SPARK-24566][CORE] spark.storage.blockManagerSlaveTimeo...
Github user xueyumusic commented on the issue:

    https://github.com/apache/spark/pull/21575

    It seems that "spark.core.connection.ack.wait.timeout" and "spark.shuffle.io.connectionTimeout" are used only in tests, which might be legacy and have no impact on normal code, and "spark.rpc.lookupTimeout" doesn't have the same issue. The only use of "spark.rpc.askTimeout" that I am not sure about is https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/Client.scala#L229. I am not sure whether it is a special case that forces this config to 10s when it is not configured.
[GitHub] spark issue #21575: [SPARK-24566][CORE] spark.storage.blockManagerSlaveTimeo...
Github user xueyumusic commented on the issue: https://github.com/apache/spark/pull/21575 I have made the modification, @maropu please review the code, thank you
[GitHub] spark issue #21567: [SPARK-24560][CORE][MESOS] Fix some getTimeAsMs as getTi...
Github user xueyumusic commented on the issue: https://github.com/apache/spark/pull/21567 I have made some modifications, @maropu please review the code, thanks
[GitHub] spark pull request #21575: spark.storage.blockManagerSlaveTimeoutMs default ...
GitHub user xueyumusic opened a pull request:

    https://github.com/apache/spark/pull/21575

    spark.storage.blockManagerSlaveTimeoutMs default config

    ## What changes were proposed in this pull request?

    This PR uses spark.network.timeout in place of spark.storage.blockManagerSlaveTimeoutMs when the latter is not configured, as the configuration doc says.

    ## How was this patch tested?

    Manual test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xueyumusic/spark slaveTimeOutConfig

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21575.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21575

----
commit f5943410efd2f8f0cc82493eee5c5a4c30f7ebe3
Author: xueyu <278006819@...>
Date:   2018-06-15T05:32:33Z

    blockManagerSlaveTimeoutMs default config

----
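The fallback described in this pull request can be sketched as a lookup chain over a plain dictionary. This is a model of the intended behavior, not the patch itself: the function name is hypothetical, values are assumed to be plain numbers without unit suffixes (milliseconds for the specific key, seconds for the network timeout), and the 120-second default is taken from the "120s" that appears in the related diffs.

```python
def resolve_slave_timeout_ms(conf):
    """Return the block manager timeout in ms, falling back to the network timeout.

    `conf` is a plain dict standing in for the Spark configuration (assumption).
    """
    explicit = conf.get("spark.storage.blockManagerSlaveTimeoutMs")
    if explicit is not None:
        return int(explicit)  # already in milliseconds
    # Not configured: use spark.network.timeout (seconds), default 120.
    network_timeout_s = int(conf.get("spark.network.timeout", "120"))
    return network_timeout_s * 1000


print(resolve_slave_timeout_ms({}))                                # 120000
print(resolve_slave_timeout_ms({"spark.network.timeout": "300"}))  # 300000
print(resolve_slave_timeout_ms(
    {"spark.storage.blockManagerSlaveTimeoutMs": "60000",
     "spark.network.timeout": "300"}))                             # 60000
```

The specific key wins whenever it is set, so existing deployments that configure it explicitly see no change in this model.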
[GitHub] spark pull request #21567: [SPARK-24560][SS][MESOS] Fix some getTimeAsMs as ...
GitHub user xueyumusic opened a pull request:

    https://github.com/apache/spark/pull/21567

    [SPARK-24560][SS][MESOS] Fix some getTimeAsMs as getTimeAsSeconds

    ## What changes were proposed in this pull request?

    This PR replaces some "getTimeAsMs" calls with "getTimeAsSeconds". Using "getTimeAsMs" returns a wrong value when the user specifies a value without a time unit.

    ## How was this patch tested?

    Manual test.

    Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xueyumusic/spark fixGetTimeAs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21567.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21567

----
commit 10bf41ec86c0af59a791fa02b5efaedc7a164a3c
Author: xueyu <278006819@...>
Date:   2018-06-14T11:01:29Z

    fix getTimeAs method

----
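The unit problem this pull request describes can be modeled in a few lines of Python. The sketch assumes that a value without a suffix is read in the default unit of the accessor (milliseconds for the `getTimeAsMs` model, seconds for the `getTimeAsSeconds` model) and that the setting in question is meant to be given in seconds. The helper names and the suffix table are illustrative and cover only a subset of units.

```python
import re

# Milliseconds per unit for the suffixes handled by this sketch (assumed subset).
_MS_PER_UNIT = {"ms": 1, "s": 1000, "m": 60_000, "min": 60_000,
                "h": 3_600_000, "d": 86_400_000}


def _to_ms(value, default_unit):
    """Parse '<number>[unit]' into milliseconds, using default_unit if no suffix."""
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", value.strip().lower())
    if match is None or (match.group(2) and match.group(2) not in _MS_PER_UNIT):
        raise ValueError(f"invalid time string: {value!r}")
    unit = match.group(2) or default_unit
    return int(match.group(1)) * _MS_PER_UNIT[unit]


def get_time_as_ms(value):
    """Model: a unitless value is interpreted as milliseconds."""
    return _to_ms(value, "ms")


def get_time_as_seconds(value):
    """Model: a unitless value is interpreted as seconds (result truncated)."""
    return _to_ms(value, "s") // 1000


# A user writes "10" for a setting meant to be in seconds.
print(get_time_as_ms("10"))               # 10     (10 ms, not 10 s)
print(get_time_as_seconds("10") * 1000)   # 10000  (10 s expressed in ms)
print(get_time_as_ms("10s"))              # 10000  (explicit unit is unambiguous)
```

With an explicit suffix both accessors agree; the discrepancy appears only for unitless values.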
[GitHub] spark issue #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl commen...
Github user xueyumusic commented on the issue:

    https://github.com/apache/spark/pull/21485

    I took another look and found some typos; please review them, @HyukjinKwon. Thank you for the reminder.
[GitHub] spark pull request #21485: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl...
GitHub user xueyumusic opened a pull request:

    https://github.com/apache/spark/pull/21485

    [SPARK-24455][CORE] fix typo in TaskSchedulerImpl comment

    Change runTasks to submitTasks in a comment in TaskSchedulerImpl.scala.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xueyumusic/spark fixtypo1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21485

----
commit 97df135e7af26191cbbe3e5c54afe79d94aa43f8
Author: xueyu
Date:   2018-06-02T07:09:18Z

    fix typo in TaskSchedulerImpl

----