[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...
Github user raajay commented on the issue: https://github.com/apache/spark/pull/18690 I understand. My previous comment was just a clarification to your question: "I'm not sure how does this code work in your changes?". I will close this PR. The JIRA is already closed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18690: [SPARK-21334][CORE] Add metrics reporting service...
Github user raajay closed the pull request at: https://github.com/apache/spark/pull/18690
[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...
Github user raajay commented on the issue: https://github.com/apache/spark/pull/18690 @jerryshao My CustomSink has the report function defined. What I did not have was an equivalent of JmxReporter defined in my CustomSink. The reporter essentially invokes the report function defined in CustomSink periodically.
[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...
Github user raajay commented on the issue: https://github.com/apache/spark/pull/18690 We were using a custom sink rather than the JmxSink for gathering metrics. The sink did NOT have a "reporter" like the ones JmxSink or CsvSink have. I guess a cleaner design is to implement a metrics reporter in the Sink and not have a reporting service as part of the external shuffle service. Thanks!
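The design point in this comment, a sink that owns a reporter which periodically invokes the sink's report function, can be sketched roughly as follows. This is a minimal Python illustration and not Spark's code: `PeriodicReporter`, the methods on `CustomSink`, the `snapshots` list, and the gauge name are all hypothetical names chosen for the example.

```python
import threading


class PeriodicReporter:
    """Hypothetical reporter: calls `report_fn` every `period_s` seconds."""

    def __init__(self, report_fn, period_s):
        self._report_fn = report_fn
        self._period_s = period_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._period_s):
            self._report_fn()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


class CustomSink:
    """Hypothetical sink that owns its reporter instead of relying on a service."""

    def __init__(self, registry, period_s=10.0):
        self.registry = registry            # metric name -> zero-argument gauge
        self.snapshots = []                 # stands in for an external backend
        self.reported = threading.Event()   # set after the first report
        self._reporter = PeriodicReporter(self.report, period_s)

    def report(self):
        # Read every gauge and record one snapshot of the current values.
        self.snapshots.append({name: gauge() for name, gauge in self.registry.items()})
        self.reported.set()

    def start(self):
        self._reporter.start()

    def stop(self):
        self._reporter.stop()
```

With this shape, starting the sink is enough to get periodic reports, so the hosting service would not need a separate reporting component.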
[GitHub] spark issue #18683: [SPARK-21474][CORE] Make number of parallel fetches from...
Github user raajay commented on the issue: https://github.com/apache/spark/pull/18683 maxSizeInFlight can be large (~100-200 MB) when (a) available memory at the reducer is high, or (b) the reducer spends most of its time waiting for fetchRequests. In such cases, using a hard-coded value of '5' parallel fetches will cause individual fetchRequests to be bursts of 20-40 MB. The configuration parameter allows one to have smaller fetchRequests while keeping the total maxSizeInFlight constant. This configuration is helpful when the reducer spends most of its time waiting for fetchRequests to return. In such cases, we would like to increase the maxBytesInFlight.
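The arithmetic behind the 20-40 MB figure can be checked with a short sketch. It assumes the per-request cap is simply the in-flight budget divided by the number of parallel fetches; the function name and the alternative value of 20 parallel fetches are illustrative assumptions, not values taken from the patch.

```python
MB = 1024 * 1024


def target_request_size(max_size_in_flight, num_parallel_fetches=5):
    """Per-request size cap: the in-flight budget split across parallel fetches."""
    # Integer division, with a floor of one byte so the cap is never zero.
    return max(max_size_in_flight // num_parallel_fetches, 1)


# Compare the hard-coded 5 with a hypothetical configured value of 20.
for budget_mb in (100, 200):
    for n in (5, 20):
        size_mb = target_request_size(budget_mb * MB, n) / MB
        print(f"maxSizeInFlight={budget_mb} MB, parallel fetches={n}: "
              f"{size_mb:.0f} MB per request")
```

Raising the divisor shrinks each request while the total budget stays the same, which is the trade-off the comment describes.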
[GitHub] spark pull request #18690: [SPARK-21334][CORE] Add metrics reporting service...
GitHub user raajay opened a pull request: https://github.com/apache/spark/pull/18690 [SPARK-21334][CORE] Add metrics reporting service to External Shuffle Server ## What changes were proposed in this pull request? Add a metrics reporting service that periodically reports the metrics defined in ExternalShuffleServiceSource. Currently, although the metrics are defined, they are never reported. ## How was this patch tested? Manual tests to ensure that metrics are reported on ExternalShuffleService start. You can merge this pull request into a Git repository by running: $ git pull https://github.com/raajay/spark raajay-launch-metric-reporting Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18690.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18690 commit 4de1658f2dea72ded4c86d699f90432b8d965370 Author: Raajay Viswanathan Date: 2017-07-20T17:21:23Z Add metrics reporting service.
[GitHub] spark pull request #18683: [SPARK-21474][CORE] Make number of parallel fetch...
GitHub user raajay opened a pull request: https://github.com/apache/spark/pull/18683 [SPARK-21474][CORE] Make number of parallel fetches from a reducer configurable ## What changes were proposed in this pull request? Currently the number of parallel fetches is hard-coded to 5. As a result, the size of each fetch request is fixed at 1/5th of maxSizeInFlight. Since chunks are requested in bursts of fetchRequests, the size of a burst to a single shuffle service can be high for a large maxSizeInFlight. Introduce a new configuration parameter "spark.reducer.numParallelFetchRequets" to make it configurable. ## How was this patch tested? Not tested. You can merge this pull request into a Git repository by running: $ git pull https://github.com/raajay/spark raajay-configure-parallel-requests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18683.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18683 commit 9bf1f9fd5aa08ef19653a4cfa52a8eccc64dc18b Author: Raajay Viswanathan Date: 2017-07-19T16:49:25Z Make num parallel fetches from a reducer configurable
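To illustrate why a smaller per-request cap produces smaller bursts, the sketch below groups block sizes into fetch requests greedily, closing a request once its accumulated size reaches the cap. The greedy grouping is an assumed simplification for illustration, and `split_into_requests` is a hypothetical helper, not a function from the patch.

```python
def split_into_requests(block_sizes, target_request_size):
    """Greedily group blocks; close a request once it reaches the size cap."""
    requests, current, current_size = [], [], 0
    for size in block_sizes:
        current.append(size)
        current_size += size
        if current_size >= target_request_size:
            requests.append(current)
            current, current_size = [], 0
    if current:
        # Whatever is left over forms a final, smaller request.
        requests.append(current)
    return requests


# Ten 8 MB blocks: a 20 MB cap yields a few large bursts,
# a 5 MB cap yields one block per request.
blocks_mb = [8] * 10
print(split_into_requests(blocks_mb, 20))
print(split_into_requests(blocks_mb, 5))
```

Under this assumption, lowering the cap increases the number of requests and reduces how much data any single burst asks of one shuffle service.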