[jira] [Comment Edited] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448271#comment-17448271 ]

Yongjun Zhang edited comment on SPARK-31646 at 11/23/21, 10:47 PM:
---
Thanks [~mauzhang]

What I observed is that most of the time numRegisteredConnections is smaller than numActiveConnections. For example, in one case the gap exceeded 3k, with numRegisteredConnections ranging between 0 and 2.5k and numActiveConnections ranging between 0 and 3.6k.

was (Author: yzhangal):
Thanks [~mauzhang] What I observed is most of the time numRegisteredConenctions is smaller than numActiveConnections. For example, in one case the gap can be as large as beyond 3k. while numRegisteredConnections range between 0 - 2.5k, and numActiveRegisteredConnections range between 0 - 3.6k.

> Remove unused registeredConnections counter from ShuffleMetrics
> ---
>
> Key: SPARK-31646
> URL: https://issues.apache.org/jira/browse/SPARK-31646
> Project: Spark
> Issue Type: Improvement
> Components: Deploy, Shuffle, Spark Core
> Affects Versions: 3.0.0
> Reporter: Manu Zhang
> Assignee: Manu Zhang
> Priority: Minor
> Fix For: 3.0.0

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25642) Add new Metrics in External Shuffle Service to help determine Network performance and Connection Handling capabilities of the Shuffle Service
[ https://issues.apache.org/jira/browse/SPARK-25642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448272#comment-17448272 ]

Yongjun Zhang edited comment on SPARK-25642 at 11/23/21, 10:46 PM:
---
Thanks [~pgandhi].

What I observed is that most of the time numRegisteredConnections is smaller than numActiveConnections. For example, in one case the gap exceeded 3k, with numRegisteredConnections ranging between 0 and 2.5k and numActiveConnections ranging between 0 and 3.6k.

I wonder whether anyone else who looks at these metrics has different observations.

was (Author: yzhangal):
Thanks [~pgandhi] . What I observed is most of the time numRegisteredConnections is smaller than numActiveConnections. For example, in one case the gap can be as large as beyond 3k. while numRegisteredConnections range between 0 - 2.5k, and numActiveRegisteredConnections range between 0 - 3.6k. Wonder if anyone else who look at these metrics have different observations.

> Add new Metrics in External Shuffle Service to help determine Network
> performance and Connection Handling capabilities of the Shuffle Service
> ---
>
> Key: SPARK-25642
> URL: https://issues.apache.org/jira/browse/SPARK-25642
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Affects Versions: 2.4.0
> Reporter: Parth Gandhi
> Assignee: Parth Gandhi
> Priority: Minor
> Fix For: 3.0.0
>
> Recently, the ability to expose the metrics for YARN Shuffle Service was
> added as part of [SPARK-18364|[https://github.com/apache/spark/pull/22485]].
> We need to add some metrics to be able to determine the number of active
> connections as well as open connections to the external shuffle service to
> benchmark network and connection issues on large cluster environments.
[jira] [Commented] (SPARK-25642) Add new Metrics in External Shuffle Service to help determine Network performance and Connection Handling capabilities of the Shuffle Service
[ https://issues.apache.org/jira/browse/SPARK-25642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448272#comment-17448272 ]

Yongjun Zhang commented on SPARK-25642:
---
Thanks [~pgandhi].

What I observed is that most of the time numRegisteredConnections is smaller than numActiveConnections. For example, in one case the gap exceeded 3k, with numRegisteredConnections ranging between 0 and 2.5k and numActiveConnections ranging between 0 and 3.6k.

I wonder whether anyone else who looks at these metrics has different observations.
[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448271#comment-17448271 ]

Yongjun Zhang commented on SPARK-31646:
---
Thanks [~mauzhang]

What I observed is that most of the time numRegisteredConnections is smaller than numActiveConnections. For example, in one case the gap exceeded 3k, with numRegisteredConnections ranging between 0 and 2.5k and numActiveConnections ranging between 0 and 3.6k.
[jira] [Comment Edited] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444173#comment-17444173 ]

Yongjun Zhang edited comment on SPARK-31646 at 11/15/21, 11:17 PM:
---
Hi [~mauzhang],

I wonder whether you have been monitoring the activeConnections and registeredConnections metrics. I observed that registeredConnections is smaller than activeConnections, whereas I thought it should be the opposite. I also asked here:

https://issues.apache.org/jira/browse/SPARK-25642?focusedCommentId=17442924&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17442924

Thanks.

was (Author: yzhangal):
HI [~mauzhang] , wonder if you have been monitoring the metrics activeConnections and registeredConnections, somehow I observed registeredConnections is smaller than activeConenctions, I thought it should be the opposite. I also asked here: https://issues.apache.org/jira/browse/SPARK-25642?focusedCommentId=17442924&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17442924 Thanks.
[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444173#comment-17444173 ]

Yongjun Zhang commented on SPARK-31646:
---
Hi [~mauzhang],

I wonder whether you have been monitoring the activeConnections and registeredConnections metrics. I observed that registeredConnections is smaller than activeConnections, whereas I thought it should be the opposite. I also asked here:

https://issues.apache.org/jira/browse/SPARK-25642?focusedCommentId=17442924&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17442924

Thanks.
[jira] [Commented] (SPARK-25642) Add new Metrics in External Shuffle Service to help determine Network performance and Connection Handling capabilities of the Shuffle Service
[ https://issues.apache.org/jira/browse/SPARK-25642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442924#comment-17442924 ]

Yongjun Zhang commented on SPARK-25642:
---
Hi [~pgandhi], thanks for your work here.

I thought numActiveConnections counts a subset of numRegisteredConnections, so the former should be smaller than the latter. However, we are observing that the former is larger than the latter. Is my understanding correct, and have you made a similar observation?

Thanks.
[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420870#comment-17420870 ] Yongjun Zhang commented on SPARK-31646: --- Thanks [~mauzhang]. > Remove unused registeredConnections counter from ShuffleMetrics > --- > > Key: SPARK-31646 > URL: https://issues.apache.org/jira/browse/SPARK-31646 > Project: Spark > Issue Type: Improvement > Components: Deploy, Shuffle, Spark Core >Affects Versions: 3.0.0 >Reporter: Manu Zhang >Assignee: Manu Zhang >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418860#comment-17418860 ]

Yongjun Zhang commented on SPARK-31646:
---
Hi [~mauzhang],

BTW, in order to do something like
{quote}
You may register your backloggedConnections in ShuffleMetrics and update it with "registeredConnections - activeConnections" in ShuffleMetrics#getMetrics.
{quote}
note that registeredConnections is tracked at the channel level (TransportChannelHandler), while activeConnections is tracked at the RPC level (ExternalShuffleBlockHandler). One way to do it is to create a counter for numBackloggedConnections somewhere outside both and pass it as a parameter to both TransportChannelHandler and ExternalShuffleBlockHandler, so that both can update it.

However, this would make the code a bit ugly, so I will try to derive numBackloggedConnections externally, in the metrics monitoring system. Any better suggestions?

Thanks.
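The shared-counter idea in the comment above could look roughly like the following. This is only a minimal sketch under stated assumptions, not Spark code: `ChannelLevelHandler`, `RpcLevelHandler`, their callback methods, and `simulate` are hypothetical stand-ins for the channel-level and RPC-level handlers named in the comment, and the point is only to show one counter being injected into both.

```java
import java.util.concurrent.atomic.AtomicLong;

class SharedBacklogCounterSketch {

    // Hypothetical stand-in for the channel-level handler mentioned in the thread.
    static final class ChannelLevelHandler {
        private final AtomicLong backlogged;

        ChannelLevelHandler(AtomicLong backlogged) {
            this.backlogged = backlogged;
        }

        // A newly registered channel is treated as backlogged until it becomes active.
        void onChannelRegistered() {
            backlogged.incrementAndGet();
        }
    }

    // Hypothetical stand-in for the RPC-level handler mentioned in the thread.
    static final class RpcLevelHandler {
        private final AtomicLong backlogged;

        RpcLevelHandler(AtomicLong backlogged) {
            this.backlogged = backlogged;
        }

        // Once a connection becomes active it leaves the backlog.
        void onConnectionActive() {
            backlogged.decrementAndGet();
        }
    }

    // Registers `registered` channels, marks `active` of them as active,
    // and returns the resulting value of the shared counter.
    static long simulate(int registered, int active) {
        AtomicLong backlogged = new AtomicLong();
        ChannelLevelHandler channelHandler = new ChannelLevelHandler(backlogged);
        RpcLevelHandler rpcHandler = new RpcLevelHandler(backlogged);
        for (int i = 0; i < registered; i++) {
            channelHandler.onChannelRegistered();
        }
        for (int i = 0; i < active; i++) {
            rpcHandler.onConnectionActive();
        }
        return backlogged.get();
    }

    public static void main(String[] args) {
        System.out.println(simulate(5, 3)); // prints 2
    }
}
```

The extra constructor parameter on both handlers is the coupling the comment calls "a bit ugly"; deriving the value in the monitoring system avoids it.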
[jira] [Comment Edited] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418366#comment-17418366 ]

Yongjun Zhang edited comment on SPARK-31646 at 9/22/21, 1:04 AM:
---
Hi [~mauzhang],

Thanks a lot for your answers, and sorry for the late reply.

I think I now understand better why you made this change: the registeredConnections metric added in ExternalShuffleBlockHandler was not used. However, the one added to TransportContext is used; see YarnShuffleService.java:
{code:java}
// register metrics on the block handler into the Node Manager's metrics system.
blockHandler.getAllMetrics().getMetrics().put("numRegisteredConnections",
    shuffleServer.getRegisteredConnections());
YarnShuffleServiceMetrics serviceMetrics =
    new YarnShuffleServiceMetrics(blockHandler.getAllMetrics());
MetricsSystemImpl metricsSystem = (MetricsSystemImpl) DefaultMetricsSystem.instance();
metricsSystem.register(
    "sparkShuffleService", "Metrics on the Spark Shuffle Service", serviceMetrics);
logger.info("Registered metrics with Hadoop's DefaultMetricsSystem");
{code}
The TransportContext version of registeredConnections is retrieved by "shuffleServer.getRegisteredConnections()" in the above code. That means both activeConnections and registeredConnections are still available with your change. Is that your expectation?

If my understanding is correct, we can either derive "registeredConnections - activeConnections" as the backlogged connections, or add a new metric, backloggedConnections, that holds the value of "registeredConnections - activeConnections".

What do you think?

Thanks!

was (Author: yzhangal):
HI [~mauzhang], Thanks a lot for your answers and sorry for late reply. I think I understand it better now why you are doing this change: the registeredConnections metrics added in ExternalShuffleBlockHandler was not used. However, the one added to TransportContext is used, see in YarnShuffleService.java: {code:java} // register metrics on the block handler into the Node Manager's metrics system. blockHandler.getAllMetrics().getMetrics().put("numRegisteredConnections", shuffleServer.getRegisteredConnections()); YarnShuffleServiceMetrics serviceMetrics = new YarnShuffleServiceMetrics(blockHandler.getAllMetrics()); MetricsSystemImpl metricsSystem = (MetricsSystemImpl) DefaultMetricsSystem.instance(); metricsSystem.register( "sparkShuffleService", "Metrics on the Spark Shuffle Service", serviceMetrics); logger.info("Registered metrics with Hadoop's DefaultMetricsSystem"); {code} The TransportContext version of registeredConnections is retrieved by "shuffleServer.getRegisteredConnections())" in the above code. That means both the activeConnections and registeredConnections are still available with your change. Is that your expectation? If my understanding is correct, we can either derive "registeredConnections - activeConnections" as the backlogged connections, or we can add a new metrics as backloggedConnection to have the value of "registeredConnections - activeConnections" . What do you think? Thanks!
[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418366#comment-17418366 ]

Yongjun Zhang commented on SPARK-31646:
---
Hi [~mauzhang],

Thanks a lot for your answers, and sorry for the late reply.

I think I now understand better why you made this change: the registeredConnections metric added in ExternalShuffleBlockHandler was not used. However, the one added to TransportContext is used; see YarnShuffleService.java:
{code:java}
// register metrics on the block handler into the Node Manager's metrics system.
blockHandler.getAllMetrics().getMetrics().put("numRegisteredConnections",
    shuffleServer.getRegisteredConnections());
YarnShuffleServiceMetrics serviceMetrics =
    new YarnShuffleServiceMetrics(blockHandler.getAllMetrics());
MetricsSystemImpl metricsSystem = (MetricsSystemImpl) DefaultMetricsSystem.instance();
metricsSystem.register(
    "sparkShuffleService", "Metrics on the Spark Shuffle Service", serviceMetrics);
logger.info("Registered metrics with Hadoop's DefaultMetricsSystem");
{code}
The TransportContext version of registeredConnections is retrieved by "shuffleServer.getRegisteredConnections()" in the above code. That means both activeConnections and registeredConnections are still available with your change. Is that your expectation?

If my understanding is correct, we can either derive "registeredConnections - activeConnections" as the backlogged connections, or add a new metric, backloggedConnections, that holds the value of "registeredConnections - activeConnections".

What do you think?

Thanks!
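The second option in the comment above, a metric whose value is "registeredConnections - activeConnections", might be sketched as a derived gauge. This is a hedged illustration only: `DerivedBacklogGaugeSketch` and `backloggedGauge` are hypothetical names, and plain `LongSupplier` values stand in for whatever metric objects the shuffle service actually exposes.

```java
import java.util.function.LongSupplier;

class DerivedBacklogGaugeSketch {

    // Returns a gauge that recomputes "registered - active" every time it is read,
    // so it always reflects the current values of the two underlying metrics.
    static LongSupplier backloggedGauge(LongSupplier registered, LongSupplier active) {
        return () -> registered.getAsLong() - active.getAsLong();
    }

    public static void main(String[] args) {
        // Single-element arrays act as mutable holders for the two sampled metrics.
        long[] registered = {2500};
        long[] active = {1000};
        LongSupplier backlogged = backloggedGauge(() -> registered[0], () -> active[0]);

        System.out.println(backlogged.getAsLong()); // prints 1500

        // If active exceeds registered, the derived value goes negative.
        active[0] = 3600;
        System.out.println(backlogged.getAsLong()); // prints -1100
    }
}
```

The sketch deliberately does not clamp at zero: other comments in this thread report registeredConnections being smaller than activeConnections, and a negative reading makes that mismatch visible instead of hiding it.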
[jira] [Comment Edited] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416225#comment-17416225 ]

Yongjun Zhang edited comment on SPARK-31646 at 9/16/21, 4:36 PM:
---
Thanks a lot for your quick response [~mauzhang]. Sorry, my earlier question was not very clear.

When you made the comment
{quote}
It's {{registeredConnections}} counter created in {{TransportContext}} that's really counting the numbers and it's misleading for people who want to add new metrics like {{registeredConnections}}.
{quote}
you seemed to use registeredConnections to mean something different from what you reverted with this jira. Would you please explain:
# What is your interpretation of the reverted version of registeredConnections besides "counting numbers"? Is it not useful at all?
# What is your definition of "new metrics like {{registeredConnections}}"? Is it not counting? If it is counting, why doesn't the reverted implementation serve the purpose?

BTW, my understanding is that:
1. All executors running on a given host register with the remote shuffle service on the same host.
2. Executors register only with the shuffle service on the host where they are running, not with shuffle services running on other hosts.

Is this understanding correct?

Thanks.

was (Author: yzhangal):
Thanks a lot for your quick response [~mauzhang]. Sorry my question was a bit not clear earlier. When you made the comment {quote} It's {{registeredConnections}} counter created in {{TransportContext}} that's really counting the numbers and it's misleading for people who want to add new metrics like {{registeredConnections}}. {quote} , you meant to use registeredConnections to mean something different than what you reverted with this jira. Would you please explain 1. what's your interpretation of the reverted version of registeredConnections besides "counting numbers"? is it not useful at all? 2. what your definition is for "new metrics like {{registeredConnections}}."? is it not counting? if it's counting, why the reverted implementation doesn't serve the purpose? BTW, my understanding is that 1, all executors running on a given host would register with the remote shuffle service on the same host, 2, executors only register with shuffle service on the same host where the executors are running at, but not shuffle service running on other hosts. Is this understanding correct? Thanks.
[jira] [Comment Edited] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416225#comment-17416225 ]

Yongjun Zhang edited comment on SPARK-31646 at 9/16/21, 4:35 PM:
---
Thanks a lot for your quick response [~mauzhang]. Sorry, my earlier question was not very clear.

When you made the comment
{quote}
It's {{registeredConnections}} counter created in {{TransportContext}} that's really counting the numbers and it's misleading for people who want to add new metrics like {{registeredConnections}}.
{quote}
you seemed to use registeredConnections to mean something different from what you reverted with this jira. Would you please explain:
1. What is your interpretation of the reverted version of registeredConnections besides "counting numbers"? Is it not useful at all?
2. What is your definition of "new metrics like {{registeredConnections}}"? Is it not counting? If it is counting, why doesn't the reverted implementation serve the purpose?

BTW, my understanding is that:
1. All executors running on a given host register with the remote shuffle service on the same host.
2. Executors register only with the shuffle service on the host where they are running, not with shuffle services running on other hosts.

Is this understanding correct?

Thanks.

was (Author: yzhangal):
Thanks a lot for your quick response [~mauzhang]. Sorry my question was a bit not clear earlier. When you made the comment {quote} It's {{registeredConnections}} counter created in {{TransportContext}} that's really counting the numbers and it's misleading for people who want to add new metrics like {{registeredConnections}}. {quote} , you meant to use registeredConnections to mean something different than what you reverted with this jira. Would you please explain 1. what's your interpretation of the reverted version? 2. what your definition is for "new metrics like {{registeredConnections}}."? is it not counting? if it's counting, why the reverted implementation doesn't serve the purpose? BTW, my understanding is that 1, all executors running on a given host would register with the remote shuffle service on the same host, 2, executors only register with shuffle service on the same host where the executors are running at, but not shuffle service running on other hosts. Is this understanding correct? Thanks.
[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416225#comment-17416225 ]

Yongjun Zhang commented on SPARK-31646:
---
Thanks a lot for your quick response [~mauzhang]. Sorry, my earlier question was not very clear.

When you made the comment
{quote}
It's {{registeredConnections}} counter created in {{TransportContext}} that's really counting the numbers and it's misleading for people who want to add new metrics like {{registeredConnections}}.
{quote}
you seemed to use registeredConnections to mean something different from what you reverted with this jira. Would you please explain:
1. What is your interpretation of the reverted version?
2. What is your definition of "new metrics like {{registeredConnections}}"? Is it not counting? If it is counting, why doesn't the reverted implementation serve the purpose?

BTW, my understanding is that:
1. All executors running on a given host register with the remote shuffle service on the same host.
2. Executors register only with the shuffle service on the host where they are running, not with shuffle services running on other hosts.

Is this understanding correct?

Thanks.
[jira] [Comment Edited] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415740#comment-17415740 ]

Yongjun Zhang edited comment on SPARK-31646 at 9/15/21, 8:27 PM:
---
Hi [~mauzhang], [~dongjoon], [~marek.simunek],

Thanks for your work here. SPARK-18364 (thanks to [~marek.simunek]) intended to create the registeredConnections metric, but it was reverted by this Jira with the comment:
{quote}
It's {{registeredConnections}} counter created in {{TransportContext}} that's really counting the numbers and it's misleading for people who want to add new metrics like {{registeredConnections}}.
{quote}
Could you please elaborate on why the original one won't work? How would you all define {{registeredConnections}}?

I'm looking into adding a metric to report the backlogged connections to the shuffle service. It feels like "registeredConnections - activeConnections" would be the backlogged connections.

Thanks.

was (Author: yzhangal):
HI [~mauzhang], [~dongjoon], [~marek.simunek], Thanks for your work here. SPARK-18364 (Thanks to [~marek.simunek]) intended to create the registeredConnections metrics but reverted by this Jira here with comment: {quote}It's {{registeredConnections}} counter created in {{TransportContext}} that's really counting the numbers and it's misleading for people who want to add new metrics like {{registeredConnections}}. {quote} Could you please elaborate why the original one won't work? How would you all define {{registeredConnections?}} I'm looking into adding a metrics to report the backlogged connections to shuffle service. If feels "registeredConenctions - activeConnections" would be the backlogged connections. Thanks.
[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415740#comment-17415740 ]

Yongjun Zhang commented on SPARK-31646:
---
Hi [~mauzhang], [~dongjoon], [~marek.simunek],

Thanks for your work here. SPARK-18364 (thanks to [~marek.simunek]) intended to create the registeredConnections metric, but it was reverted by this Jira with the comment:
{quote}
It's {{registeredConnections}} counter created in {{TransportContext}} that's really counting the numbers and it's misleading for people who want to add new metrics like {{registeredConnections}}.
{quote}
Could you please elaborate on why the original one won't work? How would you all define {{registeredConnections}}?

I'm looking into adding a metric to report the backlogged connections to the shuffle service. It feels like "registeredConnections - activeConnections" would be the backlogged connections.

Thanks.