[jira] [Commented] (SPARK-43100) Mismatch of field name in log event writer and parser for push shuffle metrics

2023-04-11 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17711148#comment-17711148 ] Ye Zhou commented on SPARK-43100: - Created PR to fix the issue

[jira] [Created] (SPARK-43100) Mismatch of field name in log event writer and parser for push shuffle metrics

2023-04-11 Thread Ye Zhou (Jira)
Ye Zhou created SPARK-43100: --- Summary: Mismatch of field name in log event writer and parser for push shuffle metrics Key: SPARK-43100 URL: https://issues.apache.org/jira/browse/SPARK-43100 Project: Spark

[jira] [Created] (SPARK-38987) Handle fallback when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to true

2022-04-21 Thread Ye Zhou (Jira)
Ye Zhou created SPARK-38987: --- Summary: Handle fallback when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to true Key: SPARK-38987 URL: https://issues.apache.org/jira/browse/SPARK-38987

[jira] (SPARK-33236) Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-03-21 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33236 ] Ye Zhou deleted comment on SPARK-33236: - was (Author: zhouyejoe): WIP PR posted https://github.com/apache/spark/pull/35906. > Enable Push-based shuffle service to store state in NM level DB

[jira] [Commented] (SPARK-33236) Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-03-17 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17508498#comment-17508498 ] Ye Zhou commented on SPARK-33236: - WIP PR posted https://github.com/apache/spark/pull/35906. >

[jira] [Created] (SPARK-37023) Avoid fetching merge status when shuffleMergeEnabled is false for a shuffleDependency during retry

2021-10-16 Thread Ye Zhou (Jira)
Ye Zhou created SPARK-37023: --- Summary: Avoid fetching merge status when shuffleMergeEnabled is false for a shuffleDependency during retry Key: SPARK-37023 URL: https://issues.apache.org/jira/browse/SPARK-37023

[jira] [Commented] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-09-30 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422558#comment-17422558 ] Ye Zhou commented on SPARK-36892: - Raised PR https://github.com/apache/spark/pull/34156. UT to be

[jira] [Commented] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-09-29 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422425#comment-17422425 ] Ye Zhou commented on SPARK-36892: - I am working on this issue. We have a job which can reproduce this

[jira] [Commented] (SPARK-36772) FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-15 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415879#comment-17415879 ] Ye Zhou commented on SPARK-36772: - I will work on this one and post PR ASAP. > FinalizeShuffleMerge

[jira] [Updated] (SPARK-36744) Support IO encryption for push-based shuffle

2021-09-13 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-36744: Description: As a follow up to SPARK-36705, push-based shuffle is not compatible with IO encryption. We

[jira] [Updated] (SPARK-33573) Server and client side metrics related to push-based shuffle

2021-09-09 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-33573: Description: Shuffle Server side metrics for push based shuffle. (was: Need to add metrics on both

[jira] [Updated] (SPARK-33573) Server side metrics related to push-based shuffle

2021-09-09 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-33573: Summary: Server side metrics related to push-based shuffle (was: Server and client side metrics related

[jira] [Commented] (SPARK-30602) SPIP: Support push-based shuffle to improve shuffle efficiency

2021-07-29 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390184#comment-17390184 ] Ye Zhou commented on SPARK-30602: - [~Gengliang.Wang] This is the last PR (Subtask 9) to be merged

[jira] [Resolved] (SPARK-35546) Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better way

2021-07-20 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou resolved SPARK-35546. - Fix Version/s: 3.2.0 Target Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull

[jira] [Updated] (SPARK-35546) Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better way

2021-06-02 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-35546: Parent: SPARK-30602 Issue Type: Sub-task (was: Bug) > Enable push-based shuffle when multiple

[jira] [Updated] (SPARK-35546) Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better way

2021-06-02 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-35546: Parent: (was: SPARK-33235) Issue Type: Bug (was: Sub-task) > Enable push-based shuffle when

[jira] [Updated] (SPARK-35546) Properly handle race conditions in RemoteBlockPushResolver to support push based shuffle with multiple app attempts enabled

2021-06-02 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-35546: Summary: Properly handle race conditions in RemoteBlockPushResolver to support push based shuffle with

[jira] [Updated] (SPARK-35546) Properly handle race conditions in RemoteBlockPushResolver for access to the internal ConcurrentHashMaps to handle multiple app attempts

2021-06-02 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-35546: Summary: Properly handle race conditions in RemoteBlockPushResolver for access to the internal

[jira] [Updated] (SPARK-35546) Properly handle race conditions in RemoteBlockPushResolver for access to the internal ConcurrentHashMaps with multiple app attempts enabled

2021-06-02 Thread Ye Zhou (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-35546: Summary: Properly handle race conditions in RemoteBlockPushResolver for access to the internal

[jira] [Created] (SPARK-35548) Handling new attempt has started error message in BlockPushErrorHandler in client

2021-05-27 Thread Ye Zhou (Jira)
Ye Zhou created SPARK-35548: --- Summary: Handling new attempt has started error message in BlockPushErrorHandler in client Key: SPARK-35548 URL: https://issues.apache.org/jira/browse/SPARK-35548 Project:

[jira] [Created] (SPARK-35546) Handling race condition and memory leak in RemoteBlockPushResolver

2021-05-27 Thread Ye Zhou (Jira)
Ye Zhou created SPARK-35546: --- Summary: Handling race condition and memory leak in RemoteBlockPushResolver Key: SPARK-35546 URL: https://issues.apache.org/jira/browse/SPARK-35546 Project: Spark

[jira] [Commented] (SPARK-25634) New Metrics in External Shuffle Service to help identify abusing application

2018-10-03 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637436#comment-16637436 ] Ye Zhou commented on SPARK-25634: - [~felixcheung] [~vanzin] [~tgraves] [~irashid] [~zsxwing] More

[jira] [Created] (SPARK-25634) New Metrics in External Shuffle Service to help identify abusing application

2018-10-03 Thread Ye Zhou (JIRA)
Ye Zhou created SPARK-25634: --- Summary: New Metrics in External Shuffle Service to help identify abusing application Key: SPARK-25634 URL: https://issues.apache.org/jira/browse/SPARK-25634 Project: Spark

[jira] [Resolved] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2018-08-02 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou resolved SPARK-21961. - Resolution: Won't Fix > Filter out BlockStatuses Accumulators during replaying history logs in Spark >

[jira] [Commented] (SPARK-23607) Use HDFS extended attributes to store application summary to improve the Spark History Server performance

2018-03-12 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396193#comment-16396193 ] Ye Zhou commented on SPARK-23607: - [~vanzin] Cool. I will post a PR soon. Thanks. > Use HDFS extended

[jira] [Commented] (SPARK-23608) SHS needs synchronization between attachSparkUI and detachSparkUI functions

2018-03-05 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387175#comment-16387175 ] Ye Zhou commented on SPARK-23608: - Pull Request: https://github.com/apache/spark/pull/20744 > SHS needs

[jira] [Commented] (SPARK-23608) SHS needs synchronization between attachSparkUI and detachSparkUI functions

2018-03-05 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387162#comment-16387162 ] Ye Zhou commented on SPARK-23608: - I will post a pull request for this minor change. [~vanzin]

[jira] [Created] (SPARK-23608) SHS needs synchronization between attachSparkUI and detachSparkUI functions

2018-03-05 Thread Ye Zhou (JIRA)
Ye Zhou created SPARK-23608: --- Summary: SHS needs synchronization between attachSparkUI and detachSparkUI functions Key: SPARK-23608 URL: https://issues.apache.org/jira/browse/SPARK-23608 Project: Spark

[jira] [Comment Edited] (SPARK-23607) Use HDFS extended attributes to store application summary to improve the Spark History Server performance

2018-03-05 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387126#comment-16387126 ] Ye Zhou edited comment on SPARK-23607 at 3/6/18 1:42 AM: - [~vanzin]   [~zsxwing]

[jira] [Updated] (SPARK-23607) Use HDFS extended attributes to store application summary to improve the Spark History Server performance

2018-03-05 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-23607: Shepherd: (was: Marcelo Vanzin) > Use HDFS extended attributes to store application summary to improve

[jira] [Commented] (SPARK-23607) Use HDFS extended attributes to store application summary to improve the Spark History Server performance

2018-03-05 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387126#comment-16387126 ] Ye Zhou commented on SPARK-23607: - [~vanzin] Any comments? Thanks. > Use HDFS extended attributes to

[jira] [Created] (SPARK-23607) Use HDFS extended attributes to store application summary to improve the Spark History Server performance

2018-03-05 Thread Ye Zhou (JIRA)
Ye Zhou created SPARK-23607: --- Summary: Use HDFS extended attributes to store application summary to improve the Spark History Server performance Key: SPARK-23607 URL: https://issues.apache.org/jira/browse/SPARK-23607

[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-01-24 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338378#comment-16338378 ] Ye Zhou commented on SPARK-23206: - [~zsxwing] Hi, Can you help find some one who can help review this

[jira] [Commented] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2017-09-19 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172162#comment-16172162 ] Ye Zhou commented on SPARK-21961: - [~zsxwing] Can you help to take a look? Thanks. > Filter out

[jira] [Commented] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2017-09-08 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159514#comment-16159514 ] Ye Zhou commented on SPARK-21961: - Pull Request Added: https://github.com/apache/spark/pull/19170 >

[jira] [Updated] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2017-09-08 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21961: Description: As described in SPARK-20923, TaskMetrics._updatedBlockStatuses uses a lot of memory in

[jira] [Updated] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2017-09-08 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21961: Description: As described in SPARK-20923, TaskMetrics._updatedBlockStatuses uses a lot of memory in

[jira] [Updated] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2017-09-08 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21961: Description: As described in SPARK-20923, TaskMetrics._updatedBlockStatuses uses a lot of memory in

[jira] [Updated] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2017-09-08 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21961: Description: As described in SPARK-20923, TaskMetrics._updatedBlockStatuses uses a lot of memory in

[jira] [Updated] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2017-09-08 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21961: Description: As described in SPARK-20923, TaskMetrics._updatedBlockStatuses uses a lot of memory in

[jira] [Updated] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2017-09-08 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21961: Attachment: Objects_Count_in_Heap.png One_Thread_Took_24GB.png > Filter out BlockStatuses

[jira] [Created] (SPARK-21961) Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server

2017-09-08 Thread Ye Zhou (JIRA)
Ye Zhou created SPARK-21961: --- Summary: Filter out BlockStatuses Accumulators during replaying history logs in Spark History Server Key: SPARK-21961 URL: https://issues.apache.org/jira/browse/SPARK-21961

[jira] [Commented] (SPARK-21715) History Server should not respond history page html content multiple times for only one http request

2017-08-14 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126398#comment-16126398 ] Ye Zhou commented on SPARK-21715: - Pull Request: https://github.com/apache/spark/pull/18941 > History

[jira] [Updated] (SPARK-21715) History Server should not respond history page html content multiple times for only one http request

2017-08-14 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21715: Summary: History Server should not respond history page html content multiple times for only one http

[jira] [Updated] (SPARK-21715) History Server respondes history page html content multiple times for only one http request

2017-08-11 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21715: Attachment: ResponseContent.png > History Server respondes history page html content multiple times for

[jira] [Updated] (SPARK-21715) History Server respondes history page html content multiple times for only one http request

2017-08-11 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21715: Description: UI looks fine for the home page. But we check the performance for each individual

[jira] [Updated] (SPARK-21715) History Server respondes history page html content multiple times for only one http request

2017-08-11 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21715: Description: UI looks fine for the home page. But we check the performance for each individual

[jira] [Updated] (SPARK-21715) History Server respondes history page html content multiple times for only one http request

2017-08-11 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21715: Description: UI looks fine for the home page. But we check the performance for each individual

[jira] [Updated] (SPARK-21715) History Server respondes history page html content multiple times for only one http request

2017-08-11 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ye Zhou updated SPARK-21715: Attachment: Performance.png > History Server respondes history page html content multiple times for only

[jira] [Created] (SPARK-21715) History Server respondes history page html content multiple times for only one http request

2017-08-11 Thread Ye Zhou (JIRA)
Ye Zhou created SPARK-21715: --- Summary: History Server respondes history page html content multiple times for only one http request Key: SPARK-21715 URL: https://issues.apache.org/jira/browse/SPARK-21715

[jira] [Commented] (SPARK-18085) SPIP: Better History Server scalability for many / large applications

2017-07-12 Thread Ye Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084451#comment-16084451 ] Ye Zhou commented on SPARK-18085: - I want to add my own testing experience with the codes from the HEAD