[jira] [Comment Edited] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927402#comment-16927402
 ] 

feiwang edited comment on SPARK-29043 at 9/11/19 8:41 AM:
--

[~kabhwan]
* What is "spark.history.fs.update.interval" set to? 20s
* How many applications are reloaded per call of checkForLogs? 5+
* How big is the event log for each application? There may be many large logs.

I think SPARK-28594 is more helpful for our case.


was (Author: hzfeiwang):
* What is "spark.history.fs.update.interval" set to? 20s
* How many applications are reloaded per call of checkForLogs? 5+
* How big is the event log for each application? There may be many large logs.

I think SPARK-28594 is more helpful for our case.

> [History Server]Only one replay thread of FsHistoryProvider work because of 
> straggler
> -
>
> Key: SPARK-29043
> URL: https://issues.apache.org/jira/browse/SPARK-29043
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.4
> Reporter: feiwang
> Priority: Major
> Attachments: image-2019-09-11-15-09-22-912.png, 
> image-2019-09-11-15-10-25-326.png, screenshot-1.png
>
>
> As shown in the attachment, we set spark.history.fs.numReplayThreads=30 for 
> the Spark history server.
> However, only one replay thread is working because of a straggler.
> Let's check the code.
> https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L509-L547
> There is a synchronous operation that waits for all replay tasks to finish.
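
The blocking pattern described in the issue can be sketched roughly as follows. This is an illustration in Python, not the Scala code of FsHistoryProvider; the names `replay` and `scan_round` and the workload are hypothetical. It only shows the shape of the problem, assuming each scan submits its replay tasks to a pool and then waits for all of them before the next scan can begin.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def replay(log_name, seconds):
    """Hypothetical stand-in for replaying one event log; sleeps instead of parsing."""
    time.sleep(seconds)
    return log_name


def scan_round(pool, logs):
    """Submit one replay task per log, then block until every task has finished."""
    start = time.monotonic()
    futures = [pool.submit(replay, name, secs) for name, secs in logs]
    # The synchronous step: nothing else is scheduled until this returns.
    wait(futures)
    return time.monotonic() - start


# Hypothetical workload: four small logs and one straggler.
logs = [
    ("app-1", 0.01),
    ("app-2", 0.01),
    ("app-3", 0.01),
    ("app-4", 0.01),
    ("app-big", 0.3),
]

with ThreadPoolExecutor(max_workers=30) as pool:
    elapsed = scan_round(pool, logs)

print(f"round took {elapsed:.2f}s")
```

Although the pool has 30 workers, the round lasts at least as long as the slowest task. Once the small logs are done, every worker except the one handling the straggler sits idle until the round ends.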



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927305#comment-16927305
 ] 

feiwang edited comment on SPARK-29043 at 9/11/19 7:31 AM:
--

[~kabhwan] Other replay threads are waiting for the straggler replay thread.
 !image-2019-09-11-15-10-25-326.png! 
 !image-2019-09-11-15-09-22-912.png! 
I think we can record a status for each log being replayed in the application 
listing, such as Processing and Completed.

When checking for logs to replay, we could filter out the logs that are still 
being processed, to prevent them from being replayed repeatedly.
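
A minimal sketch of this idea, in Python rather than Scala and under stated assumptions: `ReplayTracker`, its methods, and the in-memory status map are all hypothetical and stand in for the application listing. A scan marks each submitted log as Processing, skips logs already in that state, and returns without waiting for the tasks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ReplayTracker:
    """Hypothetical tracker that lets a scan skip logs still being replayed."""

    def __init__(self, pool):
        self._pool = pool
        self._lock = threading.Lock()
        self._status = {}  # log name -> "Processing" or "Completed"

    def status(self, log_name):
        with self._lock:
            return self._status.get(log_name)

    def scan(self, logs, replay):
        """Submit every log that is not Processing; return futures without waiting."""
        futures = {}
        for name in logs:
            with self._lock:
                if self._status.get(name) == "Processing":
                    continue  # already being replayed, do not submit it again
                self._status[name] = "Processing"
            futures[name] = self._pool.submit(self._run, name, replay)
        return futures

    def _run(self, name, replay):
        try:
            replay(name)
        except Exception:
            with self._lock:
                self._status.pop(name, None)  # let a later scan retry this log
            raise
        with self._lock:
            self._status[name] = "Completed"


release = threading.Event()


def replay(name):
    """Hypothetical replay: the straggler blocks until it is released."""
    if name == "app-big":
        release.wait(timeout=5)


with ThreadPoolExecutor(max_workers=4) as pool:
    tracker = ReplayTracker(pool)
    first = tracker.scan(["app-1", "app-big"], replay)
    first["app-1"].result()
    # The second scan runs while app-big is still Processing.
    second = tracker.scan(["app-2", "app-big"], replay)
    second["app-2"].result()
    skipped = "app-big" not in second
    release.set()
    first["app-big"].result()
```

In this sketch the second scan proceeds while the straggler is still running, and the straggler is not submitted a second time. A Completed log is submitted again on a later scan, which stands in for a log that has changed since its last replay.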


was (Author: hzfeiwang):
[~kabhwan] Other replay threads are waiting for the straggler replay thread.
 !image-2019-09-11-15-10-25-326.png! 
 !image-2019-09-11-15-09-22-912.png! 
I think we could record a status for each log being replayed in the 
application listing, such as Processing and Completed.

When checking for logs to replay, we could filter out the logs that are still 
being processed, to prevent them from being replayed repeatedly.







[jira] [Comment Edited] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927305#comment-16927305
 ] 

feiwang edited comment on SPARK-29043 at 9/11/19 7:15 AM:
--

[~kabhwan] Other replay threads are waiting for the straggler replay thread.
 !image-2019-09-11-15-10-25-326.png! 
 !image-2019-09-11-15-09-22-912.png! 
I think we could record a status for each log being replayed in the 
application listing, such as Processing and Completed.

When checking for logs to replay, we could filter out the logs that are still 
being processed, to prevent them from being replayed repeatedly.


was (Author: hzfeiwang):
[~kabhwan] Other replay threads are waiting for the straggler replay thread.
 !image-2019-09-11-15-10-25-326.png! 
 !image-2019-09-11-15-09-22-912.png! 
I think we could add a status for each log being replayed to the application 
listing, such as Processing and Completed.

When checking for logs to replay, we could filter out the logs that are still 
being processed, to prevent them from being replayed repeatedly.







[jira] [Comment Edited] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927305#comment-16927305
 ] 

feiwang edited comment on SPARK-29043 at 9/11/19 7:10 AM:
--

[~kabhwan] Other replay threads are waiting for the straggler replay thread.
 !image-2019-09-11-15-10-25-326.png! 
 !image-2019-09-11-15-09-22-912.png! 
I think we could add a status for each log being replayed to the application 
listing, such as Processing and Completed.

When checking for logs to replay, we could filter out the logs that are still 
being processed, to prevent them from being replayed repeatedly.


was (Author: hzfeiwang):
[~kabhwan] Yes, other replay threads are waiting for the straggler replay 
thread.
I think we could add a status for each log being replayed to the application 
listing, such as Processing and Completed.

When checking for logs to replay, we could filter out the logs that are still 
being processed, to prevent them from being replayed repeatedly.







[jira] [Comment Edited] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-09-10 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927198#comment-16927198
 ] 

feiwang edited comment on SPARK-29043 at 9/11/19 2:26 AM:
--

I think it is better to replay logs asynchronously.


was (Author: hzfeiwang):
I think we can change it to be asynchronous.



