[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265812#comment-17265812 ]

dzcxzl commented on SPARK-33790:

OK, I opened a JIRA: [SPARK-34125|https://issues.apache.org/jira/browse/SPARK-34125]

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.1
> Reporter: dzcxzl
> Assignee: dzcxzl
> Priority: Critical
> Fix For: 3.2.0
>
> FsHistoryProvider#checkForLogs already has FileStatus when constructing
> SingleFileEventLogFileReader, and there is no need to get the FileStatus
> again when SingleFileEventLogFileReader#fileSizeForLastIndex.
> This can reduce a lot of rpc calls and improve the speed of the history
> server.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
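The idea in the issue description can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the Spark patch: `Status`, `CountingFs`, and `SingleFileReader` are hypothetical stand-ins for Hadoop's `FileStatus`, the file system client, and `SingleFileEventLogFileReader`, and each `getFileStatus` call stands in for one RPC.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class FileStatusReuse {
    /** Hypothetical stand-in for a file's metadata (not Hadoop's FileStatus). */
    static final class Status {
        final String path;
        final long length;
        Status(String path, long length) { this.path = path; this.length = length; }
    }

    /** Hypothetical file system that counts metadata lookups; each lookup models one RPC. */
    static final class CountingFs {
        final AtomicInteger rpcCalls = new AtomicInteger();
        Status getFileStatus(String path) {
            rpcCalls.incrementAndGet();
            return new Status(path, 1024L);
        }
    }

    /** Reader that reuses a status handed in by the caller and looks it up only if none was given. */
    static final class SingleFileReader {
        private final CountingFs fs;
        private final String path;
        private Status status; // cached after the first lookup

        SingleFileReader(CountingFs fs, String path, Optional<Status> known) {
            this.fs = fs;
            this.path = path;
            this.status = known.orElse(null);
        }

        long fileSizeForLastIndex() {
            if (status == null) {
                status = fs.getFileStatus(path); // only paid when the caller had no status
            }
            return status.length;
        }
    }

    public static void main(String[] args) {
        CountingFs fs = new CountingFs();
        // The scan already obtained the status once while listing the log directory.
        Status listed = fs.getFileStatus("/logs/app-1");
        int before = fs.rpcCalls.get();

        SingleFileReader reader = new SingleFileReader(fs, "/logs/app-1", Optional.of(listed));
        long size = reader.fileSizeForLastIndex();

        // prints "size=1024, extra RPCs=0"
        System.out.println("size=" + size + ", extra RPCs=" + (fs.rpcCalls.get() - before));
    }
}
```

In this sketch the saving is one lookup per log file per scan, which is why the effect grows with the number of event logs the history server has to check.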
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265731#comment-17265731 ]

Jungtaek Lim commented on SPARK-33790:

Oh, OK. I haven't encountered the issue myself, but Scala's mutable HashMap does look to have this problem. Would you mind filing a separate JIRA issue and raising a PR for branch-2.4? 2.4.x is still a supported version, so the PR would be reviewed and accepted even though it does not apply to 3.x.
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265724#comment-17265724 ]

dzcxzl commented on SPARK-33790:

Thread stack when the history server is not working:

!http://git.dev.sh.ctripcorp.com/framework-di/spark-2.2.0/uploads/9cfa9662f563ac64f77f4d4ee6fd9243/image.png!

[https://github.com/scala/bug/issues/10436]
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265700#comment-17265700 ]

Jungtaek Lim commented on SPARK-33790:

{quote}
The following is my case: in the 2.x version, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe and may hang.
{quote}

Could you please elaborate on the situation in which the hang can occur?
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265695#comment-17265695 ]

Apache Spark commented on SPARK-33790:

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/31187
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265693#comment-17265693 ]

Apache Spark commented on SPARK-33790:

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/31187
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265691#comment-17265691 ]

dzcxzl commented on SPARK-33790:

This is indeed a performance regression. The following is my case:

In the 2.x version, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe and may hang. In the 3.x version, this became EventLogFileReader.codecMap, which is of type ConcurrentHashMap.

In the 2.x version, the history server may stop working. I tried the 3.x version and found that one round of scanning slowed down a lot: it rose from 7 min to about 23 min.

In addition, do I need to fix the thread-safety issue in version 2.x? [~kabhwan]
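The difference between the two map types mentioned in this comment can be illustrated with a small sketch. It is an assumption-laden illustration, not Spark's code: `CodecCache`, `codecFor`, and `lookUpConcurrently` are hypothetical names, and a `String` stands in for the codec object. It shows the `ConcurrentHashMap` side only, where many threads can share one cache safely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CodecCache {
    // Shared cache: short codec name -> codec object (a String stands in for the codec).
    static final ConcurrentHashMap<String, String> codecMap = new ConcurrentHashMap<>();
    // Counts how many times a codec was actually constructed.
    static final AtomicInteger created = new AtomicInteger();

    static String codecFor(String name) {
        // computeIfAbsent is atomic per key, so the factory runs at most once per name.
        return codecMap.computeIfAbsent(name, n -> {
            created.incrementAndGet();
            return "codec:" + n;
        });
    }

    /** Starts the given number of threads, each looking up the same codec repeatedly. */
    static void lookUpConcurrently(int threads, int callsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    codecFor("lz4");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        lookUpConcurrently(8, 1000);
        // prints "codecs created: 1"
        System.out.println("codecs created: " + created.get());
    }
}
```

A plain, unsynchronized hash map offers no such guarantee: concurrent writers can corrupt its internal structure, which is the kind of failure the comment describes for the 2.x code path.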
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265679#comment-17265679 ]

Apache Spark commented on SPARK-33790:

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/31186
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265662#comment-17265662 ]

Jungtaek Lim commented on SPARK-33790:

I've revisited this and realized that it is a performance regression for event log v1 (SPARK-28869 caused the regression). I'll submit PRs for the lower branches. This should be fixed in 3.1.x / 3.0.x as well.
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250779#comment-17250779 ]

Apache Spark commented on SPARK-33790:

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30814

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.1
> Reporter: dzcxzl
> Assignee: dzcxzl
> Priority: Trivial
> Fix For: 3.2.0
>
> FsHistoryProvider#checkForLogs already has FileStatus when constructing
> SingleFileEventLogFileReader, and there is no need to get the FileStatus
> again when SingleFileEventLogFileReader#fileSizeForLastIndex.
> This can reduce a lot of rpc calls and improve the speed of the history
> server.
[jira] [Commented] (SPARK-33790) Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
[ https://issues.apache.org/jira/browse/SPARK-33790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249650#comment-17249650 ]

Apache Spark commented on SPARK-33790:

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/30780

> Reduce the rpc call of getFileStatus in SingleFileEventLogFileReader
>
> Key: SPARK-33790
> URL: https://issues.apache.org/jira/browse/SPARK-33790
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.1
> Reporter: dzcxzl
> Priority: Trivial
>
> FsHistoryProvider#checkForLogs already has FileStatus when constructing
> SingleFileEventLogFileReader, and there is no need to get the FileStatus
> again when SingleFileEventLogFileReader#fileSizeForLastIndex.
> This can reduce a lot of rpc calls.