[jira] [Commented] (FLINK-33127) HeapKeyedStateBackend: use buffered I/O to speed up local recovery
[ https://issues.apache.org/jira/browse/FLINK-33127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768610#comment-17768610 ]

Hangxiang Yu commented on FLINK-33127:

Actually, I did a second review last month but have not received your response yet. Of course, I'm fine with focusing on FLINK-26585 first. This is just a kind ping about the duplicate ticket.

> HeapKeyedStateBackend: use buffered I/O to speed up local recovery
>
> Key: FLINK-33127
> URL: https://issues.apache.org/jira/browse/FLINK-33127
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Reporter: Yangyang ZHANG
> Assignee: Yangyang ZHANG
> Priority: Major
> Attachments: thread_dump.png
>
> Recently, I observed a slow restore during local recovery with the hashmap
> state backend.
> It took 147 seconds to restore from a 467MB snapshot, about 9 times slower
> than restoring from the remote fs (16s).
> The thread dump shows that the local snapshot file is read directly through
> an unbuffered FileInputStream / fs.local.LocalDataInputStream.
> !thread_dump.png!
> Maybe we can wrap it with a BufferedInputStream to speed up local recovery.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33127) HeapKeyedStateBackend: use buffered I/O to speed up local recovery
[ https://issues.apache.org/jira/browse/FLINK-33127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768602#comment-17768602 ]

Matthias Schwalbe commented on FLINK-33127:

[~masteryhx]: I actually want to finish FLINK-26585 first, before I can start FLINK-26586 (capacity). The PR for FLINK-26585 has been stuck in approval without any progress for a couple of weeks. (I will ping you on that ticket in a second.)

Thias
[jira] [Commented] (FLINK-33127) HeapKeyedStateBackend: use buffered I/O to speed up local recovery
[ https://issues.apache.org/jira/browse/FLINK-33127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768481#comment-17768481 ]

Hangxiang Yu commented on FLINK-33127:

IIUC, this is a duplicate of FLINK-26586 and FLINK-19911. So just a kind ping, [~Matthias Schwalbe]: are you still working on FLINK-26586?
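
The suggestion in the issue description (wrap the unbuffered file stream in a buffered one) can be illustrated with plain java.io. This is only a minimal sketch, not Flink's restore code: the class name BufferedRestoreSketch, the 64 KiB buffer size, and the temp file standing in for a local snapshot are all assumptions made for illustration. It performs many small reads, similar in spirit to a deserializer, once against a raw FileInputStream and once through a BufferedInputStream.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferedRestoreSketch {

    // Hypothetical buffer size, chosen only for illustration.
    static final int BUFFER_SIZE = 64 * 1024;

    /** Writes the longs 0..count-1 to a temp file that stands in for a local snapshot. */
    public static Path writeSample(int count) throws IOException {
        Path file = Files.createTempFile("snapshot-sketch", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (int i = 0; i < count; i++) {
                out.writeLong(i);
            }
        }
        return file;
    }

    /** Sums count longs using many small (8-byte) reads on the given stream. */
    static long sumLongs(InputStream raw, int count) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += in.readLong();
        }
        return sum;
    }

    /** Each readLong() goes straight to the underlying file stream. */
    public static long sumUnbuffered(Path file, int count) throws IOException {
        try (InputStream in = new FileInputStream(file.toFile())) {
            return sumLongs(in, count);
        }
    }

    /** Small reads are served from an in-memory buffer that is refilled in large chunks. */
    public static long sumBuffered(Path file, int count) throws IOException {
        try (InputStream in = new BufferedInputStream(
                new FileInputStream(file.toFile()), BUFFER_SIZE)) {
            return sumLongs(in, count);
        }
    }

    public static void main(String[] args) throws IOException {
        int count = 200_000;
        Path file = writeSample(count);
        try {
            long t0 = System.nanoTime();
            long a = sumUnbuffered(file, count);
            long t1 = System.nanoTime();
            long b = sumBuffered(file, count);
            long t2 = System.nanoTime();
            System.out.println("same result: " + (a == b));
            System.out.printf("unbuffered: %d ms, buffered: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Both paths read the same bytes and return the same sum; only the number of reads that reach the file stream differs. Timings depend on the machine and file system, so the sketch prints them rather than asserting a speedup.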