[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream
[ https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335952#comment-17335952 ]

Flink Jira Bot commented on FLINK-20918:

This issue was labeled "stale-major" 7 days ago and has not received any updates, so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

> Avoid excessive flush of Hadoop output stream
>
> Key: FLINK-20918
> URL: https://issues.apache.org/jira/browse/FLINK-20918
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hadoop Compatibility, FileSystems
> Affects Versions: 1.11.3, 1.12.0
> Reporter: Paul Lin
> Priority: Major
> Labels: pull-request-available, stale-major
>
> [HadoopRecoverableFsDataOutputStream#sync|https://github.com/apache/flink/blob/67d167ccd45046fc5ed222ac1f1e3ba5e6ec434b/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L123]
> calls both `hflush` and `hsync`, whereas `hsync` is an enhanced version of
> `hflush`. We should remove the `hflush` call to avoid the excessive flush.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream
[ https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327411#comment-17327411 ]

Flink Jira Bot commented on FLINK-20918:

This major issue is unassigned and itself and all of its Sub-Tasks have not been updated for 30 days. So, it has been labeled "stale-major". If this ticket is indeed "major", please either assign yourself or give an update. Afterwards, please remove the label. In 7 days the issue will be deprioritized.
[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream
[ https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278767#comment-17278767 ]

Aljoscha Krettek commented on FLINK-20918:

Is there an observed real-world impact of this? I would be cautious about simply removing the call, since some file systems might have unexpected implementations. After all, the interface does have the two methods.
[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream
[ https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275822#comment-17275822 ]

Paul Lin commented on FLINK-20918:

Ping [~sewen] [~lzljs3620320].
[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream
[ https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268534#comment-17268534 ]

Paul Lin commented on FLINK-20918:

Hi [~gaoyunhaii], thanks for your comments. IIUC from the original design of the `Syncable` interface (see HADOOP-6313), `hsync` is basically `hflush` plus POSIX `fsync`. Thus, if implemented correctly, implementations should not require calling `hflush` before `hsync`. I've verified that this holds for many common file systems and object stores, including HDFS, S3, WebHDFS, and the local FS. What do you think?
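The argument in this comment can be illustrated with a small, self-contained sketch. It is not the Flink or Hadoop code: `SyncableSketch`, `RecordingStream`, and `SyncSketch` are hypothetical names made up for illustration, and the toy stream simply assumes the contract described above, namely that a conforming `hsync` does everything `hflush` does and additionally persists the data. Under that assumption, calling `hsync` alone leaves the stream in the same state as calling `hflush` followed by `hsync`.

```java
import java.util.ArrayList;
import java.util.List;

/** Compares the current behavior (hflush + hsync) with the proposed one (hsync only). */
class SyncSketch {
    // Mirrors what the issue says sync() does today.
    static void syncBoth(SyncableSketch out) { out.hflush(); out.hsync(); }

    // Mirrors the proposed change: rely on hsync alone.
    static void syncOnly(SyncableSketch out) { out.hsync(); }

    public static void main(String[] args) {
        RecordingStream both = new RecordingStream();
        both.write(10);
        syncBoth(both);

        RecordingStream only = new RecordingStream();
        only.write(10);
        syncOnly(only);

        System.out.println("same durable bytes: " + (both.durable == only.durable));
        System.out.println(both.calls + " vs " + only.calls);
    }
}

/** Hypothetical stand-in for the two-method contract; NOT the real Hadoop interface. */
interface SyncableSketch {
    void hflush(); // make buffered data visible to new readers
    void hsync();  // hflush semantics plus durable persistence (assumed contract)
}

/** Toy stream modelling three stages: client buffer, visible, durable. */
class RecordingStream implements SyncableSketch {
    final List<String> calls = new ArrayList<>();
    int buffered;
    int visible;
    int durable;

    void write(int bytes) { buffered += bytes; }

    @Override
    public void hflush() {
        calls.add("hflush");
        visible += buffered;
        buffered = 0;
    }

    @Override
    public void hsync() {
        calls.add("hsync");
        // Assumption: a conforming hsync drains the client buffer itself...
        visible += buffered;
        buffered = 0;
        // ...and then persists everything that is visible.
        durable = visible;
    }
}
```

The sketch only shows that the extra call is redundant when the assumed contract holds. A file system whose `hsync` does not drain the client buffer would behave differently, which is the compatibility concern raised elsewhere in this thread.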
[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream
[ https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268505#comment-17268505 ]

Yun Gao commented on FLINK-20918:

Hi [~Paul Lin], thanks a lot for opening the issue! One concern I have is whether we can ensure that `hsync` is an enhanced version of `hflush` in all implementations. I ask because other file systems or object stores may provide Hadoop-compatible FileSystem implementations, so is it possible that the change would cause different behavior for some users?
[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream
[ https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262369#comment-17262369 ]

Paul Lin commented on FLINK-20918:

I'd like to take this issue if we agree.