[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream

2021-04-29 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335952#comment-17335952
 ] 

Flink Jira Bot commented on FLINK-20918:


This issue was labeled "stale-major" 7 days ago and has not received any 
updates, so it is being deprioritized. If this ticket is actually Major, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> Avoid excessive flush of Hadoop output stream
> -
>
> Key: FLINK-20918
> URL: https://issues.apache.org/jira/browse/FLINK-20918
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hadoop Compatibility, FileSystems
>Affects Versions: 1.11.3, 1.12.0
>Reporter: Paul Lin
>Priority: Major
>  Labels: pull-request-available, stale-major
>
> [HadoopRecoverableFsDataOutputStream#sync|https://github.com/apache/flink/blob/67d167ccd45046fc5ed222ac1f1e3ba5e6ec434b/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L123]
>  calls both `hflush` and `hsync`, whereas `hsync` is an enhanced version of 
> `hflush`. We should remove the `hflush` call to avoid the excessive flush.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream

2021-04-22 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327411#comment-17327411
 ] 

Flink Jira Bot commented on FLINK-20918:


This major issue is unassigned, and neither it nor any of its Sub-Tasks has 
been updated for 30 days, so it has been labeled "stale-major". If this ticket 
is indeed "major", please either assign yourself or give an update, and then 
remove the label. In 7 days the issue will be deprioritized.






[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream

2021-02-04 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278767#comment-17278767
 ] 

Aljoscha Krettek commented on FLINK-20918:
--

Is there an observed real-world impact of this? I would be cautious with just 
removing the call since some File Systems might have unexpected 
implementations. After all, the interface does have the two methods.
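
To make the concern concrete, here is a minimal sketch of a hypothetical, 
non-conforming stream whose `hsync` only persists data that an earlier 
`hflush` already pushed out. `LazyHsyncStream` and `UnexpectedImplSketch` are 
invented for illustration and do not model any real Hadoop FileSystem; the 
point is only that, for such an implementation, dropping the `hflush` call 
would change what ends up durable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stream, NOT a real Hadoop class: hsync ignores the client buffer. */
final class LazyHsyncStream {
    private final List<String> clientBuffer = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();
    private final List<String> durable = new ArrayList<>();

    void write(String record) {
        clientBuffer.add(record);
    }

    /** Moves buffered records out of the client buffer. */
    void hflush() {
        flushed.addAll(clientBuffer);
        clientBuffer.clear();
    }

    /** Assumed "unexpected" behavior: persists only records that were already flushed. */
    void hsync() {
        durable.addAll(flushed);
        flushed.clear();
    }

    List<String> durable() {
        return durable;
    }
}

final class UnexpectedImplSketch {
    /** Writes two records, then syncs with or without a preceding hflush. */
    static List<String> durableAfterSync(boolean callHflushFirst) {
        LazyHsyncStream stream = new LazyHsyncStream();
        stream.write("a");
        stream.write("b");
        if (callHflushFirst) {
            stream.hflush();
        }
        stream.hsync();
        return stream.durable();
    }

    public static void main(String[] args) {
        System.out.println(durableAfterSync(true));  // prints [a, b]
        System.out.println(durableAfterSync(false)); // prints []
    }
}
```

Whether any real Hadoop-compatible FileSystem behaves like this is exactly the 
open question; the sketch only shows why the two calls are not interchangeable 
unless `hsync` is known to include the flush.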

> Avoid excessive flush of Hadoop output stream
> -
>
> Key: FLINK-20918
> URL: https://issues.apache.org/jira/browse/FLINK-20918
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hadoop Compatibility, FileSystems
>Affects Versions: 1.11.3, 1.12.0
>Reporter: Paul Lin
>Priority: Major
>  Labels: pull-request-available
>
> [HadoopRecoverableFsDataOutputStream#sync|https://github.com/apache/flink/blob/67d167ccd45046fc5ed222ac1f1e3ba5e6ec434b/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L123]
>  calls both `hflush` and `hsync`, whereas `hsync` is an enhanced version of 
> `hflush`. We should remove the `hflush` call to avoid the excessive flush.





[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream

2021-01-31 Thread Paul Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275822#comment-17275822
 ] 

Paul Lin commented on FLINK-20918:
--

Ping [~sewen] [~lzljs3620320] .






[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream

2021-01-20 Thread Paul Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268534#comment-17268534
 ] 

Paul Lin commented on FLINK-20918:
--

Hi [~gaoyunhaii], thanks for your comments. IIUC, from the original design of 
the `Syncable` interface (see HADOOP-6313), `hsync` is basically `hflush` plus 
POSIX `fsync`. Thus, if implemented correctly, the implementations should not 
require calling `hflush` before `hsync`. I've verified that this is true for 
many common FileSystems and Object Stores, including HDFS, S3, WebHDFS, local 
FS, etc. What do you think?
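
As a rough illustration of the argument, the sketch below models the assumed 
contract (`hsync` = `hflush` + `fsync`) with a stand-in interface and counts 
the operations each calling pattern triggers. `SyncableSketch`, 
`RecordingStream`, and `HsyncSketch` are hypothetical names, not Hadoop or 
Flink classes, and the `hflush`-then-`hsync` order is an assumption about the 
current `sync()` method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Hadoop's Syncable; not the real interface. */
interface SyncableSketch {
    void hflush();

    void hsync();
}

/** Records the operations a conforming stream would perform. */
final class RecordingStream implements SyncableSketch {
    final List<String> ops = new ArrayList<>();

    @Override
    public void hflush() {
        ops.add("flush");
    }

    @Override
    public void hsync() {
        // Assumed contract: hsync does everything hflush does, plus an fsync.
        ops.add("flush");
        ops.add("fsync");
    }
}

final class HsyncSketch {
    /** Calling pattern described in the issue: hflush followed by hsync. */
    static List<String> syncWithBoth() {
        RecordingStream stream = new RecordingStream();
        stream.hflush();
        stream.hsync();
        return stream.ops;
    }

    /** Proposed pattern: hsync only. */
    static List<String> syncWithHsyncOnly() {
        RecordingStream stream = new RecordingStream();
        stream.hsync();
        return stream.ops;
    }

    public static void main(String[] args) {
        System.out.println(syncWithBoth());      // prints [flush, flush, fsync]
        System.out.println(syncWithHsyncOnly()); // prints [flush, fsync]
    }
}
```

Under that assumed contract both patterns end with the data flushed and 
synced; the first one simply flushes twice.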

> Avoid excessive flush of Hadoop output stream
> -
>
> Key: FLINK-20918
> URL: https://issues.apache.org/jira/browse/FLINK-20918
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hadoop Compatibility, FileSystems
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Paul Lin
>Priority: Major
>  Labels: pull-request-available
>
> [HadoopRecoverableFsDataOutputStream#sync|https://github.com/apache/flink/blob/67d167ccd45046fc5ed222ac1f1e3ba5e6ec434b/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L123]
>  calls both `hflush` and `hsync`, whereas `hsync` is an enhanced version of 
> `hflush`. We should remove the `hflush` call to avoid the excessive flush.





[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream

2021-01-20 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268505#comment-17268505
 ] 

Yun Gao commented on FLINK-20918:
-

Hi [~Paul Lin], many thanks for opening the issue! One concern of mine is 
whether we can ensure that `hsync` is an enhanced version of `hflush` in all 
implementations. I ask because there might be other FileSystems or Object 
Stores that provide Hadoop-compatible FileSystems, so is it possible that the 
change might cause different behavior for some users?






[jira] [Commented] (FLINK-20918) Avoid excessive flush of Hadoop output stream

2021-01-10 Thread Paul Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262369#comment-17262369
 ] 

Paul Lin commented on FLINK-20918:
--

I'd like to take this issue if we agree.

> Avoid excessive flush of Hadoop output stream
> -
>
> Key: FLINK-20918
> URL: https://issues.apache.org/jira/browse/FLINK-20918
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.12.0, 1.11.3
>Reporter: Paul Lin
>Priority: Major
>
> [HadoopRecoverableFsDataOutputStream#sync|https://github.com/apache/flink/blob/67d167ccd45046fc5ed222ac1f1e3ba5e6ec434b/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L123]
>  calls both `hflush` and `hsync`, whereas `hsync` is an enhanced version of 
> `hflush`. We should remove the `hflush` call to avoid the excessive flush.


