[jira] [Updated] (FLINK-9113) Data loss in BucketingSink when writing to local filesystem
[ https://issues.apache.org/jira/browse/FLINK-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Walther updated FLINK-9113:
    Fix Version/s: 1.4.3

> Data loss in BucketingSink when writing to local filesystem
> ---
>
> Key: FLINK-9113
> URL: https://issues.apache.org/jira/browse/FLINK-9113
> Project: Flink
> Issue Type: Bug
> Components: Streaming Connectors
> Reporter: Timo Walther
> Assignee: Timo Walther
> Priority: Blocker
> Fix For: 1.5.0, 1.4.3
>
>
> For local filesystems, it is not guaranteed that the data is flushed to disk
> during checkpointing. This leads to data loss in cases of TaskManager
> failures when writing to a local filesystem
> {{org.apache.hadoop.fs.LocalFileSystem}}. The {{flush()}} method returns a
> written length but the data is not written into the file (thus the valid
> length might be greater than the actual file size). {{hsync}} and {{hflush}}
> have no effect either.
> It seems that this behavior won't be fixed in the near future:
> https://issues.apache.org/jira/browse/HADOOP-7844
> One solution would be to call {{close()}} on a checkpoint for local
> filesystems, even though this would lead to performance decrease. If we don't
> fix this issue, we should at least add proper documentation for it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (FLINK-9113) Data loss in BucketingSink when writing to local filesystem
[ https://issues.apache.org/jira/browse/FLINK-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Walther updated FLINK-9113:
    Fix Version/s: 1.5.0

> Data loss in BucketingSink when writing to local filesystem
> ---
>
> Key: FLINK-9113
> URL: https://issues.apache.org/jira/browse/FLINK-9113
> Project: Flink
> Issue Type: Bug
> Components: Streaming Connectors
> Reporter: Timo Walther
> Assignee: Timo Walther
> Priority: Blocker
> Fix For: 1.5.0
[jira] [Updated] (FLINK-9113) Data loss in BucketingSink when writing to local filesystem
[ https://issues.apache.org/jira/browse/FLINK-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Walther updated FLINK-9113:
    Priority: Blocker (was: Major)

> Data loss in BucketingSink when writing to local filesystem
> ---
>
> Key: FLINK-9113
> URL: https://issues.apache.org/jira/browse/FLINK-9113
> Project: Flink
> Issue Type: Bug
> Components: Streaming Connectors
> Reporter: Timo Walther
> Assignee: Timo Walther
> Priority: Blocker
> Fix For: 1.5.0
[jira] [Updated] (FLINK-9113) Data loss in BucketingSink when writing to local filesystem
[ https://issues.apache.org/jira/browse/FLINK-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Walther updated FLINK-9113:
    Description:
For local filesystems, it is not guaranteed that the data is flushed to disk during checkpointing. This leads to data loss in cases of TaskManager failures when writing to a local filesystem {{org.apache.hadoop.fs.LocalFileSystem}}. The {{flush()}} method returns a written length but the data is not written into the file (thus the valid length might be greater than the actual file size). {{hsync}} and {{hflush}} have no effect either.

It seems that this behavior won't be fixed in the near future: https://issues.apache.org/jira/browse/HADOOP-7844

One solution would be to call {{close()}} on a checkpoint for local filesystems, even though this would lead to performance decrease. If we don't fix this issue, we should at least add proper documentation for it.

    was:
This issue is closely related to FLINK-7737. By default the bucketing sink uses HDFS's {{org.apache.hadoop.fs.FSDataOutputStream#hflush}} for performance reasons. However, this leads to data loss in case of TaskManager failures when writing to a local filesystem {{org.apache.hadoop.fs.LocalFileSystem}}. We should use {{hsync}} by default in local filesystem cases and make it possible to disable this behavior if needed.

> Data loss in BucketingSink when writing to local filesystem
> ---
>
> Key: FLINK-9113
> URL: https://issues.apache.org/jira/browse/FLINK-9113
> Project: Flink
> Issue Type: Bug
> Components: Streaming Connectors
> Reporter: Timo Walther
> Assignee: Timo Walther
> Priority: Major
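
The gap described in the issue (a recorded valid length that is larger than the actual file size until the stream is closed) can be illustrated with a small standalone sketch. This is only an analogy and rests on assumptions: it uses a plain java.io.BufferedOutputStream to stand in for a stream whose writes are not pushed to the file, it does not call Hadoop's LocalFileSystem or Flink's BucketingSink, and the names ValidLengthSketch, Observation and writeWithoutClose are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: a buffered java.io stream stands in for a
// local stream whose written bytes have not reached the file yet.
class ValidLengthSketch {

    /** Sizes observed at a simulated checkpoint and after close(). */
    static final class Observation {
        final long validLength;      // bytes the writer believes it has written
        final long sizeAtCheckpoint; // bytes actually in the file at the checkpoint
        final long sizeAfterClose;   // bytes in the file once the stream is closed

        Observation(long validLength, long sizeAtCheckpoint, long sizeAfterClose) {
            this.validLength = validLength;
            this.sizeAtCheckpoint = sizeAtCheckpoint;
            this.sizeAfterClose = sizeAfterClose;
        }
    }

    /** Writes recordBytes bytes, measures the file while open and after close. */
    static Observation writeWithoutClose(int recordBytes) {
        try {
            Path file = Files.createTempFile("bucket-part", ".tmp");
            try {
                long validLength = 0;
                long sizeAtCheckpoint;
                try (OutputStream out =
                         new BufferedOutputStream(Files.newOutputStream(file), 8192)) {
                    out.write(new byte[recordBytes]);
                    validLength += recordBytes;
                    // Simulated checkpoint: the valid length is recorded while the
                    // stream stays open and nothing has been forced into the file.
                    sizeAtCheckpoint = Files.size(file);
                } // leaving the try block calls close(), which writes the buffer out
                long sizeAfterClose = Files.size(file);
                return new Observation(validLength, sizeAtCheckpoint, sizeAfterClose);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Observation o = writeWithoutClose(100);
        System.out.println("valid length:       " + o.validLength);
        System.out.println("size at checkpoint: " + o.sizeAtCheckpoint);
        System.out.println("size after close:   " + o.sizeAfterClose);
    }
}
```

In this sketch a crash between the simulated checkpoint and close() would leave a file shorter than the recorded valid length, which is the data-loss scenario the issue describes. Closing the stream at each checkpoint removes the gap, at the cost of reopening or rolling the file every time, in line with the performance concern raised in the description.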