[jira] [Commented] (FLINK-13850) Refactor part file configuration into a single method

2019-10-19 Thread lichong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955200#comment-16955200
 ] 

lichong commented on FLINK-13850:
-

This is exactly what I have needed recently. I want to control how the target 
file and the in-progress file are named instead of relying on the defaults: for 
example, a target name without any subtask info, or an in-progress file without 
the dot prefix. 

OutputFileConfig makes more sense to me as a name, since it says this is the 
config for the output file. Users should also be able to either provide their 
own configuration or simply rely on the default values. 

That is just my opinion, thanks.

I am looking forward to this feature.

> Refactor part file configuration into a single method
> -
>
> Key: FLINK-13850
> URL: https://issues.apache.org/jira/browse/FLINK-13850
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / FileSystem
> Reporter: Gyula Fora
> Assignee: João Boto
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently there are only two methods on both format builders,
> withPartFilePrefix and withPartFileSuffix, for configuring the part files, but 
> this set is likely to grow in the future.
>  * More settings, different directories for pending / in-progress files, etc.
> I suggest we remove these two methods and replace them with a single 
> withPartFileConfig(..) method that takes an extensible config class.
> This should be fixed before 1.10 so that the other methods are never released.
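
The proposal in the description could look roughly like the sketch below. It
is only an illustration under stated assumptions: PartFileConfig, its builder
methods, the default values ("part" prefix, empty suffix) and the
<prefix>-<subtask>-<counter><suffix> naming are hypothetical stand-ins, not
the class that Flink ships.

```java
import java.util.Objects;

// Hypothetical sketch of an "extensible config class" for part files.
// This is NOT the Flink implementation; names and defaults are assumptions.
final class PartFileConfig {
    private final String partPrefix;
    private final String partSuffix;

    private PartFileConfig(String partPrefix, String partSuffix) {
        this.partPrefix = Objects.requireNonNull(partPrefix, "partPrefix");
        this.partSuffix = Objects.requireNonNull(partSuffix, "partSuffix");
    }

    static Builder builder() {
        return new Builder();
    }

    String getPartPrefix() {
        return partPrefix;
    }

    String getPartSuffix() {
        return partSuffix;
    }

    // Assumed naming scheme: <prefix>-<subtaskIndex>-<partCounter><suffix>
    String fileName(int subtaskIndex, long partCounter) {
        return partPrefix + "-" + subtaskIndex + "-" + partCounter + partSuffix;
    }

    // New settings can be added here later without changing the sink builder API.
    static final class Builder {
        private String partPrefix = "part"; // assumed default
        private String partSuffix = "";     // assumed default

        Builder withPartPrefix(String prefix) {
            this.partPrefix = prefix;
            return this;
        }

        Builder withPartSuffix(String suffix) {
            this.partSuffix = suffix;
            return this;
        }

        PartFileConfig build() {
            return new PartFileConfig(partPrefix, partSuffix);
        }
    }

    public static void main(String[] args) {
        // Users who do not care simply take the defaults.
        PartFileConfig defaults = PartFileConfig.builder().build();
        // Users who do care override only what they need.
        PartFileConfig custom = PartFileConfig.builder()
                .withPartPrefix("events")
                .withPartSuffix(".json")
                .build();

        System.out.println(defaults.fileName(0, 0)); // part-0-0
        System.out.println(custom.fileName(1, 2));   // events-1-2.json
    }
}
```

With a single config object, a sink builder would need only one method
(the withPartFileConfig(..) suggested above) however many settings are added
later.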



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-12574) using sink StreamingFileSink files are overwritten when resuming application causing data loss

2019-10-17 Thread lichong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-12574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954207#comment-16954207
 ] 

lichong commented on FLINK-12574:
-

[~yitz589] I think the Flink design is fine with respect to this problem, but 
I would like to know how you handle it when you resume your application or 
reprocess old data from an MQ such as Kafka. In that case we need to consume 
from Kafka starting at a given offset instead of the offset stored in the 
checkpoint or savepoint.

Any reply would be appreciated. 

Thanks.

> using sink StreamingFileSink files are overwritten when resuming application 
> causing data loss
> --
>
> Key: FLINK-12574
> URL: https://issues.apache.org/jira/browse/FLINK-12574
> Project: Flink
> Issue Type: Bug
> Components: Connectors / FileSystem
> Affects Versions: 1.8.0
> Reporter: yitzchak lieberman
> Priority: Critical
>
> When part files are saved to an S3 bucket (with a bucket assigner) under simple 
> names such as:
> part-0-0 and part-1-2
> restarting or resuming the application causes the checkpoint id to start from 0 
> again, and the old files are replaced by new part files.
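
The overwrite described above can be reproduced in a small simulation. This
is a sketch under assumptions, not Flink code: the in-memory map stands in
for an object store in which writing to an existing key replaces it, the
names follow the part-<subtask>-<counter> pattern from the report, and the
counter is assumed to restart at 0 when the job starts without restored
state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the reported behaviour; not Flink code.
final class PartFileOverwriteDemo {
    // Stand-in for an S3 bucket: a put on an existing key replaces the old object.
    private final Map<String, String> bucket = new HashMap<>();

    // Simulates one run of a single subtask that writes `parts` files,
    // numbering them from `startCounter` upwards.
    void run(int subtaskIndex, long startCounter, int parts, String payload) {
        for (long counter = startCounter; counter < startCounter + parts; counter++) {
            bucket.put("part-" + subtaskIndex + "-" + counter, payload);
        }
    }

    Map<String, String> contents() {
        return bucket;
    }

    public static void main(String[] args) {
        PartFileOverwriteDemo demo = new PartFileOverwriteDemo();

        demo.run(0, 0, 2, "first-run");
        // Fresh start without restored state: the counter begins at 0 again,
        // so the same names are produced and the earlier objects are replaced.
        demo.run(0, 0, 2, "second-run");

        System.out.println(demo.contents().size());          // 2
        System.out.println(demo.contents().get("part-0-0")); // second-run
    }
}
```

If the second run instead continued from the previous counter (for example
demo.run(0, 2, 2, "second-run")), the names would not collide and all four
files would survive.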


