[jira] [Created] (FLINK-10382) Writer has already been opened while using AvroKeyValueSinkWriter and BucketingSink

2018-09-20 Thread Chengzhi Zhao (JIRA)
Chengzhi Zhao created FLINK-10382:
-

 Summary: Writer has already been opened while using 
AvroKeyValueSinkWriter and BucketingSink
 Key: FLINK-10382
 URL: https://issues.apache.org/jira/browse/FLINK-10382
 Project: Flink
  Issue Type: Bug
  Components: Streaming Connectors
Affects Versions: 1.6.0, 1.5.0
Reporter: Chengzhi Zhao


I am using Flink 1.6.0 with AvroKeyValueSinkWriter and BucketingSink, writing to S3. The wiring looks roughly like the sketch below.
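For context, a minimal sketch of how such a sink is typically wired up (the path, schemas, and stream types here are my placeholders, not the actual job):
{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

public final class AvroS3SinkSetup {

    public static void addAvroS3Sink(DataStream<Tuple2<String, GenericRecord>> stream, Schema valueSchema) {
        // The writer needs the Avro key and value schemas passed as properties.
        Map<String, String> properties = new HashMap<>();
        properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_KEY_SCHEMA, Schema.create(Schema.Type.STRING).toString());
        properties.put(AvroKeyValueSinkWriter.CONF_OUTPUT_VALUE_SCHEMA, valueSchema.toString());

        // Placeholder S3 path; BucketingSink rolls part files per bucket.
        BucketingSink<Tuple2<String, GenericRecord>> sink = new BucketingSink<>("s3a://mybucket/output");
        sink.setWriter(new AvroKeyValueSinkWriter<String, GenericRecord>(properties));

        stream.addSink(sink);
    }
}
{code}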
 
After the application had been running for a while (~20 minutes), I got an *exception: java.lang.IllegalStateException: Writer has already been opened*:
{code:java}
2018-09-17 15:40:23,771 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 4 @ 1537198823640 for job 8f9ab122fb7452714465eb1e1989e4d7.
2018-09-17 15:41:27,805 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (2/16) (25914cb3f77c8e4271b0fb6ea597ed50) switched from RUNNING to FAILED.
java.lang.IllegalStateException: Writer has already been opened
    at org.apache.flink.streaming.connectors.fs.StreamWriterBase.open(StreamWriterBase.java:68)
    at org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter.open(AvroKeyValueSinkWriter.java:150)
    at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.openNewPartFile(BucketingSink.java:583)
    at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.invoke(BucketingSink.java:458)
    at org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:52)
    at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
    at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:104)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
    at java.lang.Thread.run(Thread.java:748)
2018-09-17 15:41:27,808 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Stream to Stream Join (8f9ab122fb7452714465eb1e1989e4d7) switched from state RUNNING to FAILING.
java.lang.IllegalStateException: Writer has already been opened
    at org.apache.flink.streaming.connectors.fs.StreamWriterBase.open(StreamWriterBase.java:68)
    at org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter.open(AvroKeyValueSinkWriter.java:150)
    at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.openNewPartFile(BucketingSink.java:583)
    at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.invoke(BucketingSink.java:458)
    at org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:52)
    at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
    at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:104)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
    at java.lang.Thread.run(Thread.java:748)
{code}
After checking the code, I think the issue is related to AvroKeyValueSinkWriter.java and leads to the writer not being closed completely. I also noticed this change, which affects 1.5+:
[https://github.com/apache/flink/commit/915213c7afaf3f9d04c240f43d88710280d844e3#diff-86c35c993fdb0c482544951b376e5ea6]
I created my own AvroKeyValueSinkWriter class and implemented close() the way v1.4 did; it seems to run fine now.
{code:java}
@Override
public void close() throws IOException {
    try {
        // Always close the base stream first, mirroring the 1.4 behavior;
        // the finally block still closes the Avro key/value writer even if
        // super.close() throws.
        super.close();
    } finally {
        if (keyValueWriter != null) {
            keyValueWriter.close();
        }
    }
}
{code}
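For comparison, if I am reading the linked diff correctly, the 1.5+ version only closes the base stream when the key/value writer was never created. Roughly (paraphrased from the commit, not copied verbatim):
{code:java}
@Override
public void close() throws IOException {
    if (keyValueWriter != null) {
        // Closing only the Avro writer appears to leave the base writer's
        // stream state set, so a later open() on the same writer instance
        // fails with "Writer has already been opened".
        keyValueWriter.close();
    } else {
        // The base stream is only closed when the key/value writer was never created.
        super.close();
    }
}
{code}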
I am curious whether anyone else has run into a similar issue; I would appreciate any insights. Thanks!
 





[jira] [Updated] (FLINK-8599) Improve the failure behavior of the FileInputFormat for bad files

2018-02-18 Thread Chengzhi Zhao (JIRA)

 [ https://issues.apache.org/jira/browse/FLINK-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengzhi Zhao updated FLINK-8599:
-
Summary: Improve the failure behavior of the FileInputFormat for bad files  
(was: Improve the failure behavior of the ContinuousFileReaderOperator for bad 
files)

> Improve the failure behavior of the FileInputFormat for bad files
> -
>
> Key: FLINK-8599
> URL: https://issues.apache.org/jira/browse/FLINK-8599
> Project: Flink
>  Issue Type: New Feature
>  Components: DataStream API
>Affects Versions: 1.4.0, 1.3.2
>Reporter: Chengzhi Zhao
>Priority: Major
>
> We have an S3 path that Flink is monitoring for new files.
> {code:java}
> val avroInputStream_activity =
>   env.readFile(format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 1)
> {code}
>  
> I am doing both internal and external checkpointing. Say a bad file (for
> example, one with a different schema) is dropped into this folder; Flink
> will retry it several times. I want to remove those bad files and let the
> process continue. However, since the file path persists in the checkpoint,
> when I try to resume from an external checkpoint it throws the following
> error because the file can no longer be found.
>  
> {code:java}
> java.io.IOException: Error opening the Input Split s3a://myfile [0,904]: No 
> such file or directory: s3a://myfile{code}
>  
> As [~fhue...@gmail.com] suggested, we could check whether a path exists
> before trying to read a file, and ignore the input split instead of
> throwing an exception and failing the job.
>  
> Also, I am thinking about adding an error output for bad files as an
> option for users, so that any bad files could be moved to a separate path
> for further analysis.
>  
> I am not sure how people feel about it, but I'd like to contribute if
> people think this would be an improvement.





[jira] [Created] (FLINK-8599) Improve the failure behavior of the ContinuousFileReaderOperator for bad files

2018-02-07 Thread Chengzhi Zhao (JIRA)
Chengzhi Zhao created FLINK-8599:


 Summary: Improve the failure behavior of the 
ContinuousFileReaderOperator for bad files
 Key: FLINK-8599
 URL: https://issues.apache.org/jira/browse/FLINK-8599
 Project: Flink
  Issue Type: New Feature
  Components: DataStream API
Affects Versions: 1.3.2
Reporter: Chengzhi Zhao


We have an S3 path that Flink is monitoring for new files.
{code:java}
val avroInputStream_activity =
  env.readFile(format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 1)
{code}
 
I am doing both internal and external checkpointing. Say a bad file (for example, one with a different schema) is dropped into this folder; Flink will retry it several times. I want to remove those bad files and let the process continue. However, since the file path persists in the checkpoint, when I try to resume from an external checkpoint it throws the following error because the file can no longer be found.
 
{code:java}
java.io.IOException: Error opening the Input Split s3a://myfile [0,904]: No 
such file or directory: s3a://myfile{code}
 
As [~fhue...@gmail.com] suggested, we could check whether a path exists before trying to read a file, and ignore the input split instead of throwing an exception and failing the job. A rough sketch of that idea follows.
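One possible shape for this, as a hedged sketch (LenientTextInputFormat is my own illustrative name, not a Flink class, and I use TextInputFormat as the base only for the example):
{code:java}
import java.io.IOException;

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.FileInputSplit;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

// Treat a split whose file has disappeared as already exhausted,
// instead of letting open() fail the whole job on restore.
public class LenientTextInputFormat extends TextInputFormat {

    private boolean splitMissing;

    public LenientTextInputFormat(Path filePath) {
        super(filePath);
    }

    @Override
    public void open(FileInputSplit split) throws IOException {
        Path path = split.getPath();
        FileSystem fs = path.getFileSystem();
        if (!fs.exists(path)) {
            // The file referenced by the restored checkpoint is gone; skip it.
            splitMissing = true;
            return;
        }
        splitMissing = false;
        super.open(split);
    }

    @Override
    public boolean reachedEnd() throws IOException {
        // A missing split reports itself as finished, so the job moves on.
        return splitMissing || super.reachedEnd();
    }
}
{code}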
 
Also, I am thinking about adding an error output for bad files as an option for users, so that any bad files could be moved to a separate path for further analysis; see the sketch below.
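A minimal sketch of that quarantine idea (BadFileQuarantine and the paths are my placeholders; a real option would presumably live inside the input format's error handling):
{code:java}
import java.io.IOException;

import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

// Move a file that failed to read into a quarantine directory so the job
// can continue and the file can be inspected later.
public final class BadFileQuarantine {

    private BadFileQuarantine() {
    }

    public static void quarantine(Path badFile, Path quarantineDir) throws IOException {
        FileSystem fs = badFile.getFileSystem();
        Path target = new Path(quarantineDir, badFile.getName());
        if (!fs.rename(badFile, target)) {
            throw new IOException("Could not move " + badFile + " to " + target);
        }
    }
}
{code}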
 
I am not sure how people feel about it, but I'd like to contribute if people think this would be an improvement.





[jira] [Updated] (FLINK-8599) Improve the failure behavior of the ContinuousFileReaderOperator for bad files

2018-02-07 Thread Chengzhi Zhao (JIRA)

 [ https://issues.apache.org/jira/browse/FLINK-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengzhi Zhao updated FLINK-8599:
-
Affects Version/s: 1.4.0

> Improve the failure behavior of the ContinuousFileReaderOperator for bad files
> --
>
> Key: FLINK-8599
> URL: https://issues.apache.org/jira/browse/FLINK-8599
> Project: Flink
>  Issue Type: New Feature
>  Components: DataStream API
>Affects Versions: 1.4.0, 1.3.2
>Reporter: Chengzhi Zhao
>Priority: Major
>
> We have an S3 path that Flink is monitoring for new files.
> {code:java}
> val avroInputStream_activity =
>   env.readFile(format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 1)
> {code}
>  
> I am doing both internal and external checkpointing. Say a bad file (for
> example, one with a different schema) is dropped into this folder; Flink
> will retry it several times. I want to remove those bad files and let the
> process continue. However, since the file path persists in the checkpoint,
> when I try to resume from an external checkpoint it throws the following
> error because the file can no longer be found.
>  
> {code:java}
> java.io.IOException: Error opening the Input Split s3a://myfile [0,904]: No 
> such file or directory: s3a://myfile{code}
>  
> As [~fhue...@gmail.com] suggested, we could check whether a path exists
> before trying to read a file, and ignore the input split instead of
> throwing an exception and failing the job.
>  
> Also, I am thinking about adding an error output for bad files as an
> option for users, so that any bad files could be moved to a separate path
> for further analysis.
>  
> I am not sure how people feel about it, but I'd like to contribute if
> people think this would be an improvement.


