Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Averell
Thank you Kostas for spending time on my case. Related to the issue I mentioned, I have another issue caused by having a lot of files to list. From the error message, I understand that the listing took more than 30s, so the JM assumed it had hung and killed it. Is it possible to increase

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Kostas Kloudas
I see, thanks for the clarification.
Cheers, Kostas
> On Sep 25, 2018, at 8:51 AM, Averell wrote:
>
> Hi Kostas,
>
> I use PROCESS_CONTINUOUSLY mode, and checkpoint interval of 20 minutes. When
> I said "Within that 15 minutes, checkpointing process is not triggered
> though" in my previous

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Averell
Hi Kostas, I use PROCESS_CONTINUOUSLY mode with a checkpoint interval of 20 minutes. When I said "Within that 15 minutes, checkpointing process is not triggered though" in my previous email, I was not complaining that checkpointing is not running; I meant that the slowness is not due to ongoing

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Averell
Hi Kostas, Yes, applying the filter on the 100K files takes time, and the delay of 15 minutes I observed was definitely caused by that large number of files and the cost of each individual file status check. However, the delay is much smaller when checkpointing is off. Within that 15 minutes,

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-24 Thread Kostas Kloudas
Hi Averell, Happy to hear that the problem is no longer there and if you have more news from your debugging, let us know. The thing that I wanted to mention is that from what you are describing, the problem does not seem to be related to checkpointing, but to the fact that applying your

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-23 Thread Averell
Hi Vino, and all, I tried to avoid the step to get File Status, and found that the problem is not there any more. I guess doing that with every single file out of 100K+ files on S3 caused some issue with checkpointing. Still trying to find the cause, but with lower priority now. Thanks for your

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell
Please refer to this version:
===
import java.util.Date
import org.apache.flink.api.common.io.FilePathFilter
import org.apache.flink.core.fs.Path
import org.slf4j.LoggerFactory
object SdcFilePathFilter {
  private val TIME_FORMAT = new java.text.SimpleDateFormat("MMdd

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell
Hi Vino, I am using a custom FileInputFormat, but the problem I mentioned only occurs when I use a custom FilePathFilter. My whole file for that custom FilePathFilter is quoted below. Regarding enabling DEBUG, for which classes/packages should I turn DEBUG on? I am afraid that turning DEBUG on at

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread vino yang
Hi Averell, Is this all the custom code for "CustomFileSource"? If not, can you share the entire file with us, and if you can set the log level to DEBUG, it will help you analyze and locate the problem. If you can't come to a conclusion, you can share the log with us. Thanks, vino. Averell

Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell
Good day everyone, I have about 100 thousand files to read, and a custom FilePathFilter with a simple filterPath method defined as below (the custom part only checks the file size and skips files with size = 0)
override def filterPath(filePath: Path): Boolean = {
  filePath
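
For readers following the thread, here is a minimal sketch of the kind of filter described above: one that skips zero-length files. It is not the code from the original message, which is cut off in this archive. The class name `SkipEmptyFilesFilter` is hypothetical, the sketch assumes Flink's core classes are on the classpath, and the choice to keep a file when its status cannot be read is an assumption of this example. In Flink, `filterPath` returning `true` means the path is ignored.

```scala
import java.io.IOException

import org.apache.flink.api.common.io.FilePathFilter
import org.apache.flink.core.fs.Path

// Hypothetical name for illustration; not the class used in this thread.
class SkipEmptyFilesFilter extends FilePathFilter {

  // Reuse Flink's default rules (hidden files, "_"-prefixed files, and so on).
  private val defaultFilter: FilePathFilter = FilePathFilter.createDefaultFilter()

  // Flink convention: returning true means "ignore this path".
  override def filterPath(filePath: Path): Boolean = {
    if (defaultFilter.filterPath(filePath)) {
      true
    } else {
      try {
        // One file-status lookup per path; with very many files on a remote
        // store such as S3, this per-file call is the expensive part.
        val status = filePath.getFileSystem().getFileStatus(filePath)
        // Never skip directories here, only zero-length regular files.
        !status.isDir() && status.getLen() == 0L
      } catch {
        // Assumption of this sketch: if the status cannot be read, keep the file.
        case _: IOException => false
      }
    }
  }
}
```

Such a filter would typically be attached to a file input format with `setFilesFilter`. As reported later in the thread, the per-file status check becomes costly with 100K+ files on S3, and removing that check made the problem go away.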