Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell
Please refer to this version: === import java.util.Date import org.apache.flink.api.common.io.FilePathFilter import org.apache.flink.core.fs.Path import org.slf4j.LoggerFactory object SdcFilePathFilter { private val TIME_FORMAT = new java.text.SimpleDateFormat("MMdd hhmm

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell
Hi Vino, I am using a custom FileInputFormat, but the mentioned problem only occurs when I try a custom FilePathFilter. My whole file for that custom FilePathFilter is quoted below. Regarding enabling DEBUG, for which classes/packages should I turn DEBUG on? I am afraid that turning DEBUG on at t

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread vino yang
Hi Averell, Is this all the custom code for "CustomFileSource"? If not, can you share the entire file with us? If you can set the log level to DEBUG, it will help you analyze and locate the problem. If you can't come to a conclusion, you can share the log with us. Thanks, vino. Averell on 20

Re: How to use checkpoint in flink1.5.3

2018-09-20 Thread spoon_lz
I'm sorry; I had added some of my own code to the source code and found some mistakes in it. The problem is now solved; it was caused by my own code. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Strange behaviour with checkpointing and custom FilePathFilter

2018-09-20 Thread Averell
Good day everyone, I have about 100 thousand files to read, and a custom FilePathFilter with a simple filterPath method defined as below (the custom part is only to check file-size and skip files with size = 0) override def filterPath(filePath: Path): Boolean = { filePath

Re: Flink takes 40ms ~ 100ms to proceed from one operator to another?

2018-09-20 Thread Stefan Richter
Oh yes exactly, enable is right. > On 20.09.2018 at 17:48, Hequn Cheng wrote: > > Hi Stefan, > > Do you mean enable object reuse? > If you want to reduce latency between chained operators, you can also try to > disable object-reuse: > > On Thu, Sep 20, 2018 at 10:37 PM Stefan Richter

Re: Flink takes 40ms ~ 100ms to proceed from one operator to another?

2018-09-20 Thread Hequn Cheng
Hi Stefan, Do you mean enable object reuse? > If you want to reduce latency between chained operators, you can also try > to disable object-reuse: On Thu, Sep 20, 2018 at 10:37 PM Stefan Richter wrote: > Sorry, forgot the link for reference [1], which is > https://ci.apache.org/projects/flink

Re: Writer has already been opened on BucketingSink to S3

2018-09-20 Thread Chengzhi Zhao
Thanks Stefan for replying, I created a JIRA ticket https://issues.apache.org/jira/browse/FLINK-10382 Best, Chengzhi On Thu, Sep 20, 2018 at 7:49 AM Stefan Richter wrote: > Hi, > > thanks for putting some effort into debugging the problem. Could you open > a Jira with the problem and your analy

Re: S3 connector Hadoop class mismatch

2018-09-20 Thread Stephan Ewen
Hi! A few questions to diagnose/fix this: Do you explicitly configure the "hadoop.security.group.mapping"? - If not, this setting may have leaked in from a Hadoop config in the classpath. We are fixing this in Flink 1.7, to make this insensitive to such settings leaking in. - If yes, then

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-20 Thread Stefan Richter
FYI, here is a link to my PR: https://github.com/apache/flink/pull/6723 > On 20.09.2018 at 14:52, Stefan Richter wrote: > > Hi, > > I think the failing precondition is too strict because sometimes a checkpoint > can overtake another checkpoint and in that case the commit is already > subsumed.

Re: Unit/Integration test for stateful source

2018-09-20 Thread Stefan Richter
Hi, maybe you can use AbstractStreamOperatorTestHarness to test your source, including the snapshotting. You can take a look at the tests of some other source, e.g. StatefulSequenceSourceTest#testCheckpointRestore. Best, Stefan > On 20.09.2018 at 15:29, Darshan Singh wrote: > > Hi, > > I a

[ANNOUNCE] Apache Flink 1.6.1 released

2018-09-20 Thread Till Rohrmann
The Apache Flink community is very happy to announce the release of Apache Flink 1.6.1, which is the first bug fix release for the Apache Flink 1.6 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming ap

Re: Flink takes 40ms ~ 100ms to proceed from one operator to another?

2018-09-20 Thread Stefan Richter
Sorry, forgot the link for reference [1], which is https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/datastream_api.html#controlling-latency > On 20.09.2018 at 16:36, Stefan Richter wrote: > > Hi, > > you have not provided much information for this question, e.g. what and how >

[ANNOUNCE] Apache Flink 1.5.4 released

2018-09-20 Thread Till Rohrmann
The Apache Flink community is very happy to announce the release of Apache Flink 1.5.4, which is the fourth bug fix release for the Apache Flink 1.5 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming a

Re: Flink takes 40ms ~ 100ms to proceed from one operator to another?

2018-09-20 Thread Stefan Richter
Hi, you have not provided much information for this question, e.g. what and how exactly you measure, whether this is a local or distributed setting, etc. I assume that it is distributed and that the cause for your observation is the buffer timeout, i.e. the maximum time that Flink waits until sending

Re: Flink takes 40ms ~ 100ms to proceed from one operator to another?

2018-09-20 Thread James Yu
The previous email seems unable to display embedded images, so let me post the links instead. > Hi, > > My team and I tried to measure the total time spent on our Flink job and found > out that Flink takes 40ms ~ 100ms to proceed from one operator to another. > I wonder how we can reduce this transition time. >

Flink takes 40ms ~ 100ms to proceed from one operator to another?

2018-09-20 Thread James Yu
Hi, My team and I tried to measure the total time spent on our Flink job and found out that Flink takes 40ms ~ 100ms to proceed from one operator to another. I wonder how we can reduce this transition time. The following DAG represents our job: and here is the screenshot of our log: at 19:37:04.564, the

Re: How to get the location of keytab when using flink on yarn

2018-09-20 Thread Aljoscha Krettek
Hi, if the YARN cluster does not have Kerberos enabled then Flink will not ship the keytab file to the worker nodes. This means that you have to make sure yourself that it is available at some location where your application can use it. But this might carry security risks. I'm afraid I don't know a

Re: Data loss when restoring from savepoint

2018-09-20 Thread Juho Autio
Hi Andrey! I was finally able to gather the DEBUG logs that you suggested. In short, the reducer logged that it processed at least some of the ids that were missing from the output. "At least some", because I didn't have the job running with DEBUG logs for the full 24-hour window period. So I was

Unit/Integration test for stateful source

2018-09-20 Thread Darshan Singh
Hi, I am writing a stateful source very similar to KafkaBaseConsumer but not as generic. I was looking into how we can write unit tests and integration tests for this. I looked at the kafka-connector-based unit test cases. It seems that there are too many external things at play here like lots of

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-20 Thread Stefan Richter
Hi, I think the failing precondition is too strict because sometimes a checkpoint can overtake another checkpoint and in that case the commit is already subsumed. I will open a Jira and PR with a fix. Best, Stefan > On 19.09.2018 at 10:04, PedroMrChaves wrote: > > Hello, > > I have a runni

Re: Trying to figure out why a slot takes a long time to checkpoint

2018-09-20 Thread Stefan Richter
A debug log for state backend and checkpoint coordinator could also help. > On 20.09.2018 at 14:19, Stefan Richter wrote: > > Hi, > > if some tasks take around 50 minutes, could you wait until such a checkpoint is > in progress and (let’s say after 10 minutes) log into the node and create a >

Re: Trying to figure out why a slot takes a long time to checkpoint

2018-09-20 Thread Stefan Richter
Hi, if some tasks take around 50 minutes, could you wait until such a checkpoint is in progress and (let’s say after 10 minutes) log into the node and create one thread dump (or several over time) for the JVM that runs the slow checkpointing task? This could help to figure out where it is stuck

Re: How to get the location of keytab when using flink on yarn

2018-09-20 Thread Stefan Richter
Hi, maybe Aljoscha or Eron (both in CC) can help you with this problem, I think they might know best about the Kerberos security. Best, Stefan > On 20.09.2018 at 11:20, 杨光 wrote: > > Hi, > I am using the "per-job YARN session" mode to deploy a Flink job on YARN, and my > Flink > version is 1.4

Re: Writer has already been opened on BucketingSink to S3

2018-09-20 Thread Stefan Richter
Hi, thanks for putting some effort into debugging the problem. Could you open a Jira with the problem and your analysis so that we can discuss how to proceed with it? Best, Stefan > On 18.09.2018 at 23:16, Chengzhi Zhao wrote: > > After checking the code, I think the issue might be related

Re: S3 connector Hadoop class mismatch

2018-09-20 Thread Stefan Richter
Hi, I could not find any open Jira for the problem you describe. Could you please open one? Best, Stefan > On 19.09.2018 at 09:54, Paul Lam wrote: > > Hi, > > I’m using StreamingFileSink of 1.6.0 to write logs to S3 and encounter a > classloader problem. It seems that there are conflicts

Re: multiple flink applications on yarn are shown in one application.

2018-09-20 Thread Stefan Richter
Hi, currently, Flink still has to use session mode under the hood if you submit the job in attached mode. The reason is that the job could consist of multiple parts that need to run one after the other. This will be changed in the future and also should not happen if you submit the job deta

Re: Checkpointing not working

2018-09-20 Thread Stefan Richter
Hi, in the absence of any logs, my guess would be that your checkpoints are just not able to complete within 10 seconds; the state might be too large or the network and fs too slow. Are you using full or incremental checkpoints? For your relatively small interval, I suggest that you try using incre

Re: Why FlinkKafkaConsumer08 does not support exactly once or at least once semantic?

2018-09-20 Thread 徐涛
Hi Stefan, I didn't notice that on the webpage. Thanks a lot for your help! That makes me understand the delivery guarantees more clearly. Best Henry > On Sep 20, 2018, at 5:02 PM, Stefan Richter wrote: > > Hi, > > I think this part of the documenta

How to get the location of keytab when using flink on yarn

2018-09-20 Thread 杨光
Hi, I am using the "per-job YARN session" mode to deploy a Flink job on YARN, and my Flink version is 1.4.1. https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/security-kerberos.html In my use case, the YARN cluster on which the Flink job runs does not have Kerberos mode enabled in core-sit

Re: Why FlinkKafkaConsumer08 does not support exactly once or at least once semantic?

2018-09-20 Thread Stefan Richter
Hi, I think this part of the documentation is talking about KafkaProducer, and you are reading the source code of KafkaConsumer. Best, Stefan > On 20.09.2018 at 10:48, 徐涛 wrote: > > Hi All, > The Flink 1.6 documentation says that "Before 0.9 Kafka did not > provide any mechanisms

Re: Flink 1.5.2 process keeps reference to deleted blob files.

2018-09-20 Thread Stefan Richter
Hi, I think it would be very helpful if you could identify what data is behind them. For example, I could imagine that it can be a jar file that was used by the TM and some classes are still in use or loaded by a classloader that was not yet GCed. Depending on that, there could be a problem in the u

Re: In which case the StreamNode has multiple output edges?

2018-09-20 Thread 徐涛
Hi Fabian, I realized that scenario later :) Thank you all the same. Best Henry > On Sep 18, 2018, at 4:10 PM, Fabian Hueske wrote: > > Hi, > > Any operator can have multiple out-going edges. > If you implement something like: > > DataStream instream = ... > > DataStream outstream1 = in

Why FlinkKafkaConsumer08 does not support exactly once or at least once semantic?

2018-09-20 Thread 徐涛
Hi All, The Flink 1.6 documentation says that "Before 0.9 Kafka did not provide any mechanisms to guarantee at-least-once or exactly-once semantics". I read the source code of FlinkKafkaConsumer08, and the comment says: "Please note that Flink snapshots the offsets internally as pa

Re: HA failing for 1.6.0 job cluster with docker-compose

2018-09-20 Thread vino yang
Hi all, Oh, I took this ticket, will fix it as soon as possible. Thanks, vino. Till Rohrmann wrote on Thu, Sep 20, 2018 at 4:35 PM: > Hi Tzanko, > > in order to make the container entrypoint properly work with HA, we need > to fix the JobID (see https://issues.apache.org/jira/browse/FLINK-10291). > At the mo

Re: HA failing for 1.6.0 job cluster with docker-compose

2018-09-20 Thread Till Rohrmann
Hi Tzanko, in order to make the container entrypoint properly work with HA, we need to fix the JobID (see https://issues.apache.org/jira/browse/FLINK-10291). At the moment, we generate a new JobID for every restart of the cluster entrypoint container. Due to that the system cannot find the existin

Re: How to use checkpoint in flink1.5.3

2018-09-20 Thread Stefan Richter
Hi, did you introduce some custom modifications to the code? Your stack trace does not match the lines in the code of release-1.5.3, e.g. line 230 is not in method internalTimeServiceManager(…) which makes it hard to draw any conclusions. Best, Stefan > On 19.09.2018 at 14:03, spoon_lz

Re: job submitting hanging

2018-09-20 Thread vino yang
Hi Jason, Chesnay and Gary are familiar with the REST API, maybe they can help you. If you can share the client's commit log and set the client and JM log level to DEBUG it would be better. Thanks, vino. Jason Chang wrote on Wed, Sep 19, 2018 at 8:31 AM: > Hi all, > > Our team is setting up Flink on k8s. Aft

Re: Checkpointing not working

2018-09-20 Thread vino yang
Hi Yubraj, Can you set your log level to DEBUG and share the log with us, or share a screenshot of your Flink web UI checkpoint information? Thanks, vino. Jörn Franke wrote on Wed, Sep 19, 2018 at 2:37 PM: > What do the logfiles say? > > What does the source code look like? > > Is it really needed to do chec

Re: HA failing for 1.6.0 job cluster with docker-compose

2018-09-20 Thread vino yang
Hi Tzanko, Maybe Till is more appropriate to answer this question. Thanks, vino. Tzanko Matev wrote on Wed, Sep 19, 2018 at 5:47 PM: > Dear all, > > I am currently experimenting with a Flink 1.6.0 job cluster. The goal is > to run a streaming job on K8s. Right now I am using docker-compose to > experiment wi