Re: Using a ProcessFunction as a "Source"

2018-08-24 Thread vino yang
Hi Addison, I have a lot of things I don't understand. Is your source self-generated message? Why can't source receive input? If the source is unacceptable then why is it called source? Isn't kafka-connector the input as source? If you mean that under normal circumstances it can't receive

Re: anybody can start flink with job mode?

2018-08-24 Thread Hao Sun
Thanks, I'll look into it. On Fri, Aug 24, 2018, 19:44 vino yang wrote: > Hi Hao Sun, > > From the error log, it seems that the jar package for the job was not > found. > You must make sure your Jar is in the classpath. > Related documentation may not be up-to-date, and there is a discussion on

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread vino yang
Hi Averell, The checkpoint is automatically triggered periodically according to the checkpoint interval set by the user. I believe that you should have no doubt about this. There are many reasons for the Job failure. The technical definition is that the Job does not normally enter the final

Re: anybody can start flink with job mode?

2018-08-24 Thread vino yang
Hi Hao Sun, >From the error log, it seems that the jar package for the job was not found. You must make sure your Jar is in the classpath. Related documentation may not be up-to-date, and there is a discussion on this issue on this mailing list. [1] I see that the status of FLINK-10001 [2] is

anybody can start flink with job mode?

2018-08-24 Thread Hao Sun
I got an error like this. $ docker run -it flink-job:latest job-cluster Starting the job-cluster config file: jobmanager.rpc.address: localhost jobmanager.rpc.port: 6123 jobmanager.heap.size: 1024m taskmanager.heap.size: 1024m taskmanager.numberOfTaskSlots: 1 parallelism.default: 1 rest.port:

Re: Data loss when restoring from savepoint

2018-08-24 Thread Juho Autio
Hi, Using StreamingFileSink is not a convenient option for production use for us as it doesn't support s3*. I could use StreamingFileSink just to verify, but I don't see much point in doing so. Please consider my previous comment: > I realized that BucketingSink must not play any role in this

Re: [External] Re: How to do test in Flink?

2018-08-24 Thread Joe Malt
Hi Chang, A time-saving tip for finding which library contains a class: go to https://search.maven.org/ and enter fc: followed by the fully-qualified name of the class. You should get the library as a search result. In this case for example, you'd search for

Re: Data loss when restoring from savepoint

2018-08-24 Thread Andrey Zagrebin
Ok, I think before further debugging the window reduced state, could you try the new ‘StreamingFileSink’ [1] introduced in Flink 1.6.0 instead of the previous 'BucketingSink’? Cheers, Andrey [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html

Re: Data loss when restoring from savepoint

2018-08-24 Thread Juho Autio
Yes, sorry for my confusing comment. I just meant that it seems like there's a bug somewhere now that the output is missing some data. > I would wait and check the actual output in s3 because it is the main result of the job Yes, and that's what I have already done. There seems to be always some

Re: Data loss when restoring from savepoint

2018-08-24 Thread Andrey Zagrebin
Hi Juho, So it is a per key deduplication job. Yes, I would wait and check the actual output in s3 because it is the main result of the job and > The late data around the time of taking savepoint might be not included into > the savepoint but it should be behind the snapshotted offset in

Re: AvroSchemaConverter and Tuple classes

2018-08-24 Thread françois lacombe
Hi Timo, Thanks for your answer I was looking for a Tuple as to feed a BatchTableSink subclass, but it may be achived with a Row instead? All the best François 2018-08-24 10:21 GMT+02:00 Timo Walther : > Hi, > > tuples are just a sub category of rows. Because the tuple arity is limited > to

Re: Data loss when restoring from savepoint

2018-08-24 Thread Juho Autio
Thanks for your answer! I check for the missed data from the final output on s3. So I wait until the next day, then run the same thing re-implemented in batch, and compare the output. > The late data around the time of taking savepoint might be not included into the savepoint but it should be

Re: Data loss when restoring from savepoint

2018-08-24 Thread Andrey Zagrebin
Hi Juho, Where exactly does the data miss? When do you notice that? Do you check it: - debugging `DistinctFunction.reduce` right after resume in the middle of the day or - some distinct records miss in the final output of BucketingSink in s3 after window result is actually triggered and

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread Averell
Hi Vino, Regarding this statement "/Checkpoints are taken automatically and are used for automatic restarting job in case of a failure/", I do not quite understand the definition of a failure, and how to simulate that while testing my application. Possible scenarios that I can think of: (1)

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread Andrey Zagrebin
This thread is also useful in this context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/difference-between-checkpoints-amp-savepoints-td14787.html

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread Andrey Zagrebin
Hi Henry, In addition to Vino’s answer, there are several things to keep in mind about “checkpoints vs savepoints". Checkpoints are designed mostly for fault tolerance of running Flink job and automatic recovery that is why by default Flink manages their storage itself. Though it is correct

Re: How to do test in Flink?

2018-08-24 Thread Chang Liu
No worries, I found it here: org.apache.flink flink-runtime_${scala.binary.version} ${flink.version} test-jar test Best regards/祝好, Chang Liu 刘畅 On Fri, Aug 24, 2018 at 1:16 PM Chang Liu wrote: > Hi Hequn, > > I have added the following dependencies: > > >

Re: How to do test in Flink?

2018-08-24 Thread Chang Liu
Hi Hequn, I have added the following dependencies: org.apache.flink flink-streaming-java_${scala.binary.version} ${flink.version} test-jar test org.mockito mockito-core 2.21.0 test But got the exception: java.lang.NoClassDefFoundError:

Raising a bug in Flink's unit test scripts

2018-08-24 Thread Averell
Good day everyone, I'm writing unit test for the bug fix FLINK-9940, and found that in some existing tests in flink-fs-tests cannot detect the issue when the file monitoring function emits duplicated files (i.e. a same file is reported multiple times). Could I just fix this as part of that

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread vino yang
Hi Henry, A good answer from stackoverflow: Apache Flink's Checkpoints and Savepoints are similar in that way they both are mechanisms for preserving internal state of Flink's applications. Checkpoints are taken automatically and are used for automatic restarting job in case of a failure.

Re: Aggregator in CEP

2018-08-24 Thread antonio saldivar
hello thank you very much I took a look on the link but now how can I check the conditions to get aggregator results? El vie., 24 ago. 2018 a las 5:27, aitozi () escribió: > Hi, > > Now that it still not support the aggregator function in cep > iterativeCondition. Now may be you need to check

Re: Aggregator in CEP

2018-08-24 Thread aitozi
Hi, Now that it still not support the aggregator function in cep iterativeCondition. Now may be you need to check the condition by yourself to get the aggregator result. I will work for this these day, you can take a look on this issue:

Re: State TTL in Flink 1.6.0

2018-08-24 Thread Andrey Zagrebin
Hi Juho, As Aljoscha mentioned the current TTL implementation was mostly targeted to data privacy applications where only processing time matters. I think the event time can be also useful for TTL and should address your concerns. The event time extension is on the road map for the future

Aggregator in CEP

2018-08-24 Thread antonio saldivar
Hello I am developing an application where I use Flink(v 1.4.2) CEP , is there any aggregation function to match cumulative amounts or counts in a IterativeCondition within a period of time for a KeyBy elements? if a cumulative amount reaches thresholds fire a result Thank you Regards

答复: jobmanager holds too many CLOSE_WAIT connection to datanode

2018-08-24 Thread Yuan,Youjun
One more safer approach is to execute cancel with savepoint on all jobs first >> this sounds great! Thanks Youjun 发件人: vino yang 发送时间: Friday, August 24, 2018 2:43 PM 收件人: Yuan,Youjun ; user 主题: Re: jobmanager holds too many CLOSE_WAIT connection to datanode Hi Youjun, You can see if there

Re: AvroSchemaConverter and Tuple classes

2018-08-24 Thread Timo Walther
Hi, tuples are just a sub category of rows. Because the tuple arity is limited to 25 fields. I think the easiest solution would be to write your own converter that maps rows to tuples if you know that you will not need more than 25 fields. Otherwise it might be easier to just use a

Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread 徐涛
Hi All, I check the documentation of Flink release 1.6, find that I can use checkpoints to resume the program either. As I encountered some problems when using savepoints, I have the following questions: 1. Can I use checkpoints only, but not use savepoints, because it can also use to resume

Re: jobmanager holds too many CLOSE_WAIT connection to datanode

2018-08-24 Thread vino yang
Hi Youjun, You can see if there is any real data transfer between these connections. I guess there may be some connection leaks here, and if so, it's a bug. On the other hand, the 1.4 version is a bit old, can you compare the 1.5 or 1.6 whether the same problem exists? I suggest you create an