Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Averell
Thank you Stefan, I'll try to follow your guide to debug. And sorry for being confusing in the previous email. When I said "different builds", I meant different versions of my application, not different builds of Flink. Between versions of my application, I do add/remove some operators.

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Stefan Richter
Hi, I see, then the important question for me is whether the problem exists on the release/master code or just on your branches. Of course we can hardly give any advice for custom builds and without any code. In general, you should debug in HeapKeyedStateBackend lines 774-776 (the write part)

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Averell
Hi Kostas, Stefan, The problem doesn't occur in all of my builds, so it is a little difficult to track down. Are there any specific classes that I can turn DEBUG on for to help find the problem? (Turning DEBUG on globally seems too much.) I will try to minimize the code and post it. One more point

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Stefan Richter
Hi, I think it is rather unlikely that this is the problem because it should give a different kind of exception. Would it be possible to provide a minimal and self-contained example code for a problematic job? Best, Stefan > On 15. Oct 2018, at 08:29, Averell wrote: > > Hi everyone, > >

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Kostas Kloudas
Hi Averell, This could be the root cause of your problem! Thanks for digging into it. Would it be possible for you to verify that this is your problem by manually setting the UUID and seeing if the problem disappears? In addition, please file a JIRA. Thanks a lot, Kostas > On Oct 15, 2018,

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Averell
Hi everyone, In the StreamExecutionEnvironment.createFileInput method, a file source is created as follows: SingleOutputStreamOperator source = addSource(monitoringFunction, sourceName).transform("Split Reader: " + sourceName, typeInfo,

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Averell
Hi Kostas, No, the same code was used. I (1) started the job, (2) created a savepoint, (3) cancelled the job, (4) restored the job with the same command as in (1) with the addition "-s ". Regards, Averell

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Kostas Kloudas
You restore your job with the custom source from a savepoint taken without the custom source? > On Oct 10, 2018, at 11:34 AM, Averell wrote: > > Hi Kostas, > > Yes, I modified ContinuousFileMonitoringFunction to add one more > ListState. The error might/should have come from that, but I

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Averell
Hi Kostas, Yes, I modified ContinuousFileMonitoringFunction to add one more ListState. The error might/should have come from that, but I haven't been able to find out why. All of my keyed streams are defined by Scala tuples like: keyBy(r => (r.customer_id, r.address)), and the fields used as

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Kostas Kloudas
Hi Averell, In the logs there are some “Split Reader: Custom File Source:” entries. Is this a custom source you implemented? Also, is your keySelector deterministic, with proper equals and hashCode methods? Cheers, Kostas > On Oct 10, 2018, at 10:50 AM, Averell wrote: > > Hi Stefan, Dawid, > > I
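
To illustrate the question about equals and hashCode, here is a minimal Java sketch of a composite key with value-based equality. The class name CustomerAddressKey is hypothetical, the field names follow the keyBy example quoted elsewhere in this thread, and the String field types are an assumption. It is not taken from the job under discussion.

```java
import java.util.Objects;

// Hypothetical composite key; String field types are assumed for illustration.
final class CustomerAddressKey {
    private final String customerId;
    private final String address;

    CustomerAddressKey(String customerId, String address) {
        this.customerId = customerId;
        this.address = address;
    }

    // Value-based equality: two keys are equal when both fields are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomerAddressKey)) return false;
        CustomerAddressKey other = (CustomerAddressKey) o;
        return Objects.equals(customerId, other.customerId)
                && Objects.equals(address, other.address);
    }

    // Hash derived only from the field values, consistent with equals.
    @Override
    public int hashCode() {
        return Objects.hash(customerId, address);
    }

    public static void main(String[] args) {
        CustomerAddressKey a = new CustomerAddressKey("c-1", "1 Main St");
        CustomerAddressKey b = new CustomerAddressKey("c-1", "1 Main St");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Because String.hashCode is specified by the Java API, a hash built only from String fields stays the same across JVM runs. The default Object.hashCode carries no such guarantee, which is why a key type that does not override it is a poor fit for keyed state that must survive a restore.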

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Averell
Hi Stefan, Dawid, I hadn't changed anything in the configuration. Env's parallelism stayed at 64. Some source/sink operators have parallelism of 1 to 8. I'm using Flink 1.7-SNAPSHOT, with the code pulled from the master branch about 5 days back. Savepoint was saved to either S3 or HDFS (I tried

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Le-Van Huyen
Hi Stefan, Dawid, I hadn't changed anything in the configuration. Env's parallelism stayed at 64. Some source/sink operators have parallelism of 1 to 8. I'm using Flink 1.7-SNAPSHOT, with the code pulled from master about 5 days back. Savepoint was saved to either S3 or HDFS (I tried multiple

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Stefan Richter
Hi, adding to Dawid's questions, it would also be very helpful to know which Flink version was used to create the savepoint, which Flink version was used in the restore attempt, and whether the savepoint was moved or modified. Outside of potential conflicts with those things, I would not expect anything

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Dawid Wysakowicz
Hi Averell, Are you trying to scale the job up, meaning are you increasing the job parallelism? Have you increased the job's max parallelism by any chance? If so, this is not supported. The max parallelism parameter is used to create key groups that can be further assigned to parallel operators. This parameter
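
As a rough illustration of why the max parallelism matters for restoring keyed state, here is a simplified, self-contained Java sketch of key-group routing. The class KeyGroupSketch and the scramble function are hypothetical stand-ins, not Flink code. To my understanding the real assignment has the same shape (hash of the key modulo max parallelism, then key group scaled down to an operator index), but it uses a murmur hash instead of the placeholder below.

```java
import java.util.Objects;

// Simplified sketch of key-group routing; not Flink's actual classes.
final class KeyGroupSketch {

    // Placeholder for the real hash function: any fixed, non-negative
    // scrambling of the key's hashCode is enough for this illustration.
    static int scramble(int hash) {
        int h = hash * 0x9E3779B9;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE;
    }

    // A key always lands in the same key group for a given max parallelism.
    static int keyGroupFor(Object key, int maxParallelism) {
        return scramble(Objects.hashCode(key)) % maxParallelism;
    }

    // Key groups are spread over the parallel operator instances as ranges.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        Integer key = 1;
        for (int maxParallelism : new int[] {128, 256}) {
            int group = keyGroupFor(key, maxParallelism);
            int index = operatorIndexFor(group, maxParallelism, 64);
            System.out.println("maxParallelism=" + maxParallelism
                    + " keyGroup=" + group + " operatorIndex=" + index);
        }
    }
}
```

In this sketch, changing only the parallelism moves whole key groups between operator instances, while changing the max parallelism changes which key group a key belongs to. State written under the old grouping would then no longer line up with the new one.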

Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Averell
Hi everyone, I'm getting the following error when trying to restore from a savepoint. Below is the output from the flink binary, and a TM log is attached. I made no changes to the app between taking the savepoint and restoring from it. All Window operators have been assigned a unique ID string. Could you