Re: Statefun 2.2.2 Checkpoint restore NPE

2021-05-31 Thread Timothy Bess
Hi Igal, Thanks for the help! I'll switch over to that. I ended up defaulting null to empty string in that deserializer and deploying my own jar to get production going again. The thing that makes this case tricky is that my code was publishing empty string, not null, and that is apparently

Re: Statefun 2.2.2 Checkpoint restore NPE

2021-05-31 Thread Igal Shilman
Hi Tim, It is unfortunate that the error message was so minimal, we'll definitely improve that (FLINK-22809). Skipping NULL keys is a bit problematic, although technically possible, I'm not sure that this is how we should handle this. Let me follow up on that. The way you can customize the

Re: Statefun 2.2.2 Checkpoint restore NPE

2021-05-28 Thread Timothy Bess
Ok so after digging into it a bit it seems that the exception was thrown here: https://github.com/apache/flink-statefun/blob/release-2.2/statefun-flink/statefun-flink-io-bundle/src/main/java/org/apache/flink/statefun/flink/io/kafka/RoutableProtobufKafkaIngressDeserializer.java#L48 I think it'd be

Re: Statefun 2.2.2 Checkpoint restore NPE

2021-05-28 Thread Timothy Bess
Oh wow that Harness looks cool, I'll have to take a look at that. Unfortunately the JobManager UI seems to just show this: [image: image.png] Though it does seem that maybe the source function is where the failure is happening according to this? [image: image.png] Still investigating, but I do

Re: Statefun 2.2.2 Checkpoint restore NPE

2021-05-28 Thread Arvid Heise
If logs are not helping, I think the remaining option is to attach a debugger [1]. I'd probably add a breakpoint to LegacySourceFunctionThread#run and see what happens. If the issue is in recovery, you should add a breakpoint to StreamTask#beforeInvoke. [1]

Re: Statefun 2.2.2 Checkpoint restore NPE

2021-05-28 Thread Igal Shilman
Hi Tim, Any additional logs from before are highly appreciated, this would help us to trace this issue. By the way, do you see something in the JobManager's UI? On Fri, May 28, 2021 at 9:06 AM Tzu-Li (Gordon) Tai wrote: > Hi Timothy, > > It would indeed be hard to figure this out without any

Re: Statefun 2.2.2 Checkpoint restore NPE

2021-05-28 Thread Tzu-Li (Gordon) Tai
Hi Timothy, It would indeed be hard to figure this out without any stack traces. Have you tried changing to debug level logs? Maybe you can also try using the StateFun Harness to restore and run your job in the IDE - in that case you should be able to see which code exactly is throwing this

Statefun 2.2.2 Checkpoint restore NPE

2021-05-27 Thread Timothy Bess
Hi, Just checking to see if anyone has experienced this error. Might just be a Flink thing that's irrelevant to statefun, but my job keeps failing over and over with this message: 2021-05-28 03:51:13,001 INFO org.apache.flink.streaming.connectors.kafka. FlinkKafkaProducer [] - Starting