Re: Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-07-01 Thread Austin Cawley-Edwards
Hi Shilpa, I've confirmed that "recovered" jobs are not compatible between minor versions of Flink (e.g., between 1.12 and 1.13). I believe the issue is that the session cluster was upgraded to 1.13 without first stopping the jobs running on it. If this is the case, the workaround is to stop

Re: Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-07-01 Thread Shilpa Shankar
Hi Zhu, Does it mean our upgrades are going to fail and the jobs are not backward compatible? I did verify the job itself is built using 1.13.0. Is there a workaround for this? Thanks, Shilpa On Wed, Jun 30, 2021 at 11:14 PM Zhu Zhu wrote: > Hi Shilpa, > > JobType was introduced in 1.13. So

Re: Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-06-30 Thread Zhu Zhu
Hi Shilpa, JobType was introduced in 1.13. So I guess the cause is that the client that creates and submits the job is still on 1.12.2. The client generates an outdated job graph that does not have its JobType set, which results in this NPE. Thanks, Zhu Austin Cawley-Edwards, on Thu, Jul 1, 2021

Re: Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-06-30 Thread Austin Cawley-Edwards
Hi Shilpa, Thanks for reaching out to the mailing list and providing those logs! The NullPointerException looks odd to me, but in order to better guess what's happening, can you tell me a little bit more about what your setup looks like? How are you deploying, i.e., standalone with your own

Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-06-30 Thread Shilpa Shankar
Hello, We have a Flink session cluster in Kubernetes running on 1.12.2. We attempted an upgrade to v1.13.1, but the jobmanager pods are continuously restarting and are in a crash loop. Logs are attached for reference. How do we recover from this state? Thanks, Shilpa 2021-06-30 16:03:25,965