Hi all,

I wasn't sure if this would be the best audience; if not, please advise if
you know of a better place to ask. I figured that at least some folks here
either work for Ververica or might have used their platform.

*tl;dr: I'm trying to migrate an existing stateful Flink job to run in
Ververica Platform (Community), and it doesn't seem that all of the state is
being properly handed off (the new savepoint contains only _metadata).*

I'm currently in the process of migrating an existing Flink job, which runs
on its own in Kubernetes, to run within the Ververica platform. The catch is
that the job itself is stateful, so I want to ensure I can migrate that
state over so that when the new job kicks off, the transition is fairly
seamless.

Basically, what I've done up to this point is create a script as part of
the Ververica platform deployment that will:

   1. Check for the existence of any of the known legacy jobs that need to
   be migrated.
      - If one is found, stop the job, taking a full savepoint, and store
      the savepoint path in a configmap for that job, used solely for
      migration purposes (a rough sketch of this step follows the list).
      - If one is not found, assume the job has already been migrated.
   2. Create a Deployment for each of the new jobs, pointing to the
   appropriate configuration, jars, etc.
   3. Check for the presence of one of the migration configmaps from step 1
   and issue a request to create a savepoint for that deployment.
      - This involves using the Ververica REST API to grab the appropriate
      deployment information and issuing a request to the Savepoints
      endpoint of the same REST API to "add" the savepoint.
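
For step 1, the stop-with-savepoint piece looks roughly like the sketch
below, using Flink's own REST API; FLINK_REST, JOB_ID, and the target
directory are placeholders for my actual values, and error handling is
elided:

import time
import requests

FLINK_REST = "http://legacy-jobmanager:8081"  # placeholder address
JOB_ID = "<legacy-job-id>"                    # placeholder job id
TARGET = "gs://my-bucket/savepoints"          # placeholder savepoint dir

# Stop the job with a savepoint; Flink returns a trigger id for the
# asynchronous operation.
resp = requests.post(
    f"{FLINK_REST}/jobs/{JOB_ID}/stop",
    json={"targetDirectory": TARGET, "drain": False},
)
resp.raise_for_status()
trigger_id = resp.json()["request-id"]

# Poll until the savepoint completes, then record its location (this is
# the path that gets stored in the migration configmap).
while True:
    status = requests.get(
        f"{FLINK_REST}/jobs/{JOB_ID}/savepoints/{trigger_id}"
    ).json()
    if status["status"]["id"] == "COMPLETED":
        savepoint_path = status["operation"]["location"]
        break
    time.sleep(2)

print(savepoint_path)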

I've confirmed the above "works": it stops the legacy jobs, creates the
resources (i.e., the configmaps) used for the migration, and starts up the
new job within Ververica, and I can see evidence in the UI that a savepoint
was "COPIED" for that deployment.

However, when comparing (in GCS) the previous savepoint for the old job
with the one now managed by Ververica, I noticed that the new one contains
only a single _metadata file, whereas the previous savepoint contained the
_metadata file along with a separate data file.

This leads me to believe that the new job might not know about any items
previously stored in state, which could be problematic.
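
For what it's worth, this is roughly how I compared the two locations (the
bucket name and prefixes below are placeholders for my actual paths):

from google.cloud import storage

client = storage.Client()

# Placeholder bucket and prefixes for the old and new savepoint directories.
for prefix in ("savepoints/legacy-job/savepoint-abc123/",
               "savepoints/vvp-managed/savepoint-def456/"):
    print(f"-- {prefix}")
    for blob in client.list_blobs("my-flink-bucket", prefix=prefix):
        print(f"{blob.name}  ({blob.size} bytes)")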

When reviewing the documentation for "manually adding a savepoint" for
Ververica Platform 2.6
<https://docs.ververica.com/v2.6/user_guide/application_operations/deployments/savepoints.html#manually-adding-a-savepoint-resource>,
I noticed that the payload to the Savepoints endpoint looked like the
following, which is what I used:

metadata:
  deploymentId: ${deploymentId}
  annotations:
    com.dataartisans.appmanager.controller.deployment.spec.version: ${deploymentSpecVersion}
  type: ${type}  # used FULL in my case
spec:
  savepointLocation: ${savepointLocation}
  flinkSavepointId: 00000000-0000-0000-0000-000000000000
status:
  state: COMPLETED
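
Concretely, my script sends that resource to the Savepoints endpoint
roughly like the sketch below; VVP_URL, NAMESPACE, and the looked-up values
are placeholders, and the endpoint path is the one from the linked 2.6 docs:

import requests

VVP_URL = "http://ververica-platform"  # placeholder base URL
NAMESPACE = "default"                  # placeholder namespace

deployment_id = "<deployment-id>"  # looked up via the Deployments endpoint
spec_version = "<spec-version>"    # the deployment's spec version
savepoint_path = "gs://my-bucket/savepoints/savepoint-abc123"  # from the configmap

# Mirror of the YAML payload above, registered as a COMPLETED savepoint.
savepoint = {
    "metadata": {
        "deploymentId": deployment_id,
        "annotations": {
            "com.dataartisans.appmanager.controller.deployment.spec.version": spec_version,
        },
        "type": "FULL",
    },
    "spec": {
        "savepointLocation": savepoint_path,
        "flinkSavepointId": "00000000-0000-0000-0000-000000000000",
    },
    "status": {"state": "COMPLETED"},
}

resp = requests.post(
    f"{VVP_URL}/api/v1/namespaces/{NAMESPACE}/savepoints",
    json=savepoint,
)
resp.raise_for_status()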


The empty UUID was a bit concerning, and I was curious whether that might
be the reason my additional data files didn't come across with the
savepoint (I noticed that in 2.7 this is an optional argument in the
payload). I don't see any additional configuration that would tell the
platform to pull everything over, not just the _metadata file.

Any ideas or guidance would be helpful.

Rion
