Re: Running JobManager as Deployment instead of Job

2019-02-11 Thread Till Rohrmann
Hi Vishal,

you can also keep the same cluster id when cancelling a job with a savepoint
and then resuming a new job from it. Terminating the job should clean up
all of its state in ZooKeeper.
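For example, something along these lines (a sketch using the Flink 1.7-era
CLI; the savepoint target, job id, and jar name are placeholders):

```shell
# 1. Cancel the running job and trigger a savepoint in one step.
#    The command prints the path of the created savepoint on success.
bin/flink cancel -s hdfs:///flink/savepoints <job-id>

# 2. Resume the new job from that savepoint, keeping the same cluster id
#    so the external endpoints (ingress etc.) stay stable.
bin/flink run -s hdfs:///flink/savepoints/savepoint-abc123 my-job.jar
```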

Cheers,
Till


Re: Running JobManager as Deployment instead of Job

2019-02-08 Thread Vishal Santoshi
In one case, however, we do want to retain the same cluster id (think
ingress on k8s, and thus SLAs with external touch points) even though it is
essentially a new job (we added an incompatible change, but at the interface
level it retains the same contract). The only way forward seems to be to
remove the chroot/subcontext from ZK and relaunch, essentially deleting any
vestiges of the previous incarnation. And that is fine if that is indeed
the process.
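Roughly what we have in mind (a sketch; the ZK ensemble address and the
znode path, i.e. high-availability.zookeeper.path.root plus our cluster id,
are examples from our setup, not fixed names):

```shell
# Recursively delete the job cluster's znode subtree before relaunching.
# On ZooKeeper 3.4.x the command is `rmr`; on 3.5+ it is `deleteall`.
zkCli.sh -server zk-1:2181 rmr /flink/my-cluster-id
```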




Re: Running JobManager as Deployment instead of Job

2019-02-08 Thread Till Rohrmann
If you keep the same cluster id, the upgraded job should pick up
checkpoints from the completed checkpoint store. However, I would recommend
taking a savepoint and resuming from it, because then you can also, for
example, specify that you allow non-restored state.
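For instance (a sketch; the savepoint path and jar name are placeholders):

```shell
# Resume the upgraded job from a savepoint. The -n / --allowNonRestoredState
# flag lets the restore skip savepoint state that no longer maps to any
# operator in the new job graph (e.g. after removing an operator).
bin/flink run -s hdfs:///flink/savepoints/savepoint-abc123 -n my-upgraded-job.jar
```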

Cheers,
Till



Re: Running JobManager as Deployment instead of Job

2019-02-08 Thread Vishal Santoshi
Is the rationale for using a job id of 00* roughly the same, i.e. a
Flink job cluster runs a single job, so a single job id suffices? I am
more wondering about the case where we make a compatible change to a
job and want to resume (given we are in HA mode and thus have a
chroot/subcontext on ZK for the job cluster); it would make no sense to
give it a brand new job id?



Re: Running JobManager as Deployment instead of Job

2019-02-07 Thread Till Rohrmann
Hi Sergey,

the rationale for using a K8s Job instead of a Deployment is that a
Flink job cluster should terminate after it has successfully executed the
Flink job. This is unlike a session cluster, which should run indefinitely
and for which a K8s Deployment is better suited.

If a K8s Deployment works better for your use case, then I would suggest
changing the `job-cluster-job.yaml` accordingly.
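The change would look something like this (a sketch only; the image, args,
and resource names are illustrative, not the exact contents of the example
`job-cluster-job.yaml`):

```shell
# Replace `apiVersion: batch/v1` / `kind: Job` with a Deployment.
# Note: a Deployment requires restartPolicy Always (the default), whereas
# the Job variant used OnFailure.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-job-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink-job-cluster
  template:
    metadata:
      labels:
        app: flink-job-cluster
    spec:
      containers:
      - name: flink-job-cluster
        image: flink:1.7
        args: ["job-cluster", "--job-classname", "com.example.MyJob"]
EOF
```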

Cheers,
Till



Running JobManager as Deployment instead of Job

2019-02-05 Thread Sergey Belikov
Hi,

my team is currently experimenting with Flink running in Kubernetes (job
cluster setup). We found out that with the JobManager deployed as a
"Job" we can't simply update certain values in the job's yaml, e.g.
spec.template.spec.containers.image (
https://github.com/kubernetes/kubernetes/issues/48388#issuecomment-319493817).
This causes trouble in our CI/CD pipelines, so we are thinking about
using a "Deployment" instead of a "Job".
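Concretely, this is the difference we run into (resource names and images
are placeholders):

```shell
# With a Deployment, a rolling image update is a one-liner:
kubectl set image deployment/flink-job-cluster flink-job-cluster=myrepo/my-job:v2

# With a Job, spec.template is immutable, so the equivalent change means
# deleting and recreating the resource:
kubectl delete job flink-job-cluster
kubectl apply -f job-cluster-job.yaml   # yaml already updated with the new image
```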

With that said, I'm wondering: what was the motivation behind using the
"Job" resource for deploying the JobManager? And are there any pitfalls to
using a Deployment rather than a Job for the JobManager?

Thank you in advance.
-- 
Best regards,
Sergey Belikov