Thanks! I did not see the znode and thus did not paste the ls output. Anyway, I
will get you the full JM log ASAP.
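
For reference, here is a minimal sketch of how the full znode path from the quoted thread below appears to be assembled. It assumes the path in the log line is relative to high-availability.zookeeper.path.root plus high-availability.cluster-id, which matches the absolute path Gary quotes. The helper name ha_znode is hypothetical and not part of Flink.

```python
def ha_znode(root: str, cluster_id: str, relative: str) -> str:
    """Join the HA root, the cluster id and a relative znode into one absolute path."""
    # Strip surrounding slashes so that "/flink_test" + "/da_15" does not yield "//".
    parts = [p.strip("/") for p in (root, cluster_id, relative)]
    return "/" + "/".join(p for p in parts if p)


# Values taken from the configuration quoted in the thread.
root = "/flink_test"   # high-availability.zookeeper.path.root
cluster_id = "/da_15"  # high-availability.cluster-id

print(ha_znode(root, cluster_id, "/leader/rest_server_lock"))
# prints /flink_test/da_15/leader/rest_server_lock
print(ha_znode(root, cluster_id, "/leader/dispatcher_lock"))
# prints /flink_test/da_15/leader/dispatcher_lock
```

Since "ls" in the ZooKeeper CLI lists only direct children, checking for the lock would mean running "ls /flink_test/da_15/leader" rather than "ls /flink_test/da_15".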

On Thu, Jun 28, 2018, 5:35 PM Gary Yao <g...@data-artisans.com> wrote:

> Hi Vishal,
>
> The znode /flink_test/da_15/leader/rest_server_lock should exist as long
> as your
> Flink 1.5 cluster is running. In 1.4 this znode will not be created. Are
> you
> sure that the znode does not exist? Unfortunately you only attached the
> output
> of "ls /flink_test/da_15".
>
> Can you share the complete JobManager log files from a cluster that is
> (re-)starting?
>
> Best,
> Gary
>
> On Thu, Jun 28, 2018 at 4:10 PM, Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> I am not seeing rest_server_lock. Is it transient (an ephemeral znode) that
>> exists only for the duration of the CLI command?
>>
>>
>> [zk: localhost:2181(CONNECTED) 2] ls /flink_test/da_15
>>
>> [jobgraphs, leader, checkpoints, leaderlatch, checkpoint-counter]
>>
>>
>> The logs say
>>
>> 2018-06-28 14:02:56 INFO  ZooKeeperLeaderRetrievalService:100 - Starting
>> ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
>>
>> 2018-06-28 14:02:56 INFO  ZooKeeperLeaderRetrievalService:100 - Starting
>> ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
>>
>> Is this path relative to the following settings?
>>
>> high-availability.zookeeper.path.root: /flink_test
>>
>> high-availability.cluster-id: /da_15
>>
>>
>> I do not see /leader/rest_server_lock during the CLI run, nor before or
>> after it.
>>
>> I am a little stumped. I do not see the above log lines on 1.4, so I am not
>> sure whether /leader/rest_server_lock comes from the new code.
>>
>>
>> On Thu, Jun 28, 2018 at 3:30 AM, Christophe Jolif <cjo...@gmail.com>
>> wrote:
>>
>>> Chesnay,
>>>
>>> Do you have a rough idea of the 1.5.1 timeline?
>>>
>>> Thanks,
>>> --
>>> Christophe
>>>
>>> On Mon, Jun 25, 2018 at 4:22 PM, Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> The watermark issue is known and will be fixed in 1.5.1.
>>>>
>>>>
>>>> On 25.06.2018 15:03, Vishal Santoshi wrote:
>>>>
>>>> Thank you....
>>>>
>>>> One addition
>>>>
>>>> I do not see watermark (WM) info on the UI (attached).
>>>>
>>>> Is this a known issue? The same pipeline in our production has the watermarks
>>>> (in fact, we never had an issue with watermarks not appearing). Am I missing
>>>> something?
>>>>
>>>> On Mon, Jun 25, 2018 at 4:15 AM, Fabian Hueske <fhue...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Vishal,
>>>>>
>>>>> 1. I don't think a rolling update is possible. Flink 1.5.0 changed how the
>>>>> processes are orchestrated and how they communicate. IMO, the way to go is to
>>>>> start a Flink 1.5.0 cluster, take a savepoint of the running job, start
>>>>> from the savepoint on the new cluster, and shut the old job down.
>>>>> 2. Savepoints should be compatible.
>>>>> 3. You can keep the slot configuration as before.
>>>>> 4. As I said before, mixing 1.5 and 1.4 processes does not work (or at
>>>>> least, it was not a design goal and nobody checked whether it is
>>>>> possible).
>>>>>
>>>>> Best, Fabian
>>>>>
>>>>>
>>>>> 2018-06-23 13:38 GMT+02:00 Vishal Santoshi <vishal.santo...@gmail.com>:
>>>>>
>>>>>>
>>>>>> 1.
>>>>>> Has anyone done a rolling upgrade from 1.4 to 1.5, and is it possible at
>>>>>> all? I am not sure we can. It seems that the JM cannot recover jobs and
>>>>>> fails with this exception:
>>>>>>
>>>>>> Caused by: java.io.InvalidClassException:
>>>>>> org.apache.flink.runtime.jobgraph.tasks.CheckpointCoordinatorConfiguration;
>>>>>> local class incompatible: stream classdesc serialVersionUID =
>>>>>> -647384516034982626, local class serialVersionUID = 2
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2.
>>>>>> Does a savepoint (SP) taken on 1.4 resume on 1.5 (pretty basic, but no harm in asking)?
>>>>>>
>>>>>>
>>>>>>
>>>>>> 3.
>>>>>>
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html#update-configuration-for-reworked-job-deployment
>>>>>> For taskmanager.numberOfTaskSlots: what would be the desired setting
>>>>>> in a standalone (non-Mesos/YARN) cluster?
>>>>>>
>>>>>>
>>>>>> 4. I suspended all jobs and installed 1.5 on the JM (the TMs are still
>>>>>> running 1.4). The JM refuses to start, with:
>>>>>>
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]:
>>>>>> 2018-06-23 11:34:23 ERROR JobManager:116 - Failed to recover job
>>>>>> 454cd84a519f3b50e88bcb378d8a1330.
>>>>>>
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]:
>>>>>> java.lang.InstantiationError: org.apache.flink.runtime.blob.BlobKey
>>>>>>
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at
>>>>>> sun.reflect.GeneratedSerializationConstructorAccessor51.newInstance(Unknown
>>>>>> Source)
>>>>>>
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at
>>>>>> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>>>>
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at
>>>>>> java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:1079)
>>>>>>
>>>>>> Jun
>>>>>> .....
>>>>>>
>>>>>>
>>>>>>
>>>>>> Any feedback would be highly appreciated...
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Christophe
>>>
>>
>>
>
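
The migration path Fabian describes in the thread above (take a savepoint on the old cluster, resume from it on the new 1.5 cluster, then stop the old job) can be sketched as a sequence of Flink CLI invocations. This is only an illustration under stated assumptions: the function name, the installation paths, the job ID, the savepoint locations and the jar name are hypothetical placeholders, and each bin/flink is assumed to be configured to talk to its own cluster. The sketch only builds the command lines; it does not run them.

```python
from typing import List


def migration_commands(old_flink: str, new_flink: str, job_id: str,
                       savepoint_dir: str, savepoint_path: str,
                       job_jar: str) -> List[List[str]]:
    """Build the CLI calls for a savepoint-based move from an old to a new cluster."""
    return [
        # 1. Trigger a savepoint for the running job on the old (1.4) cluster.
        [old_flink, "savepoint", job_id, savepoint_dir],
        # 2. Submit the job on the new (1.5) cluster, resuming from the savepoint.
        #    savepoint_path is the location reported by step 1.
        [new_flink, "run", "-s", savepoint_path, job_jar],
        # 3. Shut the old job down once the new one is running.
        [old_flink, "cancel", job_id],
    ]


# Hypothetical placeholder values, not taken from the thread.
for cmd in migration_commands(
        old_flink="/opt/flink-1.4/bin/flink",
        new_flink="/opt/flink-1.5/bin/flink",
        job_id="<jobId>",
        savepoint_dir="hdfs:///savepoints",
        savepoint_path="hdfs:///savepoints/<savepointName>",
        job_jar="my-job.jar"):
    print(" ".join(cmd))
```

Keeping the commands as data makes it easy to review the order before anything is executed; in practice the savepoint path for step 2 is only known after step 1 has completed.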
