Thanks! I did not see the znode and thus did not paste the ls output. Anyway, I will get you the full JM log ASAP.
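Gary's reply below suggests that the `/leader/rest_server_lock` path in the log is relative. The absolute znode appears to be the HA root, then the cluster-id, then the leader path, which yields `/flink_test/da_15/leader/rest_server_lock` for this config. Here is a minimal Java sketch of that composition. It assumes plain concatenation with leading and trailing slashes normalized. `ZnodePaths` and `resolve` are hypothetical names, not Flink classes.

```java
// Hypothetical helper (not Flink's API): assembles the absolute znode path
// from the HA root, the cluster-id and a leader path, assuming simple
// slash-normalized concatenation.
public final class ZnodePaths {

    private ZnodePaths() {}

    // Guarantees exactly one leading slash and strips a single trailing slash.
    public static String normalize(String segment) {
        String s = segment.trim();
        if (!s.startsWith("/")) {
            s = "/" + s;
        }
        if (s.length() > 1 && s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    // root + cluster-id act as the namespace; the leader path is relative to it.
    public static String resolve(String root, String clusterId, String relativePath) {
        return normalize(root) + normalize(clusterId) + normalize(relativePath);
    }

    public static void main(String[] args) {
        String root = "/flink_test"; // high-availability.zookeeper.path.root
        String clusterId = "/da_15"; // high-availability.cluster-id
        String[] leaderPaths = {"/leader/rest_server_lock", "/leader/dispatcher_lock"};
        for (String leaderPath : leaderPaths) {
            System.out.println(resolve(root, clusterId, leaderPath));
        }
    }
}
```

If this assumption holds, `ls /flink_test/da_15/leader` in zkCli should list `rest_server_lock` while a 1.5 cluster is running.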
On Thu, Jun 28, 2018, 5:35 PM Gary Yao <g...@data-artisans.com> wrote:
> Hi Vishal,
>
> The znode /flink_test/da_15/leader/rest_server_lock should exist as long as
> your Flink 1.5 cluster is running. In 1.4 this znode will not be created. Are
> you sure that the znode does not exist? Unfortunately you only attached the
> output of "ls /flink_test/da_15".
>
> Can you share the complete JobManager log files from a cluster that is
> (re-)starting?
>
> Best,
> Gary
>
> On Thu, Jun 28, 2018 at 4:10 PM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
>> I am not seeing rest_server_lock. Is it transient (an ephemeral znode) for
>> the duration of the CLI command?
>>
>> [zk: localhost:2181(CONNECTED) 2] ls /flink_test/da_15
>> [jobgraphs, leader, checkpoints, leaderlatch, checkpoint-counter]
>>
>> The logs say:
>>
>> 2018-06-28 14:02:56 INFO ZooKeeperLeaderRetrievalService:100 - Starting ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
>> 2018-06-28 14:02:56 INFO ZooKeeperLeaderRetrievalService:100 - Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
>>
>> Is this a relative path, given:
>>
>> high-availability.zookeeper.path.root: /flink_test
>> high-availability.cluster-id: /da_15
>>
>> I do not see /leader/rest_server_lock during the CLI run (or before or
>> after it).
>>
>> I am a little stumped. I do not see the above logs on 1.4, so I am not sure
>> whether /leader/rest_server_lock is part of the new code...
>>
>> On Thu, Jun 28, 2018 at 3:30 AM, Christophe Jolif <cjo...@gmail.com> wrote:
>>
>>> Chesnay,
>>>
>>> Do you have a rough idea of the 1.5.1 timeline?
>>>
>>> Thanks,
>>> --
>>> Christophe
>>>
>>> On Mon, Jun 25, 2018 at 4:22 PM, Chesnay Schepler <ches...@apache.org> wrote:
>>>
>>>> The watermark issue is known and will be fixed in 1.5.1.
>>>>
>>>> On 25.06.2018 15:03, Vishal Santoshi wrote:
>>>>
>>>> Thank you....
>>>>
>>>> One addition:
>>>>
>>>> I do not see WM (watermark) info on the UI (attached).
>>>>
>>>> Is this a known issue? The same pipeline in our production has the WM (in
>>>> fact, we never had an issue with watermarks not appearing). Am I missing
>>>> something?
>>>>
>>>> On Mon, Jun 25, 2018 at 4:15 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>
>>>>> Hi Vishal,
>>>>>
>>>>> 1. I don't think a rolling update is possible. Flink 1.5.0 changed the
>>>>> process orchestration and how the processes communicate. IMO, the way to
>>>>> go is to start a Flink 1.5.0 cluster, take a savepoint on the running job,
>>>>> start from the savepoint on the new cluster and shut the old job down.
>>>>> 2. Savepoints should be compatible.
>>>>> 3. You can keep the slot configuration as before.
>>>>> 4. As I said before, mixing 1.5 and 1.4 processes does not work (or at
>>>>> least, it was not considered a design goal and nobody paid attention to
>>>>> whether it is possible).
>>>>>
>>>>> Best, Fabian
>>>>>
>>>>> 2018-06-23 13:38 GMT+02:00 Vishal Santoshi <vishal.santo...@gmail.com>:
>>>>>
>>>>>> 1.
>>>>>> Can anyone do, or has anyone done, a rolling upgrade from 1.4 to 1.5? I
>>>>>> am not sure we can. It seems that the JM cannot recover jobs, failing
>>>>>> with this exception:
>>>>>>
>>>>>> Caused by: java.io.InvalidClassException: org.apache.flink.runtime.jobgraph.tasks.CheckpointCoordinatorConfiguration; local class incompatible: stream classdesc serialVersionUID = -647384516034982626, local class serialVersionUID = 2
>>>>>>
>>>>>> 2.
>>>>>> Does a savepoint (SP) taken on 1.4 resume on 1.5 (pretty basic, but no
>>>>>> harm asking)?
>>>>>>
>>>>>> 3.
>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html#update-configuration-for-reworked-job-deployment
>>>>>> What would be the desired taskmanager.numberOfTaskSlots setting in a
>>>>>> standalone (non-Mesos/YARN) cluster?
>>>>>>
>>>>>> 4.
>>>>>> I suspended all jobs and brought up 1.5 on the JM (the TMs are still
>>>>>> running 1.4). The JM refuses to start with:
>>>>>>
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: 2018-06-23 11:34:23 ERROR JobManager:116 - Failed to recover job 454cd84a519f3b50e88bcb378d8a1330.
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: java.lang.InstantiationError: org.apache.flink.runtime.blob.BlobKey
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at sun.reflect.GeneratedSerializationConstructorAccessor51.newInstance(Unknown Source)
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>>>> Jun 23 07:34:23 flink-ad21ac07.bf2.tumblr.net docker[3395]: at java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:1079)
>>>>>> Jun
>>>>>> .....
>>>>>>
>>>>>> Any feedback would be highly appreciated...
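The `InvalidClassException` in item 1 matches standard Java serialization behavior. On deserialization, the `serialVersionUID` recorded in the stream is compared with the one on the local class, and the read is rejected if they differ. In the log, the stream carries -647384516034982626 while the local (1.5) class reports 2. Below is a minimal, self-contained sketch that reproduces the same message. `CoordinatorConfig` is a hypothetical stand-in for `CheckpointCoordinatorConfiguration`. The byte patch only simulates a stream written by an older build, and it relies on the standard stream layout for the first object written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class SerialVersionMismatch {

    // UID taken from the "stream classdesc" part of the log above.
    static final long STALE_UID = -647384516034982626L;

    // Hypothetical stand-in for a class whose serialVersionUID changed between releases.
    static final class CoordinatorConfig implements Serializable {
        private static final long serialVersionUID = 2L; // the "local" UID
        final long checkpointInterval;

        CoordinatorConfig(long checkpointInterval) {
            this.checkpointInterval = checkpointInterval;
        }
    }

    // Serializes obj, then overwrites the class descriptor's UID so the bytes
    // look as if a build with a different UID had written them.
    static byte[] serializeWithForeignUid(Serializable obj, long foreignUid) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        byte[] data = bytes.toByteArray();
        // Layout: 4-byte header, TC_OBJECT, TC_CLASSDESC, class name written with
        // writeUTF (2-byte length + bytes, ASCII here), then the 8-byte UID.
        int nameLength = obj.getClass().getName().getBytes(StandardCharsets.UTF_8).length;
        int uidOffset = 4 + 1 + 1 + 2 + nameLength;
        ByteBuffer.wrap(data, uidOffset, 8).putLong(foreignUid); // big-endian, like writeLong
        return data;
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Returns the rejection message, or null if the stream was accepted.
    public static String mismatchMessage() {
        try {
            deserialize(serializeWithForeignUid(new CoordinatorConfig(60_000L), STALE_UID));
            return null; // the UIDs matched, so nothing was rejected
        } catch (InvalidClassException e) {
            return e.getMessage();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mismatchMessage());
    }
}
```

The printed message has the same shape as the JM log line. This is presumably why a 1.5 JobManager cannot recover job graphs that a 1.4 cluster stored in ZooKeeper. It is also consistent with Fabian's advice to migrate through a savepoint onto a fresh 1.5 cluster.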