Hello Robert, I observed progress for 2 hours (meaning the numbers on the dashboard were changing), and then I waited for 2 more hours. I'm sure it had to spill at some point, but I figured 2h is enough time.
Thanks,
Timur

On Apr 26, 2016 1:35 AM, "Robert Metzger" <rmetz...@apache.org> wrote:

> Hi Timur,
>
> thank you for sharing the source code of your job. That is helpful!
> It's a large pipeline with 7 joins and 2 co-groups. Maybe your job is much
> more IO-heavy with the larger input data because all the joins start
> spilling?
> Our monitoring, in particular for batch jobs, is really not very advanced.
> If we had some monitoring showing the spill status, we would maybe see
> that the job is still running.
>
> How long did you wait until you declared the job hanging?
>
> Regards,
> Robert
>
>
> On Tue, Apr 26, 2016 at 10:11 AM, Ufuk Celebi <u...@apache.org> wrote:
>
>> No.
>>
>> If you run on YARN, the YARN logs are the relevant ones for the
>> JobManager and TaskManager. The client log submitting the job should
>> be found in /log.
>>
>> – Ufuk
>>
>> On Tue, Apr 26, 2016 at 10:06 AM, Timur Fayruzov
>> <timur.fairu...@gmail.com> wrote:
>>
>> > I will do it tomorrow. Logs don't show anything unusual. Are there any
>> > logs besides what's in flink/log and the YARN container logs?
>> >
>> > On Apr 26, 2016 1:03 AM, "Ufuk Celebi" <u...@apache.org> wrote:
>> >
>> > Hey Timur,
>> >
>> > is it possible to connect to the VMs and get stack traces of the Flink
>> > processes as well?
>> >
>> > We can first have a look at the logs, but the stack traces will be
>> > helpful if we can't figure out what the issue is.
>> >
>> > – Ufuk
>> >
>> > On Tue, Apr 26, 2016 at 9:42 AM, Till Rohrmann <trohrm...@apache.org>
>> > wrote:
>> >> Could you share the logs with us, Timur? That would be very helpful.
>> >>
>> >> Cheers,
>> >> Till
>> >>
>> >> On Apr 26, 2016 3:24 AM, "Timur Fayruzov" <timur.fairu...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> Now I'm at the stage where my job seems to completely hang. Source
>> >>> code is attached (it won't compile, but I think it gives a very good
>> >>> idea of what happens). Unfortunately I can't provide the datasets.
>> >>> Most of them are about 100-500MM records; I try to process them on
>> >>> an EMR cluster with 40 tasks, 6GB memory for each.
>> >>>
>> >>> It was working for smaller input sizes. Any idea on what I can do
>> >>> differently is appreciated.
>> >>>
>> >>> Thanks,
>> >>> Timur
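For anyone collecting the diagnostics requested in this thread (YARN logs plus stack traces from the VMs), here is a minimal command sketch; the application id, output paths, and PID are placeholders, and the exact process names depend on the Flink/EMR setup:

```sh
# Aggregate the YARN container logs (JobManager + TaskManagers) for the
# Flink session. Find the application id first with `yarn application -list`.
yarn logs -applicationId <application-id> > flink-yarn.log

# On each worker VM, find the TaskManager JVM and take a thread dump.
jps -l
jstack <taskmanager-pid> > taskmanager-threads.txt
```

A thread dump may help distinguish "slow but progressing" from a genuine deadlock: a job that is merely spilling will usually show worker threads busy in sorting/spilling code, while a deadlock shows threads blocked waiting on each other.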
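On the spilling theory: if the larger inputs push the joins and co-groups out of managed memory, the memory and temp-directory settings in flink-conf.yaml are the usual knobs to inspect. A hedged sketch using Flink 1.0-era configuration keys; the values are illustrative for the 6GB task managers mentioned above, not a recommendation:

```yaml
# flink-conf.yaml (Flink 1.0.x era keys; values are illustrative only)
taskmanager.heap.mb: 6144             # matches the 6GB per task manager above
taskmanager.memory.fraction: 0.7      # share of free heap managed for sort/hash/join buffers
taskmanager.network.numberOfBuffers: 4096  # large shuffles may need more than the default
taskmanager.tmp.dirs: /mnt/flink-tmp  # where spill files land; needs enough disk space
```

With 7 joins and 2 co-groups sharing the managed memory of each slot, each operator gets a small slice, so spilling on the larger inputs would be expected rather than surprising.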