Could you please also provide the execution plan via env.getExecutionPlan()?
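For reference, a minimal sketch of how the plan can be dumped (this is illustrative code, not Timur's actual job; the tiny datasets and the `PlanDump` name are made up for the example):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.java.io.DiscardingOutputFormat

object PlanDump {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Hypothetical stand-in for the real pipeline of joins/co-groups.
    val left  = env.fromElements((1, "a"), (2, "b"))
    val right = env.fromElements((1, "x"), (2, "y"))
    val joined = left.join(right).where(0).equalTo(0)

    // A sink must be defined before the plan can be extracted.
    joined.output(new DiscardingOutputFormat[((Int, String), (Int, String))])

    // getExecutionPlan() returns the optimized plan as a JSON string,
    // which can be pasted into the Flink plan visualizer.
    println(env.getExecutionPlan())
  }
}
```

Note that in some Flink versions extracting the plan consumes the defined sinks, so it is safest to call getExecutionPlan() in a separate run rather than right before env.execute().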
On Tue, Apr 26, 2016 at 4:23 PM, Timur Fayruzov <timur.fairu...@gmail.com> wrote:
> Hello Robert,
>
> I observed progress for 2 hours (meaning the numbers changed on the
> dashboard), and then I waited for 2 more hours. I'm sure it had to spill at
> some point, but I figured 2h is enough time.
>
> Thanks,
> Timur
>
> On Apr 26, 2016 1:35 AM, "Robert Metzger" <rmetz...@apache.org> wrote:
>>
>> Hi Timur,
>>
>> thank you for sharing the source code of your job. That is helpful!
>> It's a large pipeline with 7 joins and 2 co-groups. Maybe your job is much
>> more IO-heavy with the larger input data because all the joins start
>> spilling?
>> Our monitoring, in particular for batch jobs, is really not very advanced.
>> If we had some monitoring showing the spill status, we would maybe see
>> that the job is still running.
>>
>> How long did you wait until you declared the job hanging?
>>
>> Regards,
>> Robert
>>
>>
>> On Tue, Apr 26, 2016 at 10:11 AM, Ufuk Celebi <u...@apache.org> wrote:
>>>
>>> No.
>>>
>>> If you run on YARN, the YARN logs are the relevant ones for the
>>> JobManager and TaskManager. The client log submitting the job should
>>> be found in /log.
>>>
>>> – Ufuk
>>>
>>> On Tue, Apr 26, 2016 at 10:06 AM, Timur Fayruzov
>>> <timur.fairu...@gmail.com> wrote:
>>> > I will do it by tomorrow. The logs don't show anything unusual. Are
>>> > there any logs besides what's in flink/log and the YARN container logs?
>>> >
>>> > On Apr 26, 2016 1:03 AM, "Ufuk Celebi" <u...@apache.org> wrote:
>>> >
>>> > Hey Timur,
>>> >
>>> > is it possible to connect to the VMs and get stack traces of the Flink
>>> > processes as well?
>>> >
>>> > We can first have a look at the logs, but the stack traces will be
>>> > helpful if we can't figure out what the issue is.
>>> >
>>> > – Ufuk
>>> >
>>> > On Tue, Apr 26, 2016 at 9:42 AM, Till Rohrmann <trohrm...@apache.org>
>>> > wrote:
>>> >> Could you share the logs with us, Timur? That would be very helpful.
>>> >>
>>> >> Cheers,
>>> >> Till
>>> >>
>>> >> On Apr 26, 2016 3:24 AM, "Timur Fayruzov" <timur.fairu...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> Now I'm at the stage where my job seems to completely hang. The
>>> >>> source code is attached (it won't compile, but I think it gives a
>>> >>> very good idea of what happens). Unfortunately I can't provide the
>>> >>> datasets. Most of them are about 100-500MM records; I try to process
>>> >>> them on an EMR cluster with 40 tasks and 6GB of memory for each.
>>> >>>
>>> >>> It was working for smaller input sizes. Any idea on what I can do
>>> >>> differently is appreciated.
>>> >>>
>>> >>> Thanks,
>>> >>> Timur