Thanks, Till. The `taskmanager.network.request-backoff.max` option helped in my
case. We tried this on 1.5.0 and the jobs are running fine.
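For anyone finding this thread later, the option goes into `flink-conf.yaml` on the TaskManagers. The snippet below is only a sketch: the value is an illustrative assumption, not the setting used on this cluster, and a suitable value depends on your workload.

```yaml
# flink-conf.yaml (sketch; the value is an assumed example, in milliseconds)
# Upper bound on the backoff between partition request retries of input channels.
taskmanager.network.request-backoff.max: 20000
```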
--
Thanks
Amit
On Thu 24 May, 2018, 4:58 PM Amit Jain, wrote:
Thanks, Till! I'll give your suggestions a try and update the thread.
On Wed, May 23, 2018 at 4:43 AM, Till Rohrmann wrote:
> Hi Amit,
>
> it looks as if the current cancellation cause is not the same as the
> initially reported cancellation cause. In the current case,
Hi Amit,
thanks for providing the logs, I'll look into it. We currently have a
suspicion of this being caused by
https://issues.apache.org/jira/browse/FLINK-9406 which we found by
looking over the surrounding code. The RC4 has been cancelled since we
see this as a release blocker.
To rule out
Also, please have a look at the other TaskManagers' logs, in particular
the one that is running the operator that was mentioned in the
exception. You should look out for the ID 98f5976716234236dc69fb0e82a0cc34.
Nico
PS: Flink log files should compress quite nicely if they grow too big :)
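
One way to follow the suggestion above across several TaskManagers is to scan a directory of collected log files for the ID, reading gzip-compressed files directly. This is a minimal sketch: the function names and the assumption that all logs sit in one directory are hypothetical, and it does plain substring matching only.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple

# ID quoted in the message above.
TARGET_ID = "98f5976716234236dc69fb0e82a0cc34"


def open_log(path: Path):
    """Open a plain or gzip-compressed log file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return path.open("r", encoding="utf-8", errors="replace")


def find_id(log_dir: str, needle: str = TARGET_ID) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line containing needle."""
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        with open_log(path) as handle:
            for line_no, line in enumerate(handle, start=1):
                if needle in line:
                    yield path.name, line_no, line.rstrip("\n")
```

Calling `list(find_id("/path/to/collected/logs"))` (the path is a placeholder) returns the matching lines per file, which shows which TaskManager logged the ID and when.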
On
Google Drive would be great.
Thanks!
On Thu, May 3, 2018 at 1:33 PM, Amit Jain wrote:
Hi Stephan,
The JM log file is 122 MB. Could you suggest another way to share it?
We can use Google Drive if that's fine with you.
--
Thanks,
Amit
On Thu, May 3, 2018 at 12:58 PM, Stephan Ewen wrote:
Hi Amit!
Thanks for sharing this, this looks like a regression with the network
stack changes.
The log you shared from the TaskManager gives some hint, but that exception
alone should not be a problem. That exception can occur under a race
between deployment of some tasks while the whole job is
Thanks, Fabian!
I will try using the current release-1.5 branch and update this thread.
--
Thanks,
Amit
On Wed, May 2, 2018 at 3:42 PM, Fabian Hueske wrote:
Hi Amit,
We recently fixed a bug in the network stack that affected batch jobs
(FLINK-9144).
The fix was added after your commit.
Do you have a chance to build the current release-1.5 branch and check if
the fix also resolves your problem?
Otherwise it would be great if you could open a blocker
The cluster is running on commit 2af481a.
On Sun, Apr 29, 2018 at 9:59 PM, Amit Jain wrote:
Hi,
We are running a number of batch jobs on a Flink 1.5 cluster, and a few of
them get stuck at random. These jobs hit the following failure, after which
the operator status changes to CANCELED and stays there.
Please find complete TM's log at