Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-25 Thread Zhu Zhu
Yes. 1.8.2 contains all commits in 1.8.1.

RE: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-25 Thread Subramanyam Ramanathan
Hi Zhu,

Thanks a lot!
Since 1.8.2 is also available, would it be right to assume that 1.8.2 also 
contains the fix?

Thanks,
Subbu


Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-24 Thread Dian Fu
Hi Subramanyam,

1.7.3 has not been released yet. You would need to cherry-pick these fixes 
yourself if you really need them.

Regards,
Dian

Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-24 Thread Zhu Zhu
Hi Subramanyam,

I checked the commits.
There are 2 fixes in FLINK-10455; only releases 1.8.1 and 1.9.0 contain
both of them.

Thanks,
Zhu Zhu

RE: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-24 Thread Subramanyam Ramanathan
Hi Zhu,

We also use FlinkKafkaProducer(011), so I felt this fix would be needed for us 
as well.

I agree that this fix would not resolve the issue I originally reported, but I 
felt that we should consume it anyway.

Thanks,
Subbu

Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-24 Thread Zhu Zhu
Hi Subramanyam,

I think you do not need the fix in FLINK-10455, which is for Kafka only.
It's just an issue similar to the one you met.
As you said, we need to make sure that threads spawned by the operator/UDF are
stopped in the close() method. This way, we can prevent those threads from
throwing NoClassDefFoundError after the class loader gets closed.

Thanks,
Zhu Zhu
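A minimal, JDK-only sketch of the pattern described above, in which every thread a function starts in open() is stopped again in close(). `ThreadOwningMapper` is a hypothetical stand-in for a Flink `RichMapFunction`, written without Flink types so it runs on a plain JVM; it assumes the background work can be handed to an `ExecutorService` that the function owns. When a third-party library starts the thread internally, the equivalent step would be calling whatever shutdown or close method that library exposes from close().

```java
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a Flink RichMapFunction<String, String>: in a real job the
// same logic would live in the overridden open(Configuration) and close() methods.
final class ThreadOwningMapper {
    private ScheduledExecutorService background; // the only thread this function starts
    private final AtomicLong ticks = new AtomicLong();

    void open() {
        background = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread t = new Thread(runnable, "mapper-background");
            t.setDaemon(true); // must never keep the JVM alive on its own
            return t;
        });
        // Placeholder periodic work, e.g. refreshing a cache from an external system.
        background.scheduleAtFixedRate(ticks::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
    }

    String map(String value) {
        return value.toUpperCase(Locale.ROOT);
    }

    void close() {
        if (background == null) {
            return; // open() never ran, so there is nothing to stop
        }
        background.shutdownNow(); // cancel the periodic task and interrupt the worker
        try {
            if (!background.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("background thread did not stop within 5s");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    boolean isBackgroundTerminated() {
        return background != null && background.isTerminated();
    }
}
```

Marking the worker as a daemon thread only keeps it from blocking JVM shutdown; it does not stop the thread when the task fails and restarts, which is why the explicit shutdown in close() is the part that matters here.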


RE: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-24 Thread Subramanyam Ramanathan
Hi,

Thank you.
I think the takeaway for us is that we need to make sure that the threads are 
stopped in the close() method.

With regard to FLINK-10455, I see that the fix versions are: 1.5.6, 1.7.0, 
1.7.3, 1.8.1, 1.9.0.

However, I'm unable to find 1.7.3 on the downloads page 
(https://flink.apache.org/downloads.html). Is it yet to be released, or am I 
perhaps not looking in the right place?
We're currently using 1.7.2. Could you please let me know the minimal upgrade 
needed to get the fix for FLINK-10455?

Thanks,
Subbu

On September 23, 2019, at 2:50 PM, Subramanyam Ramanathan 
<subramanyam.ramanat...@microfocus.com> wrote:

Hi,

I was able to simulate the issue again and understand the cause a little better.

The issue occurs when:
- One of the RichMapFunction transformations uses a third-party library in its 
open() method that spawns a thread.
- The thread doesn't get properly stopped in the close() method.
- Once the job starts failing, we start seeing a NoClassDefFoundError from that 
thread.

I understand that cleanup should be done in the close() method. However, I just 
wanted to know: is there some kind of configuration setting that would help us 
clean up such threads?
I can attach the code if required.

Thanks,
Subbu

From: Zhu Zhu [mailto:reed...@gmail.com]
Sent: Friday, August 9, 2019 7:43 AM
To: Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
Cc: user@flink.apache.org
Subject: Re: NoClassDefFoundError in failing-restarting job that uses url 
classloader

Hi Subramanyam,

Could you share more information, including:
1. the URL pattern
2. the detailed exception and the log around it
3. the cluster the job is running on, e.g. standalone, YARN, k8s
4. whether it's session mode or per-job mode

This information would be helpful to identify the cause of the failure.

Thanks,
Zhu Zhu

On Friday, August 9, 2019, at 1:45 AM, Subramanyam Ramanathan 
<subramanyam.ramanat...@microfocus.com> wrote:

Hello,

I'm currently using Flink 1.7.2.

I'm trying to run a job that's submitted programmatically using the 
ClusterClient API:
   public JobSubmissionResult run(PackagedProgram prog, int parallelism)

The job makes use of some jars, which I add to the packaged program through the 
PackagedProgram constructor, along with the job's jar file:
   public PackagedProgram(File jarFile, List<URL> classpaths, String... args)
Normally, this works perfectly and the job runs fine.
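The classpaths argument of the PackagedProgram constructor quoted above takes URLs rather than file paths. A small JDK-only sketch of turning local jar paths into that List<URL> follows; `ClasspathUrls` is a hypothetical helper, not part of Flink, and it only converts the paths without checking that the files exist.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: converts local jar paths into the
// List<URL> shape expected by the PackagedProgram classpaths argument.
final class ClasspathUrls {
    private ClasspathUrls() {}

    static List<URL> fromPaths(List<Path> jars) {
        List<URL> urls = new ArrayList<>();
        for (Path jar : jars) {
            try {
                // toAbsolutePath() avoids relative URLs; toUri() yields a file: URI.
                urls.add(jar.toAbsolutePath().toUri().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Cannot convert to URL: " + jar, e);
            }
        }
        return urls;
    }
}
```

Flink's CLI documentation for the equivalent `-C/--classpath` option notes that such URLs must be accessible on all nodes, so in a multi-node cluster a local file: URL presumably only works if the same path exists on every TaskManager.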

However, if there's an error in the job, and the job goes into failing state 
and when it's continously  trying to restart the job for an hour or so, I 
notice a NoClassDefFoundError for some classes in the jars that I load using 
the URL class loader and the job never recovers after that, even if the root 
cause of the issue was fixed (I had a kafka source/sink in my job, and kafka 
was down temporarily, and was brought up after that).
The jar is still available at the path referenced by the url classloader and is 
not tampered with.

Could anyone please give me some pointers as to why this could happen, what I 
could be missing here, or how I can debug further?
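One way to debug this, sketched here as a hypothetical helper rather than a Flink feature: list the live threads whose context class loader is the user-code class loader. A new Java thread inherits the context class loader of the thread that creates it, so a thread still listed after the task's close() has run is a likely leak candidate. The names `LeakedThreadFinder`, `threadsUsing` and `startSleeper` are illustrative; a thread whose context class loader was explicitly changed will not show up, so an empty result is inconclusive.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class LeakedThreadFinder {

    // Names of live threads whose context class loader is exactly `loader`.
    static List<String> threadsUsing(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getContextClassLoader() == loader) {
                names.add(t.getName());
            }
        }
        return names;
    }

    // Hypothetical stand-in for a library thread that nobody stops.
    static Thread startSleeper(String name, ClassLoader contextLoader) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted: let the thread finish
            }
        }, name);
        t.setContextClassLoader(contextLoader);
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        // Empty URL list: only the loader's identity matters for this check.
        URLClassLoader userLoader = new URLClassLoader(new URL[0]);
        Thread leaked = startSleeper("leaked-worker", userLoader);

        System.out.println("threads using user loader: " + threadsUsing(userLoader));

        leaked.interrupt();
        leaked.join();
        userLoader.close();
    }
}
```

On a running cluster, a thread dump of the TaskManager (for example with jstack) taken while the job is restarting can reveal the same leftover threads by name.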

thanks
Subbu



Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-23 Thread Dian Fu
Hi Subbu,

The issue you encountered is very similar to the issue that was fixed in 
FLINK-10455 [1]. Could you check whether that fix solves your problem? The root 
cause of that issue was that close() did not shut down everything it had 
started. After close() is called, the user-code class loader (a URLClassLoader) 
is closed. If a thread is still running after close() has been called, it 
may try to access classes in the user-provided jars. Because the URLClassLoader 
has already been closed, a NoClassDefFoundError is thrown.
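The mechanism can be reproduced in isolation with plain Java. This is a minimal sketch, not Flink code: a temporary directory stands in for the user jar, and `closed-loader-probe.txt` is a hypothetical file name. Once URLClassLoader.close() has been called, the loader no longer finds anything under its URLs. Calling loadClass() directly would then fail with ClassNotFoundException; the NoClassDefFoundError seen in the job arises when already-loaded code on a leftover thread needs a class that has not been loaded yet.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClosedLoaderDemo {

    // Returns {visibleBeforeClose, visibleAfterClose} for a file served by a URLClassLoader.
    static boolean[] probe() throws IOException {
        // Temp directory standing in for a user-provided jar.
        Path dir = Files.createTempDirectory("user-code");
        Path file = dir.resolve("closed-loader-probe.txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));

        URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()});
        boolean before = loader.getResource("closed-loader-probe.txt") != null;

        loader.close(); // corresponds to the user-code class loader being closed
        boolean after = loader.getResource("closed-loader-probe.txt") != null;

        Files.delete(file);
        Files.delete(dir);
        return new boolean[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = probe();
        System.out.println("visible before close: " + r[0]);
        System.out.println("visible after close:  " + r[1]);
    }
}
```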

Regards,
Dian

[1] https://issues.apache.org/jira/browse/FLINK-10455
> On Sep 23, 2019, at 2:50 PM, Subramanyam Ramanathan wrote:
> 
> Hi,
>  
> I was able to reproduce the issue again and understand the cause a little 
> better.
>  
> The issue occurs when:
> - One of the RichMapFunction transformations uses a third-party 
> library in the open() method that spawns a thread.
> - The thread doesn’t get properly closed in the close() method.
> - Once the job starts failing, we start seeing a NoClassDefFoundError 
> from that thread.
>  
> I understand that cleanup should be done in the close() method. However, I 
> wanted to know: is there some kind of configuration setting that would 
> help us clean up such threads?
> I can attach the code if required.
>  
> Thanks,
> Subbu



RE: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-23 Thread Subramanyam Ramanathan
Hi,

I was able to reproduce the issue again and understand the cause a little better.

The issue occurs when:

- One of the RichMapFunction transformations uses a third-party library 
in the open() method that spawns a thread.

- The thread doesn’t get properly closed in the close() method.

- Once the job starts failing, we start seeing a NoClassDefFoundError 
from that thread.
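
A minimal sketch of the fix for the pattern listed above, written without the Flink dependency so it runs on its own: `ThreadOwningFunction` is a hypothetical stand-in whose open() and close() mirror those of a RichMapFunction, and the executor plays the role of the thread the third-party library starts. The point is that close() stops the worker and waits for it to finish before returning, so nothing keeps running after the class loader is closed. With a real library, close() would instead call whatever shutdown method that library provides (library-specific, not shown here).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a RichMapFunction<String, String>; open()/close()
// mirror the Flink lifecycle methods but take no Configuration argument.
public class ThreadOwningFunction {

    private ExecutorService worker;

    // Like open(): start the background thread the "library" needs.
    public void open() {
        worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "third-party-worker");
            t.setDaemon(true);
            return t;
        });
        worker.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(10); // stand-in for periodic background work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore flag so the loop exits
                }
            }
        });
    }

    public String map(String value) {
        return value.toUpperCase();
    }

    // Like close(): stop the thread and wait for it before returning.
    public void close() throws InterruptedException {
        if (worker == null) {
            return;
        }
        worker.shutdownNow(); // interrupts the running task
        if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
            System.err.println("third-party-worker did not stop within 5s");
        }
    }

    public boolean isStopped() {
        return worker != null && worker.isTerminated();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadOwningFunction f = new ThreadOwningFunction();
        f.open();
        System.out.println(f.map("kafka"));
        f.close();
        System.out.println("stopped: " + f.isStopped());
    }
}
```

Waiting in close() (rather than only signalling the thread) matters here, because the class loader may be closed as soon as close() returns.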

I understand that cleanup should be done in the close() method. However, I 
wanted to know: is there some kind of configuration setting that would help 
us clean up such threads?
I can attach the code if required.

Thanks,
Subbu




Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-08-10 Thread Zhu Zhu
Hi Subramanyam,

I think the standalone per-job mode does not invoke *PackagedProgram(File
jarFile, List<URL> classpaths, String... args)* to generate the
*PackagedProgram*, and thus does not add the extra classpaths to the job.

Regarding the NoClassDefFoundError, another possibility is that the class
file exists but its static initialization fails. This also leaves the
class unusable and leads to a NoClassDefFoundError.
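
The static-initialization case can be shown in a few lines of plain Java. `FragileHolder` is a hypothetical class whose static initializer throws. On the first access the JVM reports ExceptionInInitializerError; on every later access within the same class loader it reports NoClassDefFoundError ("Could not initialize class ..."), even though the class file is present. When going through the logs, it is therefore worth searching for an earlier ExceptionInInitializerError.

```java
public class StaticInitDemo {

    // Hypothetical class whose static initializer fails, e.g. because an
    // external resource it depends on is unavailable at first use.
    static class FragileHolder {
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("resource unavailable");
        }
    }

    // Touches FragileHolder and reports which error the JVM raised.
    static String touch() {
        try {
            return "value=" + FragileHolder.VALUE;
        } catch (ExceptionInInitializerError e) {
            return "ExceptionInInitializerError";
        } catch (NoClassDefFoundError e) {
            return "NoClassDefFoundError";
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: initializer runs and fails
        System.out.println(touch()); // later uses: class is marked unusable
    }
}
```

Running main prints ExceptionInInitializerError followed by NoClassDefFoundError.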

Thanks,
Zhu Zhu


On Sat, Aug 10, 2019 at 2:38 PM, Subramanyam Ramanathan wrote:

> Hi,
>
> 1) URL pattern example:
> file:///root/flink-test/lib/dependency.jar
>
> 2) I’m trying to reproduce the same issue on a separate Flink
> installation with a sample job so that I can share the logs. (So far I’ve
> been unable to reproduce it, though on our product setup it happens quite
> frequently.)
>
> 3) The job is running in standalone mode. We have separate k8s pods
> with our own images, which incorporate the taskmanager and jobmanager for
> our product. A third pod connects using k8s and submits the job.
>
> 4) Per-job mode
>
> I’m trying to reproduce the issue on a separate Flink installation outside
> of our product environment. I’ll update as soon as I have results.
>
> Thanks,
> Subbu


RE: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-08-10 Thread Subramanyam Ramanathan
Hi,

1) URL pattern example: 
file:///root/flink-test/lib/dependency.jar

2) I’m trying to reproduce the same issue on a separate Flink installation 
with a sample job so that I can share the logs. (So far I’ve been unable to 
reproduce it, though on our product setup it happens quite frequently.)

3) The job is running in standalone mode. We have separate k8s pods with 
our own images, which incorporate the taskmanager and jobmanager for our 
product. A third pod connects using k8s and submits the job.

4) Per-job mode

I’m trying to reproduce the issue on a separate Flink installation outside of 
our product environment. I’ll update as soon as I have results.

Thanks,
Subbu




Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-08-08 Thread Zhu Zhu
Hi Subramanyam,

Could you share more information, including:
1. the URL pattern
2. the detailed exception and the log around it
3. the cluster the job is running on, e.g. standalone, YARN, k8s
4. whether it runs in session mode or per-job mode

This information would be helpful to identify the failure cause.

Thanks,
Zhu Zhu

On Fri, Aug 9, 2019 at 1:45 AM, Subramanyam Ramanathan wrote:

> Hello,
>
> I'm currently using Flink 1.7.2.
>
> I'm trying to run a job that's submitted programmatically using the
> ClusterClient API:
>
>    public JobSubmissionResult run(PackagedProgram prog, int parallelism)
>
> The job makes use of some jars, which I add to the packaged program through
> the PackagedProgram constructor, along with the jar file:
>
>    public PackagedProgram(File jarFile, List<URL> classpaths, String... args)
>
> Normally, this works perfectly and the job runs fine.
>
> However, if there's an error in the job, the job goes into the failing
> state and keeps trying to restart. After an hour or so of restarts, I notice
> a NoClassDefFoundError for some classes in the jars that I load using the
> URL class loader, and the job never recovers after that, even once the root
> cause is fixed (I had a Kafka source/sink in my job, and Kafka was down
> temporarily and was brought back up afterwards).
>
> The jar is still available at the path referenced by the URL class loader
> and has not been tampered with.
>
> Could anyone please give me some pointers as to why this could happen, what
> I could be missing here, or how I can debug further?
>
> thanks
>
> Subbu