Hi Subramanyam,

1.7.3 has not been released yet. You will need to cherry-pick these fixes 
yourself if you really need them.

Regards,
Dian

> On Sep 25, 2019, at 12:08 AM, Zhu Zhu <reed...@gmail.com> wrote:
> 
> Hi Subramanyam, 
> 
> I checked the commits.
> There are 2 fixes in FLINK-10455; only releases 1.8.1 and 1.9.0 contain both 
> of them.
> 
> Thanks,
> Zhu Zhu
> 
> Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com 
> <mailto:subramanyam.ramanat...@microfocus.com>> wrote on Tue, Sep 24, 2019 at 11:02 PM:
> Hi Zhu,
> 
>  
> 
> We also use FlinkKafkaProducer(011), hence I felt this fix would also be 
> needed for us.
> 
>  
> 
> I agree that the issue I originally mentioned would not be fixed by this, 
> but I felt that we should pick up this fix as well.
> 
>  
> 
> Thanks,
> 
> Subbu
> 
>  
> 
> From: Zhu Zhu [mailto:reed...@gmail.com <mailto:reed...@gmail.com>] 
> Sent: Tuesday, September 24, 2019 6:13 PM
> To: Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com 
> <mailto:subramanyam.ramanat...@microfocus.com>>
> Cc: Dian Fu <dian0511...@gmail.com <mailto:dian0511...@gmail.com>>; 
> user@flink.apache.org <mailto:user@flink.apache.org>
> Subject: Re: NoClassDefFoundError in failing-restarting job that uses url 
> classloader
> 
>  
> 
> Hi Subramanyam,
> 
>  
> 
> I don't think you need the fix in FLINK-10455, which is for Kafka only. It is 
> just a similar issue to the one you hit.
> 
> As you said, we need to make sure that any threads spawned by an operator/UDF 
> are stopped in its close() method. That prevents those threads from throwing 
> NoClassDefFoundError after the class loader has been closed.
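
The advice above can be sketched roughly as follows. This is a minimal plain-Java stand-in, not code from the thread: the class name ThreadOwningFunction, the 50 ms placeholder work and the 5 second timeout are all assumptions made for illustration. In a real job the open() and close() methods would be the overrides on the rich function (for example a RichMapFunction), and a thread started inside a third-party library would have to be stopped through whatever shutdown hook that library offers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Plain-Java stand-in for the open()/close() lifecycle of a rich function.
 * Hypothetical class: it has no Flink dependency so it can run on its own.
 */
public class ThreadOwningFunction {

    // Background worker owned by this function instance (illustrative).
    private ExecutorService worker;

    /** Mirrors open(): starts the background thread. */
    public void open() {
        worker = Executors.newSingleThreadExecutor();
        worker.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    TimeUnit.MILLISECONDS.sleep(50); // placeholder for real work
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so the loop condition ends the task.
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    /** Mirrors close(): interrupts the worker and waits until it has really exited. */
    public void close() throws InterruptedException {
        if (worker == null) {
            return;
        }
        worker.shutdownNow(); // interrupts the running task
        if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("background thread still running after close()");
        }
    }

    /** True once the worker thread has terminated. */
    public boolean isStopped() {
        return worker != null && worker.isTerminated();
    }

    public static void main(String[] args) throws Exception {
        ThreadOwningFunction function = new ThreadOwningFunction();
        function.open();
        function.close();
        System.out.println("stopped = " + function.isStopped());
    }
}
```

Waiting in close() rather than only requesting shutdown is the point of the sketch: the method should not return while a thread that can still touch user classes is alive.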
> 
>  
> 
> Thanks,
> 
> Zhu Zhu
> 
>  
> 
>  
> 
> Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com 
> <mailto:subramanyam.ramanat...@microfocus.com>> wrote on Tue, Sep 24, 2019 at 8:07 PM:
> 
> Hi,
> 
>  
> 
> Thank you.
> 
> I think the takeaway for us is that we need to make sure that the threads are 
> stopped in the close() method.
> 
>  
> 
> With regard to FLINK-10455, I see that the fix versions listed are: 1.5.6, 
> 1.7.0, 1.7.3, 1.8.1, 1.9.0
> 
>  
> 
> However, I'm unable to find 1.7.3 on the downloads page 
> (https://flink.apache.org/downloads.html). Is it yet to be released, or am I 
> perhaps not looking in the right place?
> 
> We're currently using 1.7.2. Could you please let me know the minimal upgrade 
> I need in order to pick up the fix for FLINK-10455?
> 
>  
> 
> Thanks,
> 
> Subbu
> 
>  
> 
> From: Dian Fu [mailto:dian0511...@gmail.com <mailto:dian0511...@gmail.com>] 
> Sent: Monday, September 23, 2019 1:54 PM
> To: Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com 
> <mailto:subramanyam.ramanat...@microfocus.com>>
> Cc: Zhu Zhu <reed...@gmail.com <mailto:reed...@gmail.com>>; 
> user@flink.apache.org <mailto:user@flink.apache.org>
> Subject: Re: NoClassDefFoundError in failing-restarting job that uses url 
> classloader
> 
>  
> 
> Hi Subbu,
> 
>  
> 
> The issue you encountered is very similar to the one fixed in 
> FLINK-10455 [1]. Could you check whether that fix solves your problem? The 
> root cause of that issue is that close() did not shut down everything it 
> should have. After close() is called, the class loader (URLClassLoader) is 
> closed. If a thread is still running after close() has been called, it may 
> try to access classes in the user-provided jars. However, since the 
> URLClassLoader has already been closed, NoClassDefFoundError is thrown.
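
The sequence described above can be imitated in plain Java. The sketch below is only an analogy under stated assumptions: FakeLoader is a hypothetical stand-in for the user-code class loader, it throws IllegalStateException where the real case would surface as NoClassDefFoundError, and the class name com.example.UserClass is made up. It shows a thread that nobody stopped failing on its first lookup after the loader has been closed.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class UseAfterCloseDemo {

    /** Hypothetical stand-in for the user-code class loader: usable until close(). */
    public static final class FakeLoader implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        public String lookup(String name) {
            if (closed.get()) {
                // With a real closed URLClassLoader this shows up as NoClassDefFoundError.
                throw new IllegalStateException("loader closed, cannot resolve " + name);
            }
            return name;
        }

        @Override
        public void close() {
            closed.set(true);
        }
    }

    /** Starts a thread that is never stopped, closes the loader, returns the thread's failure. */
    public static Throwable runLeakedThread() throws InterruptedException {
        FakeLoader loader = new FakeLoader();
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread leaked = new Thread(() -> {
            try {
                while (true) {
                    loader.lookup("com.example.UserClass"); // made-up class name
                    Thread.sleep(1);
                }
            } catch (Throwable t) {
                failure.set(t); // record why the thread died
            }
        });
        leaked.start();

        loader.close(); // teardown closes the loader, but nobody stopped the thread
        leaked.join();  // the thread dies on its next lookup
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runLeakedThread());
    }
}
```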
> 
>  
> 
> Regards,
> 
> Dian
> 
>  
> 
> [1] https://issues.apache.org/jira/browse/FLINK-10455 
> <https://issues.apache.org/jira/browse/FLINK-10455>
>  
> 
> On Sep 23, 2019, at 2:50 PM, Subramanyam Ramanathan 
> <subramanyam.ramanat...@microfocus.com 
> <mailto:subramanyam.ramanat...@microfocus.com>> wrote:
> 
>  
> 
> Hi,
> 
>  
> 
> I was able to simulate the issue again and understand the cause a little 
> better.
> 
>  
> 
> The issue occurs when:
> 
> - One of the RichMapFunction transformations uses a third-party library in 
>   the open() method that spawns a thread.
> 
> - The thread doesn't get properly stopped in the close() method.
> 
> - Once the job starts failing, we start seeing a NoClassDefFoundError from 
>   that thread.
> 
>  
> 
> I understand that cleanup should be done in the close() method. However, I 
> wanted to know whether there is a configuration setting that would help us 
> clean up such threads.
> 
> I can attach the code if required.
> 
>  
> 
> Thanks,
> 
> Subbu
> 
>  
> 
> From: Zhu Zhu [mailto:reed...@gmail.com <mailto:reed...@gmail.com>] 
> Sent: Friday, August 9, 2019 7:43 AM
> To: Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com 
> <mailto:subramanyam.ramanat...@microfocus.com>>
> Cc: user@flink.apache.org <mailto:user@flink.apache.org>
> Subject: Re: NoClassDefFoundError in failing-restarting job that uses url 
> classloader
> 
>  
> 
> Hi Subramanyam,
> 
>  
> 
> Could you share more information, including:
> 
> 1. the URL pattern
> 
> 2. the detailed exception and the log around it
> 
> 3. the cluster the job is running on, e.g. standalone, YARN, k8s
> 
> 4. whether it is running in session mode or per-job mode
> 
>  
> 
> This information would be helpful to identify the failure cause.
> 
>  
> 
> Thanks,
> 
> Zhu Zhu
> 
>  
> 
> Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com 
> <mailto:subramanyam.ramanat...@microfocus.com>> wrote on Fri, Aug 9, 2019 at 1:45 AM:
> 
>  
> 
> Hello,
> 
>  
> 
> I'm currently using Flink 1.7.2.
> 
>  
> 
> I'm trying to run a job that's submitted programmatically using the 
> ClusterClient API.
> 
>    public JobSubmissionResult run(PackagedProgram prog, int parallelism)
> 
>  
> 
>  
> 
> The job makes use of some jars, which I add to the packaged program through 
> the PackagedProgram constructor, along with the jar file.
> 
>    public PackagedProgram(File jarFile, List<URL> classpaths, String... args)
> 
> Normally, this works perfectly and the job runs fine.
> 
>  
> 
> However, if there's an error in the job, the job goes into the failing state 
> and keeps trying to restart. After it has been restarting continuously for an 
> hour or so, I notice a NoClassDefFoundError for some classes in the jars that 
> I load using the URL class loader, and the job never recovers after that, 
> even once the root cause has been fixed (I had a Kafka source/sink in my job; 
> Kafka was down temporarily and was brought back up afterwards).
> 
> The jar is still available at the path referenced by the URL class loader and 
> has not been tampered with.
> 
>  
> 
> Could anyone please give me some pointers on why this could happen, what I 
> could be missing here, or how I can debug this further?
> 
>  
> 
> Thanks,
> 
> Subbu
> 
>  
> 
