This might be related to NIFI-5110. Since the StandardProcessScheduler keeps a reference through a StandardProcessorNode in its scheduleStates map, the PutHDFS processor cannot be garbage collected, and thus the InstanceClassLoader cannot be cleaned up.
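
To make the suspected retention chain concrete, here is a minimal, hedged sketch (plain Java, not NiFi code; the class and field names are stand-ins for StandardProcessScheduler, StandardProcessorNode, and InstanceClassLoader) showing how a single surviving map entry keeps a per-instance classloader, and every class it defined, reachable:

import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ClassLoaderRetentionSketch {

    // Stand-in for the scheduler's scheduleStates / lifecycleStates map.
    static final Map<String, ProcessorHolder> scheduleStates = new ConcurrentHashMap<>();

    // Stand-in for StandardProcessorNode: it pins the per-instance classloader.
    static final class ProcessorHolder {
        final URLClassLoader instanceClassLoader;
        ProcessorHolder(URLClassLoader cl) { this.instanceClassLoader = cl; }
    }

    public static void main(String[] args) throws Exception {
        // Per-instance classloader created when the processor is added.
        URLClassLoader instanceLoader = new URLClassLoader(
                new URL[0], ClassLoaderRetentionSketch.class.getClassLoader());

        scheduleStates.put("puthdfs-1", new ProcessorHolder(instanceLoader));

        // "Deleting" the processor elsewhere changes nothing while this entry survives:
        // map entry -> holder -> classloader -> loaded classes all stay strongly
        // reachable, so the JVM can never unload those classes.

        // Only once the entry is removed (and no other references remain) does class
        // unloading become possible on a later GC cycle.
        scheduleStates.remove("puthdfs-1");
        instanceLoader.close();
        System.gc(); // unloading is at the JVM's discretion, but is now at least possible
    }
}

If the map entry really is left behind after deletion in 1.2.0/1.5.0, a heap dump taken after deleting a PutHDFS should show the InstanceClassLoader retained through that path.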
Note: It looks like between 1.5.0 and 1.6.0 the StandardProcessScheduler changed a little and scheduleStates was renamed to lifecycleStates.

-Dann

On Fri, Apr 27, 2018 at 8:48 AM Dann <[email protected]> wrote:

> The behavior is the same with both 1.2.0 and 1.5.0.
>
> If instance class loading is used, there is something keeping that
> instance classloader from being garbage collected. Removing instance
> class loading (if possible) would fix the problem. If instance class
> loading is needed, then finding the reason that classloader can't be
> cleaned up would be the fix.
>
>
> On Fri, Apr 27, 2018 at 8:38 AM Bryan Bende <[email protected]> wrote:
>
>> Is the behavior the same in 1.5.0, or only in 1.2.0?
>>
>> We can't undo the instance class loading for Hadoop processors unless
>> we undo the static usage of UserGroupInformation.
>>
>> I think it is possible to go back to the non-static usage, since the
>> kerberos issues were actually resolved by the useSubjectCredsOnly
>> property, but it would take a bit of work and testing and would not be
>> as easy as just removing the instance class loading annotation.
>>
>>
>> On Fri, Apr 27, 2018 at 10:25 AM, Joe Witt <[email protected]> wrote:
>> > Dann
>> >
>> > The Hadoop processors, as of a version around 1.2.0, switched to
>> > 'instance class loading': while components from the various NARs
>> > always get their own classloader, in the case of the Hadoop
>> > processors we went a step further and create a new classloader for
>> > every single instance of the component. This was done to overcome
>> > what appears to be problematic usage of static values in the Hadoop
>> > client code. However, it is possible we've changed our usage enough
>> > that these would no longer create problems for us, and we could
>> > consider going back to typical class loading.
>> >
>> > Would be good to hear others' thoughts.
>> >
>> > Thanks
>> >
>> > On Fri, Apr 27, 2018 at 10:20 AM, Dann <[email protected]> wrote:
>> >> NiFi versions: 1.2.0 and 1.5.0
>> >> Java version: 1.8.0_162
>> >>
>> >> It appears that when I add a PutHDFS processor to the canvas, the
>> >> Hadoop classes are loaded along with their dependencies
>> >> (normal/desired behavior). Then, if I delete the PutHDFS processor,
>> >> garbage collection is not able to unload any of the classes that
>> >> were loaded. This is the same behavior with every PutHDFS processor
>> >> that is added and then deleted, and it results in the slow
>> >> degradation of NiFi over time with the way we use NiFi.
>> >>
>> >> Our usage of NiFi does not expose the NiFi interface to those
>> >> designing data flows. We have a web interface that allows the user
>> >> to define the data they need, and our application builds data flows
>> >> for the user. As we upgrade our application, we make changes that
>> >> require us to remove the old data flow and rebuild it with the new
>> >> changes. Over time, we add and remove a lot of processors using the
>> >> NiFi API, and some of those processors are HDFS related.
>> >>
>> >> To me, there seems to be a problem with the Hadoop dependency
>> >> loading and processor implementation that doesn't allow the JVM to
>> >> unload those classes when they are no longer needed.
>> >>
>> >> Thanks,
>> >>
>> >> Dann
>> >
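
For anyone following along, here is a hedged sketch of the per-instance class loading idea Joe describes above (this is not NiFi's actual InstanceClassLoader implementation; the jar path and the commented class name are illustrative only). Each instance gets its own URLClassLoader, so static fields in the Hadoop client are not shared between processor instances; the trade-off is that every surviving loader holds its own copy of those classes:

import java.net.URL;
import java.net.URLClassLoader;

public class InstanceClassLoadingSketch {

    public static void main(String[] args) throws Exception {
        // Illustrative (hypothetical) path to the bundle's Hadoop client jars.
        URL[] bundleJars = { new URL("file:/path/to/hadoop/client/jars/") };

        // One loader per component instance, with no parent that can see those jars,
        // so each loader defines its own copy of every Hadoop class it loads.
        try (URLClassLoader instanceA = new URLClassLoader(bundleJars, null);
             URLClassLoader instanceB = new URLClassLoader(bundleJars, null)) {

            // Each loader would produce a distinct Class object for the same name, e.g.:
            //   Class<?> ugiA = instanceA.loadClass("org.apache.hadoop.security.UserGroupInformation");
            //   Class<?> ugiB = instanceB.loadClass("org.apache.hadoop.security.UserGroupInformation");
            //   ugiA != ugiB  -> independent static state per processor instance.
        }
        // The cost: every add/delete cycle loads a fresh copy of the classes, so if a
        // loader is never released (see the retention sketch earlier in the thread),
        // metaspace grows with each cycle, matching the slow degradation described in
        // the original report above.
    }
}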
