Ratandeep Ratti created HIVE-11878:
--------------------------------------

             Summary: ClassNotFoundException can possibly occur if multiple 
jars are registered in Hive
                 Key: HIVE-11878
                 URL: https://issues.apache.org/jira/browse/HIVE-11878
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Ratandeep Ratti
            Assignee: Ratandeep Ratti


When we register a jar on the Hive console, Hive creates a fresh 
URLClassLoader which includes the path of the jar being registered and all 
the jar paths of the parent classloader. The parent classloader is the current 
thread context classloader. Once the URLClassLoader is created, Hive sets it as 
the current thread context classloader.

So if we register multiple jars in Hive, multiple URLClassLoaders are created, 
each one including the jars from its parent plus the one extra jar being 
registered. The last URLClassLoader created ends up as the current thread 
context classloader. (For details, see 
org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
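
To make the chaining concrete, here is a minimal sketch of the behaviour 
described above. It is not Hive's actual code: the class name 
ClassPathChainSketch, the helper jarUrl and the /tmp jar paths are 
hypothetical, and the method only mirrors the name of 
Utilities#addToClassPath for readability.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClassPathChainSketch {

    // Hypothetical helper: turns a file path into a URL without a checked exception.
    static URL jarUrl(String path) {
        try {
            return new File(path).toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad jar path: " + path, e);
        }
    }

    // Sketch of the strategy described in this issue (an assumption, not Hive's source):
    // copy the parent's URLs, append the new jar, and chain the new loader to the parent.
    static URLClassLoader addToClassPath(ClassLoader parent, URL newJar) {
        List<URL> urls = new ArrayList<>();
        if (parent instanceof URLClassLoader) {
            urls.addAll(Arrays.asList(((URLClassLoader) parent).getURLs()));
        }
        urls.add(newJar);
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        try {
            // Registering j1 creates u1 and installs it as the context classloader.
            URLClassLoader u1 = addToClassPath(original, jarUrl("/tmp/j1.jar"));
            current.setContextClassLoader(u1);

            // Registering j2 creates u2 with u1 as its parent.
            URLClassLoader u2 = addToClassPath(current.getContextClassLoader(), jarUrl("/tmp/j2.jar"));
            current.setContextClassLoader(u2);

            System.out.println("u1 URLs: " + Arrays.toString(u1.getURLs()));
            System.out.println("u2 URLs: " + Arrays.toString(u2.getURLs()));
            System.out.println("u2 parent is u1: " + (u2.getParent() == u1));
        } finally {
            // Restore the original loader so the sketch has no side effects.
            current.setContextClassLoader(original);
        }
    }
}
```

The jar files do not need to exist for this sketch to run, because a 
URLClassLoader only opens its URLs when it is asked to load something.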

Here is an example in which the above strategy can lead to a 
ClassNotFoundException.
We register two jars, *j1* and *j2*, on the Hive console. *j1* contains the 
UDF class *c1*, which internally relies on class *c2* in jar *j2*. We register 
*j1* first: the URLClassLoader *u1* is created and set as the thread context 
classloader. We register *j2* next: the new URLClassLoader *u2* is created 
with *u1* as its parent, and *u2* becomes the new thread context classloader. 
Note that *u2* includes paths to both jars *j1* and *j2*, whereas *u1* only 
has the path to *j1*. (For details, see 
org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)

Now when we register class *c1* as a temporary function in Hive, we load the 
class using {code} Class.forName("c1", true, 
Thread.currentThread().getContextClassLoader()) {code} . The current thread 
context classloader is *u2*, and it has the path to the class *c1*, but note 
that classloaders work by delegating to the parent classloader first. In this 
case class *c1* will be found and *defined* by classloader *u1*.
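
The parent-first behaviour can be observed on a plain JVM without Hive. The 
sketch below is only an illustration: ParentFirstDemo and definingLoader are 
hypothetical names, and the child loader has an empty URL list, so it shows 
that the defining loader is the parent even when the load is requested 
through the child.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ParentFirstDemo {

    // Loads className through the given initiating loader and returns the loader
    // that actually defined the class (null means the bootstrap loader).
    static ClassLoader definingLoader(String className, ClassLoader initiating) {
        try {
            return Class.forName(className, true, initiating).getClassLoader();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class not visible: " + className, e);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ParentFirstDemo.class.getClassLoader();
        // The child plays the role of u2; its parent plays the role of u1.
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            ClassLoader definer = definingLoader(ParentFirstDemo.class.getName(), child);
            System.out.println("defined by parent: " + (definer == parent));
            System.out.println("defined by child:  " + (definer == child));
        }
    }
}
```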

Now *c1* from jar *j1* has *u1* as its classloader. If a method (say 
initialize) that references the class *c2* is called in *c1*, *c2* will not 
be found, because the classloader used to search for *c2* will be *u1* (the 
caller's classloader is used to load the classes it references), and *u1* 
only has the path to *j1*.
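
The whole scenario can be traced with a toy model. This is not real JVM class 
loading: the Loader class below is a hypothetical stand-in that keeps a set of 
visible class names and searches parent-first, which is enough to follow the 
reasoning above step by step.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DelegationModel {

    /** Toy loader: a set of visible class names plus a parent, searched parent-first. */
    static final class Loader {
        final String name;
        final Loader parent;
        final Set<String> visible;

        Loader(String name, Loader parent, String... visible) {
            this.name = name;
            this.parent = parent;
            this.visible = new HashSet<>(Arrays.asList(visible));
        }

        /** Returns the loader that would define the class, or null if it cannot be found. */
        Loader findDefiner(String className) {
            if (parent != null) {
                Loader definer = parent.findDefiner(className);
                if (definer != null) {
                    return definer; // the parent wins whenever it can see the class
                }
            }
            return visible.contains(className) ? this : null;
        }
    }

    public static void main(String[] args) {
        Loader u1 = new Loader("u1", null, "c1");      // sees j1 only
        Loader u2 = new Loader("u2", u1, "c1", "c2");  // sees j1 and j2

        // Class.forName("c1", true, u2): u2 delegates to u1, which defines c1.
        Loader definerOfC1 = u2.findDefiner("c1");
        System.out.println("c1 defined by: " + definerOfC1.name);

        // c1 references c2, so c2 is looked up through c1's defining loader.
        Loader definerOfC2 = definerOfC1.findDefiner("c2");
        System.out.println("c2 via c1's loader: "
                + (definerOfC2 == null ? "not found" : definerOfC2.name));
    }
}
```

In the model, *c2* is only reachable when the lookup starts from *u2*, which 
is exactly the loader that *c1* does not use.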


I've added a qtest to demonstrate the problem. Please see the attached patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
