This Tika issue confirms the problem. 

https://issues.apache.org/jira/browse/TIKA-412 

Jukka Zitting reports:

POI depends on dom4j that in turn depends on the xml-apis jar for some 
XML-related interfaces that are nowadays a part of the JRE. Normally having 
such an extra jar around doesn't harm anything as normal class loaders will 
always use the classes provided by the JRE. However some application servers 
like JBoss allow applications to override javax.* interfaces, which causes all 
sorts of trouble. Thus it's better if we exclude the xml-apis dependency from 
Tika.

This issue was fixed in Tika 0.8.

So upgrading the version of Tika in TikaAnnotator would definitely solve the 
problem for UIMA PEAR users.

Greg


----- Original Message ----- 
From: "Greg Holmberg" <[email protected]> 
To: [email protected] 
Cc: [email protected] 
Sent: Wednesday, September 26, 2012 6:11:55 PM 
Subject: Re: ClassLoader problems when using PEAR files 

Hi Marshall-- 


I did try that. What it told me is that DocumentBuilderFactory.newInstance() 
was able to find an implementation many times right up to the point that Tika 
tried within the PEAR analysis engine, when it couldn't find an implementation. 
Which I already knew :-) 

Before that point, it was able to find several different implementations, but 
mostly com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl (the 
platform fallback). Since this class exists in rt.jar (i.e. it's built into the 
JDK installation), I was perplexed about how the classloader could fail to find 
it, especially since I had even called 
ResourceManager.setExtensionClassPath(Thread.currentThread().getContextClassLoader(),
 ...). That should have allowed the UIMA class loader to fall back to the system 
class loader, which should be able to find classes in rt.jar. But it didn't. 
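
For anyone who wants to repeat this kind of check from inside their own code, 
here is a minimal sketch. The class name FactoryProbe and the describe() helper 
are made up for illustration; only standard JDK calls are used. It prints where 
the DocumentBuilderFactory API class and the selected implementation were 
loaded from, which complements the -Djaxp.debug=1 output. 

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper, not part of UIMA or Tika.
class FactoryProbe {

    /** Returns the class name, its defining class loader, and where it was loaded from. */
    static String describe(Class<?> c) {
        ClassLoader loader = c.getClassLoader();   // null means the bootstrap loader
        CodeSource source = c.getProtectionDomain().getCodeSource(); // usually null for boot classes
        return c.getName()
                + " | loader=" + (loader == null ? "bootstrap" : loader.toString())
                + " | from=" + (source == null ? "JDK runtime" : String.valueOf(source.getLocation()));
    }

    public static void main(String[] args) {
        // The abstract API class. If this is reported as coming from a jar or a
        // lib directory, a bundled copy is shadowing the one in the JDK.
        System.out.println(describe(DocumentBuilderFactory.class));

        // The concrete implementation chosen by the JAXP lookup procedure.
        System.out.println(describe(DocumentBuilderFactory.newInstance().getClass()));

        // The loader that JAXP consults when searching for providers.
        System.out.println("context loader = " + Thread.currentThread().getContextClassLoader());
    }
}
```

To be meaningful for the PEAR case, the same calls would have to run inside the 
annotator (for example from its initialize() method), since that is where the 
PEAR class loader is in effect. 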

After extensive experimenting and googling (I hate to admit how many days I 
spent on this), I finally figured it out. The conditions are that one is using: 

* Java 1.6 or later (including 1.7) 
* UIMA Addons 2.3.1, specifically the TikaAnnotator and Tika 0.7. 
* PEAR Installer. 

As you know, when you use PEAR files (PackageInstaller), then 
UIMAFramework.produceAnalysisEngine() creates a new class loader in order to 
provide an insulated environment based on the classpath instructions in the 
PEAR's metadata/install.xml file. 

In my case, the PEAR file was built by maven, which I configured (using the 
"assembly" plug-in) to unpack the .class files of all the dependencies into the 
"lib" dir. I wanted to create an "all in one" PEAR file with all the necessary 
classes, so I configured useTransitiveDependencies to true. (By the way, you 
have to exclude org.apache.uima:uimaj-*:jar from the assembly.) 

Here's where it goes wrong. Maven smartly follows all the dependencies: 
TikaAnnotator 2.3.1 -> tika-parsers 0.7 -> poi-ooxml 3.6 -> dom4j 1.6.1 -> 
xml-apis. The problem is that xml-apis includes an implementation of the 
javax.xml package (I think, or some part of it, anyway). Apparently, dom4j 
pre-dates JDK 1.6, because since JDK 1.6 the javax.xml package is built into 
the JDK, and one doesn't need xml-apis. So what happens, I think, is some 
implementation of DocumentBuilderFactory is found in xml-apis, and it is 
somehow incompatible with the interface, and can't be instantiated. So 
DocumentBuilderFactory gives up, and doesn't even try the one in rt.jar (even 
though the classloader could find it, if asked). 

In short, due to xml-apis being in the PEAR file, the system can't find the 
good DocumentBuilderFactory in rt.jar. 

Solution: remove xml-apis from the PEAR file. 

I did it by changing my pom.xml: 

<dependency> 
  <groupId>org.apache.uima</groupId> 
  <artifactId>TikaAnnotator</artifactId> 
  <exclusions> 
    <exclusion> 
      <groupId>xml-apis</groupId> 
      <artifactId>xml-apis</artifactId> 
    </exclusion> 
  </exclusions> 
</dependency> 
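
To confirm that no stray copy is left after rebuilding the PEAR, a check along 
these lines may help. This is only a sketch: the class name DuplicateJaxpCheck 
and the locate() helper are hypothetical, and it assumes the code runs under 
the class loader you want to inspect. It lists every location from which that 
loader can see the JAXP API class and any registered provider file. 

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of UIMA or Tika.
class DuplicateJaxpCheck {

    /** Returns every URL at which the given loader can see the named resource. */
    static List<URL> locate(ClassLoader loader, String resource) {
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        String[] resources = {
            // The API class itself; more than one hit suggests a bundled copy.
            "javax/xml/parsers/DocumentBuilderFactory.class",
            // Provider registration read by the Services API step of the lookup.
            "META-INF/services/javax.xml.parsers.DocumentBuilderFactory"
        };
        for (String resource : resources) {
            System.out.println(resource);
            for (URL url : locate(loader, resource)) {
                System.out.println("  " + url);
            }
        }
    }
}
```

If the API class shows up under the PEAR's lib directory as well as in the JDK, 
the exclusion has not taken effect yet. 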

=========== 

May I suggest that UIMA Add-ons upgrade to a newer version of Tika? 0.7 dates 
to April 2010. Current version is 1.2. I'm guessing that a more current version 
using a more current POI and DOM4J wouldn't have the dependency on xml-apis 
(since that package is now included in the JDK). I think that would be the best 
solution to allow using TikaAnnotator in PEAR files in Java 1.6 and later. 


Hope this helps someone. Can I be the only one using TikaAnnotator in PEAR 
files on Java 1.6? 


Greg Holmberg 


----- Original Message ----- 
From: "Marshall Schor" <[email protected]> 
To: [email protected] 
Sent: Wednesday, September 26, 2012 3:57:07 PM 
Subject: Re: ClassLoader problems when using PEAR files 

Hi Greg, 

Did you try troubleshooting this using the "Tip" in the Javadocs for the 
DocumentBuilderFactory class (add -Djaxp.debug=1 to the "java" command line)? 

-Marshall 

On 9/24/2012 6:46 PM, Greg Holmberg wrote: 
> Hi UIMA users-- 
> 
> 
> When I use PEAR files, the XML parser can't find its DocumentBuilderFactory. 
> I think it's a ClassLoader issue. Has anyone else seen this? 
> 
> I install the PEAR as described in the docs: 
> 
> PackageBrowser pkg = PackageInstaller.installPackage(myDir, pearFile, false); 
> 
> String pearDescPath = pkg.getComponentPearDescPath(); 
> 
> ResourceSpecifier specifier = 
>     UIMAFramework.getXMLParser().parseResourceSpecifier( 
>         new XMLInputSource(pearDescPath)); 
> 
> ResourceManager resmgr = getResourceManager(); 
> 
> AnalysisEngine engine = UIMAFramework.produceAnalysisEngine(specifier, 
>     resmgr, params); 
> 
> My PEAR includes TikaAnnotator, and I get the exception shown at the end of 
> this email. Summary: TikaConfig asks for an XML parser, but the system can't 
> find one. 
> 
> Outside the analysis engine, it's possible to find an implementation of 
> DocumentBuilderFactory, but inside it seems that the ClassLoader in use 
> doesn't have one. 
> 
> javax.xml.parsers.DocumentBuilderFactory.newInstance() has a complicated way 
> of finding the implementation (quoting the JavaDoc): 
> 
> ======================= 
> 
> Obtain a new instance of a DocumentBuilderFactory. This static method creates 
> a new factory instance. This method 
> uses the following ordered lookup procedure to determine the 
> DocumentBuilderFactory implementation class to load: 
> 
> * Use the javax.xml.parsers.DocumentBuilderFactory system property. 
> * Use the properties file "lib/jaxp.properties" in the JRE directory. This 
> configuration file is in standard java.util.Properties format and contains 
> the fully qualified name of the implementation class with the key being the 
> system property defined above. The jaxp.properties file is read only once by 
> the JAXP implementation and its values are then cached for future use. If 
> the file does not exist when the first attempt is made to read from it, no 
> further attempts are made to check for its existence. It is not possible to 
> change the value of any property in jaxp.properties after it has been read 
> for the first time. 
> * Use the Services API (as detailed in the JAR specification), if available, 
> to determine the classname. The Services API will look for a classname in the 
> file META-INF/services/javax.xml.parsers.DocumentBuilderFactory in jars 
> available to the runtime. 
> * Platform default DocumentBuilderFactory instance. 
> 
> ========================= 
> 
> So it seems like the ClassLoader used in the analysis engine prevents 
> DocumentBuilderFactory from finding even the platform default implementation. 
> 
> Does anyone know how to work around this? Add something to my 
> metadata/install.xml file perhaps? 
> 
> Thanks, 
> 
> 
> Greg Holmberg 
> 
> 
> 
> org.apache.uima.resource.ResourceInitializationException: Error initializing "org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper" from descriptor file:/tmp/taservice/pear/SAPAnalysisEngine/SAPAnalysisEngine_pear.xml. 
>     at org.apache.uima.util.SimpleResourceFactory.produceResource(SimpleResourceFactory.java:144) 
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) 
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) 
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314) 
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425) 
>     at com.sap.taservice.controller.UimaPipeline.createAnalysisEngine(UimaPipeline.java:343) 
>     at com.sap.taservice.controller.UimaPipeline.execute(UimaPipeline.java:151) 
>     at com.sap.taservice.controller.TAServiceWork.execute(TAServiceWork.java:44) 
>     at com.sap.job.impl.TaskImpl.execute(TaskImpl.java:104) 
>     at com.sap.taservice.job.impl.remote.RemoteWorker.iteration(RemoteWorker.java:52) 
>     at com.sap.util.DaemonRunnable.run(DaemonRunnable.java:117) 
>     at java.lang.Thread.run(Thread.java:662) 
> Caused by: javax.xml.parsers.FactoryConfigurationError: Provider for javax.xml.parsers.DocumentBuilderFactory cannot be found 
>     at javax.xml.parsers.DocumentBuilderFactory.newInstance(Unknown Source) 
>     at org.apache.tika.config.TikaConfig.getBuilder(TikaConfig.java:228) 
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:66) 
>     at org.apache.uima.tika.MarkupAnnotator.initialize(MarkupAnnotator.java:96) 
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252) 
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:158) 
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) 
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) 
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) 
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387) 
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:255) 
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429) 
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373) 
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186) 
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) 
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) 
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) 
>     at org.apache.uima.internal.util.ResourcePool.fillPool(ResourcePool.java:243) 
>     at org.apache.uima.internal.util.ResourcePool.<init>(ResourcePool.java:100) 
>     at org.apache.uima.internal.util.AnalysisEnginePool.<init>(AnalysisEnginePool.java:91) 
>     at org.apache.uima.analysis_engine.impl.MultiprocessingAnalysisEngine_impl.initialize(MultiprocessingAnalysisEngine_impl.java:118) 
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94) 
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) 
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) 
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314) 
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425) 
>     at org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper.initialize(PearAnalysisEngineWrapper.java:269) 
>     at org.apache.uima.util.SimpleResourceFactory.produceResource(SimpleResourceFactory.java:123) 
>     ... 11 more 
> 
> 
