I'm not sure whether this is a Tika issue or an IKVM one, so I'm posting it here as well, just in case.

Hello!

I've been trying to use Tika via IKVM to extract the contents of text files. With some help from this mailing list (thanks guys!), I've got it reading an MS Word (.doc) file that was renamed with an odd extension, which was the goal of using Tika over IFilters.

Our project lets us add plug-ins (which we write ourselves) to process files that aren't handled by IFilters or Tika. These plug-ins are loaded at run time, and we call Assembly.GetExportedTypes() to check that each loaded DLL is a valid plug-in. However, after asm.GetExportedTypes() has been called, Tika/IKVM no longer works and crashes with an odd exception.

We're using TikaOnDotNet, a .NET wrapper we found online, to call Tika.

The code (C#, .NET 4.0) is as follows:
---------------------------------------------------------------------------------------------------------
// Create and test the Tika extractor
TikaOnDotNet.TextExtractor _cut = new TikaOnDotNet.TextExtractor();
TikaOnDotNet.TextExtractionResult result = _cut.Extract(@"D:\Work\NamedWrong\What you need for distribution_was_doc.qrt");
// Works here

// Load a plug-in DLL found at run time (file describes the DLL, e.g. a FileInfo)
System.Reflection.Assembly asm = System.Reflection.Assembly.LoadFrom(file.FullName);
// Still works here

foreach (Type t in asm.GetExportedTypes())
{
    // Check whether t is a valid plug-in type
}
// After asm.GetExportedTypes() has been called, the next Extract() call crashes
---------------------------------------------------------------------------------------------------------

In TikaOnDotNet's TextExtractor.cs, the crash occurs when constructing an AutoDetectParser (stepping through in the debugger shows that this loads the ClassLoader from MyClassLoader.cs):

---------------------------------------------------------------------------------------------------------
var parser = new AutoDetectParser(); // Crashes on this line
---------------------------------------------------------------------------------------------------------


The error is as follows:
---------------------------------------------------------------------------------------------------------


FactoryConfigurationError was unhandled

{"Provider ???\0\0\0?\0\0\0)System.Resources.ResourceReader, mscorlibsSystem.Resources.RuntimeResourceSet, mscorlib, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089\0\0\0\0\0\0\0\0\0]System.Byte[], mscorlib, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089PADP?nY\0\0\0\0\0-\0\0l\0z\0\0\0\0\0\0\0\0\0\0????\0\0\0\0\0\0\0\0\0\0)\0\0\0g??q?? not found"}


at javax.xml.parsers.DocumentBuilderFactory.newInstance()
at org.apache.tika.mime.MimeTypesReader.read(InputStream )
at org.apache.tika.mime.MimeTypesFactory.create(InputStream inputStream)
at org.apache.tika.mime.MimeTypesFactory.create(URL url)
at org.apache.tika.mime.MimeTypesFactory.create(String filePath)
at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes()
at org.apache.tika.config.TikaConfig..ctor(CompositeParser )
at org.apache.tika.config.TikaConfig..ctor()
at org.apache.tika.config.TikaConfig.getDefaultConfig()
at org.apache.tika.parser.AutoDetectParser..ctor()
at TikaOnDotNet.TextExtractor.Extract(String filePath) in C:\<project>\Tika\TextExtractor.cs:line 43
---------------------------------------------------------------------------------------------------------

Any assistance would be greatly appreciated.

Trevor Watson
