Howdy folks,

I recently found Tika, a great little library.

I used ikvmc.exe to convert it from the jar file to a dll usable in C# and after some struggles with which libraries and which types to use, got a test executable up and running. It worked great! Extracted metadata just like I needed it to.


I then copied the functions from the executable and put them into a DLL that will be used as a plug-in library for one of the projects I'm currently working on. However (there's always a however, isn't there?), Tika instantly started reporting only 3 items (the default properties of file length, file name and MIME type) instead of the full metadata it returns when run from the executable.

I'm not sure if this is an IKVM thing or a Tika thing, so I'm posting it here as well as on the IKVM mailing list. Any assistance would be greatly appreciated.

I'm using C# in Visual Studio 2005 with IKVM 0.46.0.1 and Tika 0.9.

The code for the extraction is:

        private void button1_Click(object sender, EventArgs e)
        {
            AutoDetectParser parser = new AutoDetectParser();
            Metadata metadata = new Metadata();
            ParseContext parserContext = new ParseContext();
            java.lang.Class parserClass = typeof(AutoDetectParser);

            parserContext.set(parserClass, parser);

            java.io.File file = new java.io.File(@"E:\temp\Docs for Demo SM\E-documents\011NewCaseDONE.doc");
            java.net.URL url = file.toURI().toURL();


            using (java.io.InputStream inputStream = MetadataHelper.getInputStream(url, metadata))
            {
                parser.parse(inputStream, getTransformerHandler(), metadata, parserContext);
                inputStream.close();
            }


            foreach (string name in metadata.names())
            {

            }
        }

        private TransformerHandler getTransformerHandler()
        {

            SAXTransformerFactory factory = TransformerFactory.newInstance() as SAXTransformerFactory;
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "text");
            handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");

            _outputWriter = new StringWriter();
            handler.setResult(new StreamResult(_outputWriter));
            return handler;
        }

And here are the using directives you need if you want to build a project from this:

using System;
using System.Windows.Forms;

using java.io;
using java.lang;
using javax.xml.transform;
using javax.xml.transform.sax;
using javax.xml.transform.stream;
using org.apache.tika.io;
using org.apache.tika.metadata;
using org.apache.tika.parser;

------------------------------------------------------------

The DLL takes the code from button1_Click and moves it into a simple function that currently just tries to read the metadata (it doesn't return anything yet).

When run from the executable, I get the following information (for an MP3 file):
xmpDM:releaseDate=2009
Content-Length=4136960
xmpDM:audioChannelType=Stereo
xmpDM:album=
Author=The B52's
xmpDM:artist=The B52's
channels=2
xmpDM:audioSampleRate=44100
xmpDM:logComment=
xmpDM:trackNumber=1
version=MPEG 3 Layer III Version 1
xmpDM:composer=null
xmpDM:audioCompressor=MP3
title=Rock Lobster
samplerate=44100
xmpDM:genre=Blues
Content-Type=audio/mpeg
resourceName=The B52's - Rock Lobster.mp3

When run from the DLL, I get the following information:
Content-Length 4136960  |
Content-Type audio/mpeg  |
resourceName The B52's - Rock Lobster.mp3  |


Thanks in advance,

Trevor Watson
