Howdy folks,
I've recently found "Tika", a great little library.
I used ikvmc.exe to convert it from the jar file to a dll usable in C#
and after some struggles with which libraries and which types to use,
got a test executable up and running. It worked great! Extracted
metadata just like I needed it to.
I then copied the functions from the executable and put them into a DLL
that will be used as a plug-in library for one of the projects I'm
currently working on. However (there's always a however, isn't there?),
Tika instantly started reporting only 3 items (the default properties of
file length, file name and mime type) instead of the full metadata it
returned from the test executable.
I'm not sure if this is an IKVM thing or a Tika thing, so I am posting it
here as well as on the IKVM mailing lists. Any assistance would be
greatly appreciated.
I'm using C# in Visual Studio 2005 with IKVM 0.46.0.1 and Tika v0.9.
The code for the extraction is:
    // Receives the text output written by the TransformerHandler.
    private StringWriter _outputWriter;

    private void button1_Click(object sender, EventArgs e)
    {
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();

        // Register the parser in the context under its own class.
        ParseContext parserContext = new ParseContext();
        java.lang.Class parserClass = typeof(AutoDetectParser);
        parserContext.set(parserClass, parser);

        java.io.File file = new java.io.File(@"E:\temp\Docs for Demo SM\E-documents\011NewCaseDONE.doc");
        java.net.URL url = file.toURI().toURL();

        using (java.io.InputStream inputStream =
            MetadataHelper.getInputStream(url, metadata))
        {
            parser.parse(inputStream, getTransformerHandler(),
                metadata, parserContext);
            inputStream.close();
        }

        foreach (string name in metadata.names())
        {
        }
    }

    // Builds a SAX handler that writes the extracted text to _outputWriter.
    private TransformerHandler getTransformerHandler()
    {
        SAXTransformerFactory factory =
            TransformerFactory.newInstance() as SAXTransformerFactory;
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "text");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        _outputWriter = new StringWriter();
        handler.setResult(new StreamResult(_outputWriter));
        return handler;
    }
And here are the using directives to make it work if you create a project:
using System;
using System.Windows.Forms;
using java.io;
using java.lang;
using javax.xml.transform;
using javax.xml.transform.sax;
using javax.xml.transform.stream;
using org.apache.tika.io;
using org.apache.tika.metadata;
using org.apache.tika.parser;
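
The body of the foreach loop above is empty as posted. As a rough sketch only, the name=value lines shown further down could be produced with something like the following. MetadataDump and Format are hypothetical names used for illustration (they are not part of Tika or IKVM), and the code assumes the IKVM-converted Tika assembly is referenced by the project.

```csharp
using System;
using System.Text;
using org.apache.tika.metadata;

// Hypothetical helper, not part of Tika or IKVM.
static class MetadataDump
{
    // Returns one "name=value" line per metadata entry.
    public static string Format(Metadata metadata)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string name in metadata.names())
        {
            // Metadata.get returns the first value stored under the name.
            sb.AppendLine(name + "=" + metadata.get(name));
        }
        return sb.ToString();
    }
}
```

Calling MetadataDump.Format(metadata) after parser.parse(...) from both the executable and the DLL would give two dumps that can be compared directly.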
------------------------------------------------------------
The DLL takes the code from button1_Click and moves it into a simple
function that currently just tries to read the metadata (it does not
return anything at this time).
When run from the executable, I get the following information (for an MP3
file):
xmpDM:releaseDate=2009
Content-Length=4136960
xmpDM:audioChannelType=Stereo
xmpDM:album=
Author=The B52's
xmpDM:artist=The B52's
channels=2
xmpDM:audioSampleRate=44100
xmpDM:logComment=
xmpDM:trackNumber=1
version=MPEG 3 Layer III Version 1
xmpDM:composer=null
xmpDM:audioCompressor=MP3
title=Rock Lobster
samplerate=44100
xmpDM:genre=Blues
Content-Type=audio/mpeg
resourceName=The B52's - Rock Lobster.mp3
When run from the DLL, I get the following information:
Content-Length 4136960 |
Content-Type audio/mpeg |
resourceName The B52's - Rock Lobster.mp3 |
Thanks in advance,
Trevor Watson