In my repository, I'm storing a variety of files, including some images
of paper documents. I'd like to be able to hook up an OCR engine to do
full-text search against these images (usually TIFFs), but I'm having
issues getting Jackrabbit to pick up my class. To check that the system
picks up my class at all, I've written a simple test version that returns
a fixed string, before adding any actual OCR. I've included this class at
the bottom of the e-mail.
I've edited the workspace.xml to include my class in the
textFilterClasses parameter of the SearchIndex node, added my jar to the
classpath, deleted the index to force a re-index, and run a very simple
test. Yet when I search for the test text, I get 0 results.
Can someone please tell me what I'm doing wrong?
Thanks,
--Nick Allmaker
--------ImageTextExtractor.java--------
package test.extractors;

import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

import org.apache.jackrabbit.extractor.AbstractTextExtractor;

// Stub extractor: returns fixed text for image types so that indexing
// can be tested before any OCR is wired in.
public class ImageTextExtractor extends AbstractTextExtractor
{
    public ImageTextExtractor()
    {
        // Content types this extractor claims to handle.
        super(new String[]{"image/tiff", "image/jpeg",
                           "image/png", "image/gif"});
    }

    // InputStream.close() throws IOException, so the method declares it.
    public Reader extractText(InputStream stream, String type,
                              String encoding) throws IOException
    {
        stream.close();
        return new StringReader("This is a test extraction.");
    }
}
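
--------workspace.xml (sketch of the SearchIndex entry)--------
For reference, this is roughly the kind of SearchIndex entry described above. It is only an illustration, not a copy of the actual file: the SearchIndex class name and the path parameter are assumed to be the stock Jackrabbit values, and only the textFilterClasses value is taken from the class in this e-mail. Any other parameters or extractor classes already in the file are omitted here.

```xml
<!-- Sketch only: class and path are assumed defaults, not the real file. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Comma-separated list of fully qualified extractor class names. -->
  <param name="textFilterClasses"
         value="test.extractors.ImageTextExtractor"/>
</SearchIndex>
```

The value has to match the fully qualified class name exactly (package test.extractors, class ImageTextExtractor), and the jar containing it has to be visible to the classloader that loads Jackrabbit itself.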