I am a newbie, but see my answers below.
----- Original Message -----
From: "Belinda Randolph" <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, May 2, 2007 4:42:23 PM (GMT-0500) America/New_York
Subject: JackRabbit Search Engine Questions

I am in the process of evaluating 10 repository solutions for my project.

I have several questions to ask in order to make my decisions.

1.  Can I replace the JackRabbit search engine with my own?

Yes, you can, but why would you?  I migrated to Jackrabbit precisely so I could 
retire all my own search code (written in Lucene).  Of course, you could write 
your own crawler/indexer that reads from the Jackrabbit repository.

2.  Does your search engine look through actual document contents - 
as a background process or at the time of the actual user search?

The document is indexed when it is added to the repository.  When the user 
searches, the query runs against a previously built index, so it is very 
fast.
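
To illustrate the index-on-add idea, here is a toy sketch in plain Java.  It is 
not Jackrabbit's code (Jackrabbit keeps a Lucene index); the class name ToyIndex 
and its methods are made up for this example.  It only shows why the work is 
done when a document is stored and why a later search is just a lookup.

```java
import java.util.*;

// Toy inverted index: illustrates index-on-add, not Jackrabbit's real index.
class ToyIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Called once when a document is stored; the tokenizing cost is paid here.
    void add(String docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Called at search time; a map lookup, no document is read again.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("report.pdf", "Quarterly sales report");
        index.add("memo.doc", "Sales meeting memo");
        System.out.println(index.search("sales"));
    }
}
```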


3.  What FORMATs of actual documents does your search engine look 
at?  (Ascii, Microsoft, PDF, etc.)

All of those formats, and more.  You can easily add support for new ones if you 
like; you will have to set the MIME type on the content that you add so that 
your custom indexer runs against it.
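
Here is a rough sketch of why the MIME type matters.  It is plain Java and not 
Jackrabbit's extractor API: ExtractorRegistry, register and extract are names I 
made up, and "application/x-example-note" is an invented custom type.  The point 
is only that the extractor is chosen by MIME type, so content stored without the 
right type gets no text extracted.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.function.Function;

// Hypothetical registry: shows MIME-type dispatch, not Jackrabbit's real API.
class ExtractorRegistry {
    private final Map<String, Function<byte[], String>> extractors = new HashMap<>();

    // Associates a MIME type (case-insensitive) with a text extractor.
    void register(String mimeType, Function<byte[], String> extractor) {
        extractors.put(mimeType.toLowerCase(Locale.ROOT), extractor);
    }

    // Returns the text to index, or "" when no extractor matches the MIME type.
    String extract(String mimeType, byte[] content) {
        Function<byte[], String> extractor =
                mimeType == null ? null : extractors.get(mimeType.toLowerCase(Locale.ROOT));
        return extractor == null ? "" : extractor.apply(content);
    }

    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("text/plain",
                bytes -> new String(bytes, StandardCharsets.UTF_8));
        // Invented custom format: index the content in upper case.
        registry.register("application/x-example-note",
                bytes -> new String(bytes, StandardCharsets.UTF_8).toUpperCase(Locale.ROOT));

        byte[] data = "meeting notes".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.extract("application/x-example-note", data));
        // No MIME type set: nothing is extracted, so nothing would be indexed.
        System.out.println(registry.extract(null, data).isEmpty());
    }
}
```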

4.  When searching the contents of a PDF file, does the background 
process, using OCR, create an additional file in another format? What format?

When the PDF is added to Jackrabbit, text is extracted from the PDF and added 
to the search index.  No OCR is involved, just text extracted from the PDF.  If 
the PDF contains only images, it will not do any OCR on those images.


5.  Does your OCR routine search FORMATS other than PDF? If yes, what 
formats can the OCR search?

There is no OCR technology involved.  Rather, the text in a Microsoft Word 
document, PDF, etc. is extracted from the file using a library that understands 
the binary file format, so no OCR is necessary.


6.  What are the resolution requirements for your OCR routines?

No OCR is involved with Jackrabbit.  Keep in mind, though, that it uses 
libraries that understand MS Word, PDF, and other formats and can extract the 
textual content of the files.

7.  Can I change the GUI to a) add functionality or error checking 
and b) to look personalized with CSS?

Jackrabbit does not have a GUI, so you are in total control of it.  Some folks 
(like me) have written application components that make it easy to create GUIs 
to read, write, and access Jackrabbit content.


8.  Can the search engine search using both requested metadata 
element values and keywords from the document contents?

Yes.

9.  Can I start with keywords from the document contents and then 
later filter the results using user inputted metadata element values?

Yes.

10.  Can I start with user input metadata element values and then 
later filter down the results with document contents?

Yes.
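
For questions 8 to 10, a minimal sketch of how the two kinds of constraints can 
be combined in one JCR XPath statement.  The helper class XPathQueryBuilder is 
my own invention and only builds the query string; jcr:contains, nt:resource 
and jcr:mimeType are standard JCR names, while the keyword and property values 
are just examples.  Constraints are joined with "and", so it does not matter 
whether the keyword or the metadata filter is added first.

```java
import java.util.*;

// Sketch: builds a JCR 1.0 XPath statement as a string (hypothetical helper).
class XPathQueryBuilder {
    private final String nodeType;
    private final List<String> constraints = new ArrayList<>();

    XPathQueryBuilder(String nodeType) {
        this.nodeType = nodeType;
    }

    // Full-text constraint over the node's indexed content.
    XPathQueryBuilder containsText(String keywords) {
        constraints.add("jcr:contains(., " + quote(keywords) + ")");
        return this;
    }

    // Exact-match constraint on a metadata property.
    XPathQueryBuilder propertyEquals(String property, String value) {
        constraints.add("@" + property + " = " + quote(value));
        return this;
    }

    // Joins all constraints with "and" inside a single predicate.
    String build() {
        String base = "//element(*, " + nodeType + ")";
        return constraints.isEmpty()
                ? base
                : base + "[" + String.join(" and ", constraints) + "]";
    }

    // A single quote inside an XPath string literal is escaped by doubling it.
    private static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String statement = new XPathQueryBuilder("nt:resource")
                .containsText("sales")
                .propertyEquals("jcr:mimeType", "application/pdf")
                .build();
        System.out.println(statement);
    }
}
```

The resulting string would then be handed to the JCR API, for example 
session.getWorkspace().getQueryManager().createQuery(statement, Query.XPATH), 
and run with execute().  Filtering "later" then amounts to adding another 
constraint and running the query again.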

11.  After an initial search, can I refine my search by only looking 
at the results of the previous search?

Don't know.

Thanks,
Belinda


