Hi Belinda,

Belinda Randolph wrote:
1. Can I replace the Jackrabbit search engine with my own?

Yes, you can. You will have to implement several interfaces; the QueryHandler interface is a good starting point:
http://svn.apache.org/repos/asf/jackrabbit/tags/1.3/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/QueryHandler.java

2. Does your search engine look through actual document contents - as a background process or at the time of the actual user search?

This is configurable: text is either extracted and indexed when the document is saved, or the extraction is deferred to a number of background threads.

3. What FORMATs of actual documents does your search engine look at? (Ascii, Microsoft, PDF, etc.)

The currently supported formats are:
- Microsoft Word, Excel, PowerPoint
- PDF
- Open Office Documents (text, spreadsheet, presentation, etc.)
- RTF
- HTML
- XML

Text extraction in Jackrabbit is extensible. See:
http://svn.apache.org/repos/asf/jackrabbit/tags/1.3/jackrabbit-text-extractors/src/main/java/org/apache/jackrabbit/extractor/TextExtractor.java

4. When searching the contents of a PDF file, does the background process, using OCR, create an additional file in another format? What format?

The text extractor in Jackrabbit does not use OCR technology, but if you have an existing Java solution, you can easily integrate it into Jackrabbit.
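
To give a rough idea of what such an integration could look like, here is a minimal sketch. It is not Jackrabbit code: the Extractor interface below is a local stand-in modeled on the TextExtractor interface linked above (please check the real signatures there), and OcrEngine is a purely hypothetical placeholder for whatever OCR library you already have. The content types are assumptions as well.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

class OcrExtractorSketch {

    /** Local stand-in modeled on Jackrabbit's TextExtractor (assumption, verify against the linked source). */
    interface Extractor {
        String[] getContentTypes();
        Reader extractText(InputStream stream, String type, String encoding) throws IOException;
    }

    /** Hypothetical entry point of an existing OCR library. */
    interface OcrEngine {
        String recognize(InputStream image) throws IOException;
    }

    /** Adapter that exposes the OCR engine through the extractor contract. */
    static final class OcrExtractor implements Extractor {
        private final OcrEngine engine;

        OcrExtractor(OcrEngine engine) {
            this.engine = engine;
        }

        public String[] getContentTypes() {
            // Assumed set of scanned-image types; adjust to what the OCR engine supports.
            return new String[] { "image/tiff", "image/png" };
        }

        public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
            try {
                String text = engine.recognize(stream);
                // Return an empty reader rather than null when nothing was recognized.
                return new StringReader(text == null ? "" : text);
            } finally {
                stream.close();
            }
        }
    }

    /** Reads a Reader fully into a String. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A fake engine stands in for the real OCR library.
        Extractor extractor = new OcrExtractor(image -> "scanned text");
        Reader reader = extractor.extractText(new ByteArrayInputStream(new byte[0]), "image/png", null);
        System.out.println(readAll(reader)); // prints "scanned text"
    }
}
```

The extracted text is handed back as a Reader, so no additional file needs to be created for the search index.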

5. Does your OCR routine search FORMATS other than PDF? If yes, what formats can the OCR search?

n/a

6.  What are the resolution requirements for your OCR routines?

n/a

7. Can I change the GUI to a) add functionality or error checking and b) to look personalized with CSS?

Jackrabbit is a content repository infrastructure and does not come with a user interface. You may use any existing JCR-compliant application on top of Jackrabbit.

8. Can the search engine search using both requested metadata element values and keywords from the document contents?

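Yes, this is possible. A single query can combine constraints on metadata properties with a full-text constraint on the document contents (jcr:contains in XPath, CONTAINS in SQL).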
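
As an illustration, the following sketch builds such a combined XPath statement as a plain string. The class name, the helper methods and the "author" property are made up for the example, and the sketch assumes that the metadata properties and the indexed text are reachable from the same node, which depends on your content model. Only single quotes in string literals are escaped; special characters of the full-text search syntax are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CombinedQuery {

    /** Quotes a value as an XPath string literal; a single quote is escaped by doubling it. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds an XPath statement from metadata constraints plus optional full-text keywords. */
    static String build(Map<String, String> metadata, String keywords) {
        StringJoiner predicate = new StringJoiner(" and ");
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            // Property names are taken as-is; they are hypothetical in this example.
            predicate.add("@" + entry.getKey() + " = " + quote(entry.getValue()));
        }
        if (keywords != null && !keywords.trim().isEmpty()) {
            predicate.add("jcr:contains(., " + quote(keywords.trim()) + ")");
        }
        String p = predicate.toString();
        return p.isEmpty() ? "//*" : "//*[" + p + "]";
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("author", "Belinda");
        // prints: //*[@author = 'Belinda' and jcr:contains(., 'invoice')]
        System.out.println(build(metadata, "invoice"));
    }
}
```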

9. Can I start with keywords from the document contents and then later filter the results using user inputted metadata element values?

Yes, you would simply execute a second query that includes metadata values.

10. Can I start with user input metadata element values and then later filter down the results with document contents?

Yes, you would execute the first query with just the metadata values and then a second one with additional keywords entered by the user.

11. After an initial search, can I refine my search by only looking at the results of the previous search?

Yes, you would simply execute the initial query again with additional search terms.
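
The pattern behind the last three answers could be sketched like this: keep the constraints of the previous query and append the new ones, then run the resulting statement. The RefinableSearch class and the "department" property are hypothetical and only meant to show the idea; the constraint strings are assumed to be valid XPath predicates already.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RefinableSearch {

    private final List<String> constraints;

    private RefinableSearch(List<String> constraints) {
        this.constraints = Collections.unmodifiableList(constraints);
    }

    /** Starts a search with a single constraint (keywords or metadata). */
    static RefinableSearch startingWith(String constraint) {
        List<String> list = new ArrayList<>();
        list.add(constraint);
        return new RefinableSearch(list);
    }

    /** Returns a new search that keeps all previous constraints and adds one more. */
    RefinableSearch refine(String extraConstraint) {
        List<String> list = new ArrayList<>(constraints);
        list.add(extraConstraint);
        return new RefinableSearch(list);
    }

    /** Renders all constraints as one XPath statement. */
    String toXPath() {
        return "//*[" + String.join(" and ", constraints) + "]";
    }

    public static void main(String[] args) {
        RefinableSearch first = startingWith("jcr:contains(., 'budget')");
        RefinableSearch second = first.refine("@department = 'finance'");
        // prints: //*[jcr:contains(., 'budget')]
        System.out.println(first.toXPath());
        // prints: //*[jcr:contains(., 'budget') and @department = 'finance']
        System.out.println(second.toXPath());
    }
}
```

Each statement can then be executed through the standard JCR API, i.e. session.getWorkspace().getQueryManager().createQuery(statement, Query.XPATH).execute().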

regards
 marcel
