I just tried out the jena-text indexing and query capabilities of Jena
2.11. Great stuff, but the property values I indexed include part
numbers that frequently contain hyphens. Apparently Lucene's
StandardAnalyzer tokenizes on hyphens, so my initial search results were
quite puzzling.
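
To show what I mean, here is a rough sketch of the difference as I
understand it. It is NOT Lucene code: the real StandardAnalyzer follows
the Unicode UAX#29 word-break rules (plus lowercasing and stop-word
filtering), while this approximation only splits on non-alphanumeric
characters. That is enough to show what happens to a hyphenated part
number. The class, method names and the part number "AB-1234-X" are made
up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical approximation of how two Lucene analyzers treat a part number.
 * Not Lucene code; only meant to show the effect of hyphens.
 */
public class AnalyzerSketch {

    // Rough stand-in for StandardAnalyzer: split on anything that is not a
    // letter or digit, then lowercase each piece.
    static List<String> standardLike(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Stand-in for KeywordAnalyzer: the whole value becomes a single,
    // unmodified token.
    static List<String> keywordLike(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value);
        return tokens;
    }

    public static void main(String[] args) {
        String partNumber = "AB-1234-X"; // hypothetical part number
        System.out.println("standard-like: " + standardLike(partNumber));
        System.out.println("keyword-like:  " + keywordLike(partNumber));
    }
}
```

With the first behaviour the part number ends up as three separate
terms, so a search for the full value does not match it as one unit.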

However, even with the limited results, I can see that the text queries
are much faster than strstarts() or regex() filters on the same property
values. So I would like to try indexing the property values using
Lucene's KeywordAnalyzer. I think I can see in the code how this could
easily be done.
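
My guess at why the index should help with prefix searches, as a toy
sketch (hypothetical, not Jena or Lucene code): if each whole value is
stored as one sorted term, as KeywordAnalyzer would produce, all terms
sharing a prefix form one contiguous range that can be found by binary
search. A FILTER(strstarts(...)) has to test every value instead. The
TreeSet here only stands in for Lucene's sorted term dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Toy comparison (hypothetical) of filter-style vs index-style prefix search.
 */
public class PrefixLookupSketch {

    // Filter-style: examine every value, like strstarts() applied to each row.
    static List<String> scanFilter(List<String> values, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith(prefix)) {
                hits.add(v);
            }
        }
        return hits;
    }

    // Index-style: terms are sorted, so jump to the first term >= prefix
    // and read forward until a term no longer starts with the prefix.
    static List<String> indexedPrefix(NavigableSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // past the contiguous prefix range, stop early
            }
            hits.add(t);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> values = List.of("AB-1234-X", "AB-1299-Y", "CD-1234-X", "AB-2000-Z");
        NavigableSet<String> terms = new TreeSet<>(values);
        System.out.println(scanFilter(values, "AB-12"));
        System.out.println(indexedPrefix(terms, "AB-12"));
    }
}
```

Both searches return the same matches. The difference is that the
indexed version only touches the matching range, which is presumably
why the text queries feel so much faster than the filters.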

Has anyone else encountered this problem? Have I missed some other way
to improve response time for a filtered string search, or overestimated
the possible performance improvement? (I'm new to Lucene.) Would the
developers consider an enhancement to make this option configurable in
the text assembler?

Regards,
--Paul
