**For anyone who stumbles into this post with the same problem, head
on over here ( http://markmail.org/thread/t5hmrob3jdmz7nqm ) for more
discussion and the solution that ended up working for us.
H. Wilson
On 06/04/2010 09:21 AM, H. Wilson wrote:
Hello,
I am using Jackrabbit 2.0 with OCM, and after searching the forums both
here and for Lucene, as well as Google, I have yet to find an answer.
(As an aside, if this question should have gone to the Lucene users'
list, please let me know!)
For starters, you should know our clients would like both
case-sensitive and case-insensitive options available to them. The
searches are to be on a property named fullName, which may contain
underscores and always contains a leading dot (also a client
requirement). And while we are aware that leading wildcard searches
perform poorly, the clients still plan to use them. Here is my issue:
* My searches using jcr:like work fine for all the scenarios I list
below.
* My searches with jcr:contains and exact names work fine (even with
underscores!).
* My jcr:contains searches using wildcards and underscores always
fail. I have even tried escaping them.
Given there are two objects in our repository with the following
fullName properties:
.North.South.East.WestLand
.North.South.East.West_Land
Both of the following work fine, and each returns its respective object:
(jcr:contains(@fullName, '.North.South.East.WestLand'))
(jcr:contains(@fullName, '.North.South.East.West_Land'))
The following jcr:contains queries return BOTH objects successfully:
*North*
.North*
.North.*
The following queries successfully return the FIRST object:
*.South.East.WestLand
.*.South.East.WestLand
*South*.WestLand
*East.WestLand
*.WestLand
*East?WestLand
*?WestLand
*North.South.East.WestLand
The following jcr:contains queries, identical except for the
underscore, return nothing, although I would expect them to return the
SECOND object:
*.South.East.West_Land
.*.South.East.West_Land
*South*.West_Land
*East.West_Land
*.West_Land
*East?West_Land
*?West_Land
*North.South.East.West_Land
UPDATE: After writing this long message, I remembered something. (I
have been tackling this off and on for weeks, so please bear with the
slight memory loss; maybe having seen all this will help others.) I
remember reading somewhere that Lucene treats underscores as token
dividers. So when an object's property contains an underscore, the
value is split into tokens and the underscore is essentially dropped.
That could explain why the exact-name search works while the wildcard
searches fail. (Is this correct?) The above examples used the
StandardAnalyzer. I have previously tried the WhitespaceAnalyzer, but
doing so disabled leading wildcard searches, which our clients
absolutely require. I know there is a way to turn leading wildcard
searches on, but I could not work out how to do it while using
Jackrabbit. Any advice on an Analyzer that would satisfy our clients
would be GREATLY appreciated.
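To make the token-divider hypothesis concrete, here is a minimal sketch
in plain Java with no Lucene or Jackrabbit dependency. It is a toy
model, not the real analyzer: the class UnderscoreTokenSketch and its
methods are hypothetical names, and the assumed rules (values are
lowercased and split only at underscores, a wildcard pattern is compared
against one token at a time, and a non-wildcard query is tokenized like
the stored value) are guesses chosen to match the results listed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Toy model only: NOT Lucene's StandardAnalyzer or Jackrabbit's query code.
final class UnderscoreTokenSketch {

    // ASSUMPTION: indexing lowercases the value and splits it at underscores.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("_+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Translate '*' (any run of characters) and '?' (any one character)
    // into a regular expression; every other character is matched literally.
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String stored, String query) {
        List<String> storedTokens = tokenize(stored);
        if (query.indexOf('*') >= 0 || query.indexOf('?') >= 0) {
            // ASSUMPTION: a wildcard query is not tokenized; the whole
            // pattern must match a single stored token.
            Pattern pattern = globToRegex(query.toLowerCase(Locale.ROOT));
            for (String token : storedTokens) {
                if (pattern.matcher(token).matches()) {
                    return true;
                }
            }
            return false;
        }
        // ASSUMPTION: a plain query is tokenized like the stored value and
        // must appear as a consecutive run of tokens.
        List<String> queryTokens = tokenize(query);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(storedTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String first = ".North.South.East.WestLand";
        String second = ".North.South.East.West_Land";
        System.out.println(tokenize(second));                 // [.north.south.east.west, land]
        System.out.println(matches(first, "*.WestLand"));     // true
        System.out.println(matches(second, "*.West_Land"));   // false
        System.out.println(matches(second, second));          // true
        System.out.println(matches(second, "*North*"));       // true
    }
}
```

Under these assumptions no single token contains "west_land", so every
wildcard pattern that spans the underscore fails, while the exact name
still matches because the query is split the same way as the stored
value. In this model, an analyzer that kept the whole value as one
token would let those patterns match.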
Thanks for your time and patience,
H. Wilson