For anyone who stumbles into this post with the same problem, head over to http://markmail.org/thread/t5hmrob3jdmz7nqm for more discussion and the solution that ended up working for us.

H. Wilson

On 06/04/2010 09:21 AM, H. Wilson wrote:
Hello,

I am using Jackrabbit 2.0 with OCM, and after searching the forums both here and on Lucene, as well as Google, I have yet to find an answer. (As an aside, if this question should have gone to the Lucene users list, please let me know!)

For starters, you should know that our clients would like both case-sensitive and case-insensitive search options. The searches are on a property named fullName, which may contain underscores and always starts with a dot (also a client requirement). And yes, we are aware that leading wildcard searches are not ideal, but the clients still plan to use them. Here is my issue:

   * My searches using jcr:like work fine for all the scenarios I list
     below.
   * My searches with jcr:contains and exact names work fine (even with
     underscores!).
   * My jcr:contains searches using wildcards and underscores always
     fail. I have even tried escaping them.
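
Since the jcr:like searches are the ones that work, here is a minimal Java sketch of how the `*`/`?` patterns used below could be translated into jcr:like patterns. It relies on the JCR rule that, in jcr:like, `%` matches any run of characters and `_` matches exactly one character, so a literal underscore has to be escaped with a backslash. The class and method names are hypothetical, and the use of fn:lower-case for the case-insensitive variant assumes the repository's XPath implementation supports that function.

```java
import java.util.Locale;

// Hypothetical helper; not part of Jackrabbit or OCM.
final class LikePatternSketch {

    /** Converts a '*'/'?' wildcard pattern into a jcr:like pattern. */
    static String toLikePattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*': sb.append('%'); break;   // any run of characters
                case '?': sb.append('_'); break;   // exactly one character
                case '%':
                case '_':
                    // literal use of a jcr:like wildcard must be escaped
                    sb.append('\\').append(c);
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Builds an XPath predicate for the given property.
     * Assumption: fn:lower-case is available for the case-insensitive form.
     */
    static String likePredicate(String property, String wildcard, boolean ignoreCase) {
        String pattern = toLikePattern(wildcard);
        String operand = "@" + property;
        if (ignoreCase) {
            pattern = pattern.toLowerCase(Locale.ROOT);
            operand = "fn:lower-case(" + operand + ")";
        }
        // single quotes inside an XPath string literal are doubled
        pattern = pattern.replace("'", "''");
        return "jcr:like(" + operand + ", '" + pattern + "')";
    }

    public static void main(String[] args) {
        System.out.println(likePredicate("fullName", "*East.West_Land", false));
        System.out.println(likePredicate("fullName", "*East?WestLand", true));
    }
}
```

The point of the escaping step is that an unescaped `West_Land` in a jcr:like pattern would also match values such as `WestXLand`, because the underscore is itself a wildcard there.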

Given there are two objects in our repository with the following fullName properties:

   .North.South.East.WestLand
   .North.South.East.West_Land


Both of the following work fine, and each returns its respective object:

   (jcr:contains(@fullName, '.North.South.East.WestLand'))
   (jcr:contains(@fullName, '.North.South.East.West_Land'))


The following jcr:contains queries return BOTH objects successfully:

   *North*
   .North*
   .North.*

The following queries successfully return the FIRST object:

   *.South.East.WestLand
   .*.South.East.WestLand
   *South*.WestLand
   *East.WestLand
   *.WestLand
   *East?WestLand
   *?WestLand
   *North.South.East.WestLand

And the following jcr:contains queries, identical except for the underscore, return nothing, although I would expect them to return the SECOND object:

   *.South.East.West_Land
   .*.South.East.West_Land
   *South*.West_Land
   *East.West_Land
   *.West_Land
   *East?West_Land
   *?West_Land
   *North.South.East.West_Land

UPDATE: After writing this long message, I remembered something. (I have been tackling this off and on for weeks, so please bear with the slight memory loss; maybe having seen all of this will help others.) I remember reading somewhere that Lucene treats underscores as token dividers. So when a property value contains an underscore, it is split into tokens and the underscore is essentially dropped. That could explain why the exact-name search works. (Is this correct?)

The above examples used the StandardAnalyzer. I previously tried the WhitespaceAnalyzer, but doing so disabled my ability to do leading wildcard searches, which our clients absolutely require. I know there is a way to turn on leading wildcard searches, but I could not work out how to do it with Jackrabbit. Any advice on an analyzer that would satisfy our clients would be GREATLY appreciated.
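
To make the token-divider hypothesis concrete, here is a toy model in Java. It is not Lucene's StandardAnalyzer and does not use any Lucene or Jackrabbit API. It only assumes that (a) indexed values are lower-cased and split at underscores, (b) an exact query is analyzed the same way as the indexed value, and (c) a wildcard query is not analyzed and has to match one single token. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Toy model of the hypothesis only; the splitting rule is an assumption.
final class UnderscoreTokenSketch {

    /** Lower-cases the text and splits it at underscores and whitespace. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[_\\s]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Exact query: analyzed like the indexed value, compared token by token. */
    static boolean matchesExact(String indexedValue, String query) {
        return analyze(indexedValue).equals(analyze(query));
    }

    /** Wildcard query: left unanalyzed, must match one single indexed token. */
    static boolean matchesWildcard(String indexedValue, String wildcard) {
        Pattern p = toRegex(wildcard.toLowerCase(Locale.ROOT));
        for (String token : analyze(indexedValue)) {
            if (p.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Translates '*' and '?' into their regex equivalents, quoting the rest. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        String second = ".North.South.East.West_Land";
        System.out.println(analyze(second));
        System.out.println(matchesExact(second, ".North.South.East.West_Land")); // true
        System.out.println(matchesWildcard(second, "*North*"));                  // true
        System.out.println(matchesWildcard(second, "*East.West_Land"));          // false
    }
}
```

Under these assumptions the model reproduces the behaviour described above: the exact name and `*North*` match the second object, while any wildcard pattern that contains the underscore cannot match, because no single token contains it. Whether this is what actually happens inside the StandardAnalyzer would still need to be confirmed against Lucene itself.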

Thanks for your time and patience,
H. Wilson

