Ard,
You are absolutely right, and this didn't make sense to me either. I
think I was too worn out from my week, and too excited to have code that
"worked", to notice the obvious: this must be a workaround. However, I
will need a little guidance on how to inspect the tokens. I have Luke,
but I never really understood how to use it properly. Could you give me a
clear list of steps, or point me to a resource I missed, on how to
inspect tokens during insert/search? Thanks.
H. Wilson
On 08/30/2010 03:30 AM, Ard Schrijvers wrote:
Hello,
On Fri, Aug 27, 2010 at 9:06 PM, H. Wilson<[email protected]> wrote:
OK, well I got the spaces part figured out, and will post it for anyone who
needs it. Putting quotes around the spaces unfortunately did not work.
During testing, I determined that if you performed the following query for
the exact fullName property:
filter.addContains("@fullName",
    "'" + Text.escapeIllegalXpathSearchChars(".North.South.East.West Land") + "'");
It would return nothing. But tweak it a little and add a wildcard, and it
would return results:
filter.addContains("@fullName",
    "'" + Text.escapeIllegalXpathSearchChars(".North.South.East.West Lan*") + "'");
This does not make sense...see below
But since I did not want to throw in wildcards where they might not be
wanted, if a search string contained spaces, did not contain wildcards, and
the user was not concerned with case sensitivity, I used fn:lower-case.
So I ended up with the following excerpt (our clients wanted options for
case-sensitive and case-insensitive searching).
public OurParameter[] getOurParameters (boolean performCaseSensitiveSearch,
        String searchTerm, String srchField) {
    // srchField in this case was fullName
    .....
    if (performCaseSensitiveSearch) {
        // jcr:like for case sensitive
        filter.orJCRExpression("jcr:like(@" + srchField + ", '"
                + Text.escapeIllegalXpathSearchChars(searchTerm) + "')");
    }
    else {
        // only use fn:lower-case if there are spaces, with NO wildcards
        if (searchTerm.contains(" ") && !searchTerm.contains("*")
                && !searchTerm.contains("?")) {
            filter.addJCRExpression("fn:lower-case(@" + srchField + ") = '"
                    + Text.escapeIllegalXpathSearchChars(searchTerm.toLowerCase()) + "'");
        }
        else {
            // jcr:contains for case insensitive
            filter.addContains(srchField,
                    Text.escapeIllegalXpathSearchChars(searchTerm));
        }
    }
This seems to me a workaround for the real problem, because it
just doesn't make sense to me. Can you inspect the tokens that are
created by your analyser? Make sure you inspect the tokens both during
indexing (just store something) and during searching (just search in
the property). I am quite sure you'll see the issue then. Perhaps it is
something with Text.escapeIllegalXpathSearchChars, though it seems that
it should leave spaces untouched.
Regards Ard
....
}
Hope that helps anyone who needs it.
H. Wilson
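For anyone following along, the branching described above can be
condensed into a small standalone sketch. It only builds the predicate
string and does not touch a repository. The names SearchPredicateSketch,
buildPredicate and escapeForLiteral are made up for illustration.
escapeForLiteral is a hypothetical stand-in for Jackrabbit's
Text.escapeIllegalXpathSearchChars and merely doubles single quotes, and
rendering the addContains call as a jcr:contains(...) predicate is an
assumption based on the comment in the posted code.

```java
import java.util.Locale;

class SearchPredicateSketch {

    // Hypothetical stand-in for Text.escapeIllegalXpathSearchChars:
    // it only doubles single quotes so the literal stays well-formed.
    static String escapeForLiteral(String s) {
        return s.replace("'", "''");
    }

    // Returns the predicate that the branching in the post would select.
    static String buildPredicate(boolean caseSensitive, String searchTerm, String field) {
        if (caseSensitive) {
            // jcr:like for case-sensitive matching
            return "jcr:like(@" + field + ", '" + escapeForLiteral(searchTerm) + "')";
        }
        boolean hasSpace = searchTerm.contains(" ");
        boolean hasWildcard = searchTerm.contains("*") || searchTerm.contains("?");
        if (hasSpace && !hasWildcard) {
            // exact, case-insensitive comparison on the lower-cased property
            String lowered = searchTerm.toLowerCase(Locale.ROOT);
            return "fn:lower-case(@" + field + ") = '" + escapeForLiteral(lowered) + "'";
        }
        // assumption: addContains ends up as a jcr:contains predicate
        return "jcr:contains(@" + field + ", '" + escapeForLiteral(searchTerm) + "')";
    }

    public static void main(String[] args) {
        System.out.println(buildPredicate(true, ".North.South.East.West Land", "fullName"));
        System.out.println(buildPredicate(false, ".North.South.East.West Land", "fullName"));
        System.out.println(buildPredicate(false, ".North.South.East.West Lan*", "fullName"));
    }
}
```

Keeping the decision in one pure function makes it easy to see which
predicate a given search term ends up with before anything is sent to
the repository.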
OK so it looks like I have one other issue. I am using the configuration as
posted below and sticking to my previous examples, with the addition of one
with whitespace. We have the following three in our repository:
.North.South.East.WestLand
.North.South.East.West_Land
.North.South.East.West Land //yes that's a space
...using a jcr:contains, with exact name search with NO wild cards: the
first two return properly, but the last one yields no result.
filter.addContains("@fullName",
    "'" + org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars(
        ".North.South.East.West Land") + "'");
I think the space in a contains is seen as an AND by the
Jackrabbit/Lucene QueryParser. I should test this, however, as I am not
sure. Perhaps you can put quotes around it, though I am not sure whether
that works.
Regards Ard
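To make that hypothesis concrete, here is a toy model in plain Java. It
does not use Lucene or Jackrabbit at all. The class and method names are
invented for illustration, the single index token mirrors what
KeywordAnalyzer is documented to produce, and splitting the query on
whitespace into terms that must all match is exactly the untested
assumption from the reply above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ContainsAndSketch {

    // Index side: the whole property value is kept as one token.
    static List<String> indexTokens(String value) {
        return Collections.singletonList(value);
    }

    // Query side (assumption): the search text is split on whitespace.
    static List<String> queryTerms(String searchText) {
        return Arrays.asList(searchText.trim().split("\\s+"));
    }

    // AND semantics: every query term must equal some indexed token.
    static boolean matches(String storedValue, String searchText) {
        return indexTokens(storedValue).containsAll(queryTerms(searchText));
    }

    public static void main(String[] args) {
        String[] names = {
            ".North.South.East.WestLand",
            ".North.South.East.West_Land",
            ".North.South.East.West Land"
        };
        for (String name : names) {
            // Search each stored name using itself as the search text.
            System.out.println(queryTerms(name) + " -> " + matches(name, name));
        }
    }
}
```

Under these assumptions the first two names match themselves and the one
with the space does not, which is the pattern reported in this thread.
That is only consistent with the hypothesis, not proof of what the real
query parser does; inspecting the actual tokens is still needed.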
According to the Lucene documentation, KeywordAnalyzer should create a
single token. Combined with escaping the illegal characters (i.e.
spaces), shouldn't this search work? Thanks again.
H. Wilson