Hi Erick
I think you've made some really important observations. Steven has
provided a good regular expression to help with word and non-words. For
the moment I have reverted to my analyser and am going to try doing some
clever pattern matching later. Also, I'll try using a different analyser
Hi Steve
Thanks for your response. I was just wondering whether there is a
difference between the regular expression you sent me i.e.
(i) \s*(?:\b|(?=\S)(?=\s)|(?=\s)(?=\S))\s*
and
(ii) \\b
as they lead to the same output. For example, the string search testing
a-new string=3/4
Hi Rahil,
Rahil wrote:
I was just wondering whether there is a
difference between the regular expression you sent me i.e.
(i) \s*(?:\b|(?=\S)(?=\s)|(?=\s)(?=\S))\s*
and
(ii) \\b
as they lead to the same output. For example, the string search testing
a-new string=3/4 results in
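
For reference, here is a minimal sketch (not from the thread) of what splitting on a plain word boundary does to the example string. It uses only `String.split` from the JDK, not Lucene's PatternAnalyzer, and the class and method names are made up for illustration. Note that `\\b` in (ii) is just the Java string-literal spelling of the regex `\b`. Splitting on `\b` alone leaves whitespace-only pieces between words, so the sketch trims each piece and drops the empty ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: split on word boundaries,
// then discard the pieces that contain only whitespace.
class BoundarySplit {
    static List<String> splitOnBoundaries(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("\\b")) {
            String trimmed = piece.trim();   // " - " becomes "-", " " becomes ""
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [testing, a, -, new, string, =, 3, /, 4]
        System.out.println(splitOnBoundaries("testing a-new string=3/4"));
    }
}
```

One caveat with this approach: two adjacent non-word characters (for example `=/`) have no word boundary between them, so they come out as a single token.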
Hi Erick
I'm having trouble writing a good regular expression for the
PatternAnalyzer to deal with word and non-word characters. I couldn't
figure out a valid regular expression to write a valid
Pattern.compile(String regex) which can tokenise a string into O/E -
visual acuity R-eye=6/24
Hi Rahil,
Rahil wrote:
I couldn't figure out a valid regular expression to write a valid
Pattern.compile(String regex) which can tokenise a string into O/E -
visual acuity R-eye=6/24 into O,/,E, -, visual, acuity,
R, -, eye, =, 6, /, 24.
The following regular expression should match
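
As an illustration only (this is not the expression from the reply above), the tokenisation asked for can be sketched with plain `java.util.regex` by matching the tokens themselves instead of the gaps between them. This is a different technique from a PatternAnalyzer split pattern, which describes delimiters; the class name, method name and pattern below are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
class WordNonWordTokens {
    // A token is either a run of word characters,
    // or one single character that is neither word nor whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [O, /, E, -, visual, acuity, R, -, eye, =, 6, /, 24]
        System.out.println(tokenize("O/E - visual acuity R-eye=6/24"));
    }
}
```

Because each non-word character is matched on its own, adjacent punctuation is emitted as separate tokens and whitespace never appears in the output.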
: I have a custom-built Analyzer where I tokenize all non-whitespace
: characters as well available in the field TERM (which is the only
: field being tokenised).
: If I now query my index file for a term 6/12 for instance, I get back
: only ONE result
: instead of TWO. There is another token in
Hi
I have a custom-built Analyzer where I tokenize all non-whitespace
characters as well available in the field TERM (which is the only
field being tokenised).
If I now query my index file for a term 6/12 for instance, I get back
only ONE result
SCORE DESCRIPTION STATUS CONCEPTID
Most often, from what I've seen on this e-mail list, unexpected results are
because you're not indexing on the tokens you *think* you're indexing. Or
not searching on them. By that I mean that the analyzers you're using are
behaving in ways you don't expect.
That said, I think you're getting
Hi Erick
Thanks for your response. There's a lot to chew on in your reply and I'm
looking at the suggestions you've made.
Yeah, I have Luke installed and have queried my index, but I'm not getting
much of an explanation out of it. A query for 6/12 is sent as
TERM:6/12 which is quite
Well, I'm not the greatest expert, but a quick look doesn't show me anything
obvious. But I have to ask, wouldn't WhitespaceAnalyzer work for you?
Although I don't remember whether WhitespaceAnalyzer lowercases or not.
It sure looks like you're getting reasonable results given how you're
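
To make the whitespace suggestion concrete, here is a rough stand-in (plain JDK code, not Lucene's WhitespaceAnalyzer itself) that splits only on runs of whitespace and leaves case untouched. The class and method names are invented for the example. It shows that a value such as 6/24 is only a token of its own when it is surrounded by whitespace in the original text.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for whitespace-only tokenisation; not a Lucene class.
class WhitespaceSplit {
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        // Split on one or more whitespace characters; nothing else is a delimiter.
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // prints [O/E, -, visual, acuity, R-eye=6/24]
        System.out.println(tokenize("O/E - visual acuity R-eye=6/24"));
    }
}
```

With this kind of tokenisation the string above yields `R-eye=6/24` as one token, so an exact term lookup for `6/24` would not find it.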