Thanks Erick!
Yes, if I set splitOnCaseChange=0, then of course it'll work -- but then
a query for mixedCase will no longer also match mixed Case.
I think I want WDF to... kind of do all of the above.
Specifically, I had thought that it would allow a query for mixedCase
to match both/either
Right, that's what I meant by WDF not being magic - you can configure it
to match any three out of four use cases as you choose, but there is no
choice that matches all of the use cases.
To be clear, this is not a bug in WDF, but simply a limitation.
-- Jack Krupansky
On Tue, Dec 30, 2014 at
I guess I don't understand what the four use cases are, or the three out
of four use cases, or whatever. What the intended uses of the WDF are.
Can you explain what the intended use of setting:
generateWordParts=1 catenateWords=1 splitOnCaseChange=1
Is that supposed to do something useful (at
On 30 December 2014 at 11:12, Jonathan Rochkind rochk...@jhu.edu wrote:
I'm a bit confused about what splitOnCaseChange combined with catenateWords
is meant to do at all. It _is_ generating both the split and single-word
tokens at query time
Have you tried only having WDF during indexing with
On 12/30/14 11:45 AM, Alexandre Rafalovitch wrote:
On 30 December 2014 at 11:12, Jonathan Rochkind rochk...@jhu.edu wrote:
I'm a bit confused about what splitOnCaseChange combined with catenateWords
is meant to do at all. It _is_ generating both the split and single-word
tokens at query time
I do have a more thorough discussion of WDF in my Solr Deep Dive e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html
You're not wrong about anything here... you just need to accept that WDF
is not magic and can't handle every
Okay, thanks. I'm not sure if it's my lack of understanding, but I feel
like I'm having a very hard time getting straight answers out of you
all, here.
I want the query mixedCase to match both/either mixed Case and
mixedCase in the index.
What configuration of WDF at index/query time would
You want preserveOriginal="1".
You should only do this processing at index time.
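
A minimal sketch of what that advice could look like in schema.xml, assuming a whitespace tokenizer and a lowercase filter around the WDF. The field type name text_wdf_index_only is hypothetical, and the other WDF attributes are taken from the settings discussed elsewhere in this thread rather than from this message:

```xml
<!-- Hypothetical field type: WDF runs at index time only, keeping the original token. -->
<fieldType name="text_wdf_index_only" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- "mixedCase" is indexed as the parts "mixed" and "Case" plus the whole "mixedCase" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <!-- Lowercase after WDF so the case change is still visible to it -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No WDF here: the query term is only lowercased -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Under these assumptions a query for mixedCase or for mixed case should find a document containing mixedCase, but a query for mixedCase would not find a document that only contains mixed Case.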
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Dec 30, 2014, at 9:33 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
Okay, thanks. I'm not sure if it's my lack of understanding,
On 12/30/14 12:35 PM, Walter Underwood wrote:
You want preserveOriginal="1".
You should only do this processing at index time.
If I only do this processing at index time, then mixedCase at query
time will no longer match mixed Case in the index/source material.
I think I'm having trouble
On 12/30/14 12:42 PM, Jonathan Rochkind wrote:
On 12/30/14 12:35 PM, Walter Underwood wrote:
You want preserveOriginal="1".
You should only do this processing at index time.
If I only do this processing at index time, then mixedCase at query
time will no longer match mixed Case in the
There are two approaches for the query “mixedCase” to match “mixed Case” in the
original document.
1. Add an index time synonym.
2. Add a ShingleFilterFactory to the index analysis chain.
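
As a rough illustration of the second approach (not taken from the original message), the index analyzer could emit two-word shingles joined without a separator, so that "mixed Case" also produces the single token "mixedcase". The field type name text_shingle_sketch and the choice of tokenizer and lowercase filter are assumptions:

```xml
<!-- Hypothetical field type: index-time shingles glue adjacent words together. -->
<fieldType name="text_shingle_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "mixed case" is indexed as "mixed", "case" and the shingle "mixedcase" -->
    <filter class="solr.ShingleFilterFactory"
            maxShingleSize="2"
            outputUnigrams="true"
            tokenSeparator=""/>
  </analyzer>
  <analyzer type="query">
    <!-- The query "mixedCase" is lowercased to "mixedcase" and matches the shingle -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is index size: every adjacent word pair adds a token, whether or not anyone will ever search for it as one word.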
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Dec 30, 2014, at 9:50
Okay, some months later I've come back to this with an isolated
reproduction case. Thanks very much for any advice or debugging help you
can give.
The WordDelimiter filter is making a mixed-case query NOT match the
single-case source, when it ought to.
I am in Solr 4.3 (sorry, that's what
WDF is powerful, but it is not magic. In general, the indexed data is
expected to be clean while the query might be sloppy. You need to separate
the index and query analyzers and they need to respect that distinction -
the index analyzer would index as you have indicated, indexing both the
unitary
splitOnCaseChange=1
So, it does not get split during indexing because there is no case
change. But does get split during search and now you are looking for
partial tokens against a combined single-token in the index. And not
matching.
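
One way to picture the mismatch, and the fix of setting splitOnCaseChange=0 in just the query part that is suggested further down in this thread, is a field type with separate index and query analyzers. This is only a sketch: the field type name text_wdf_split_sketch is hypothetical, and the tokenizer and lowercase filter are assumptions about the rest of the chain:

```xml
<!-- Hypothetical field type: same WDF on both sides, except case splitting is off at query time. -->
<fieldType name="text_wdf_split_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- "mixedcase" has no case change, so it stays a single token in the index -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- With splitOnCaseChange="0" the query "mixedCase" is not broken into "mixed" + "Case" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this arrangement the query mixedCase becomes the single token "mixedcase" and matches the single-case source, at the cost that it no longer matches "mixed Case" as two separate words in the index.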
The WordDelimiterFilterFactory is more for product IDs that
On 12/29/14 5:24 PM, Jack Krupansky wrote:
WDF is powerful, but it is not magic. In general, the indexed data is
expected to be clean while the query might be sloppy. You need to separate
the index and query analyzers and they need to respect that distinction
I do not understand what separate
Jonathan:
Well, it works if you set splitOnCaseChange=0 in just the query part
of the analysis chain. I probably misled you a bit months ago; WDFF
is intended for this case iff you expect the case change to generate
_tokens_ that are individually meaningful. And unfortunately
significant in one
On 29 December 2014 at 18:07, Jonathan Rochkind rochk...@jhu.edu wrote:
I do not understand what separate query/index analysis you are suggesting to
accomplish what I wanted.
I am sure you do know that, but just in case. At the moment, you have
only one analyzer chain, so it applies at both
Thanks Erick and Diego. Yes, I noticed in my last message I'm not
actually using defaults, not sure why I chose non-defaults originally.
I still need to find time to make a smaller isolation/reproduction case,
I'm getting confusing results that suggest some other part of my field
def may be
Jonathan:
If at all possible, delete your collection/data directory (the whole
directory, including data) between runs after you've changed
your schema (at least any of your analysis that pertains to indexing).
Mixing old and new schema definitions can add to the confusion!
Good luck!
Erick
On
Hello, I'm running into a case where a query is not returning the
results I expect, and I'm hoping someone can offer some explanation that
might help me fine tune things or understand what's up.
I am running Solr 4.3.
My filter chain includes a WordDelimiterFilter and, later, a filter that
Hi Jonathan,
Little confused by this line:
And, what I think it's trying to do, is match text indexed as d elalain
as well as text indexed by delalain.
In this case, I don't know how WordDelimiterFilter will help, as you're
likely tokenizing on spaces somewhere, and that input text has a
Thanks for the response.
I understand the problem a little bit better after investigating more.
Posting my full field definitions is, I think, going to be confusing, as
they are long and complicated. I can narrow it down to an isolation case
if I need to. My indexed field in question is
If that's your problem, I bet all you have to do is twiddle on one of the
catenate options, either catenateWords or catenateAll.
Michael Della Bitta
Yes, thanks, I realize I can twiddle those parameters, but it will
probably result in MacBook no longer matching mac book at all, but
ONLY matching macbook.
My understanding of the default settings of WordDelimiterFilterFactory is that
they are intended for MacBook to match both mac book AND
bq: In my actual index, query MacBook is matching ONLY mac book, and
not macbook
I suspect your query parameters for WordDelimiterFilterFactory don't have
catenateWords set.
What do you see when you enter these in both the index and query portions
of the admin/analysis page?
Best,
Erick
On
On 9/2/14 1:51 PM, Erick Erickson wrote:
bq: In my actual index, query MacBook is matching ONLY mac book, and
not macbook
I suspect your query parameters for WordDelimiterFilterFactory don't have
catenateWords set.
What do you see when you enter these in both the index and query portions
of
What happens if you append debug=query to your query? IOW, what does the
_parsed_ query look like?
Also note that the defaults for WDFF are _not_ identical. catenateWords and
catenateNumbers are 1 in the
index portion and 0 in the query section. Still, this shouldn't be a
problem all other things
Although not a solution, this may help in trying to find the problem.
In http://solr.pl/en/2010/08/16/what-is-schema-xml/ it says:
It is worth noting that there is an additional attribute for the text field
type:
autoGeneratePhraseQueries
This attribute is responsible for telling filters