Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
Thanks Erick! Yes, if I set splitOnCaseChange=0, then of course it'll work -- but then query for mixedCase will no longer also match mixed Case. I think I want WDF to... kind of do all of the above. Specifically, I had thought that it would allow a query for mixedCase to match both/either

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jack Krupansky
Right, that's what I meant by WDF not being magic - you can configure it to match any three out of four use cases as you choose, but there is no choice that matches all of the use cases. To be clear, this is not a bug in WDF, but simply a limitation. -- Jack Krupansky On Tue, Dec 30, 2014 at

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
I guess I don't understand what the four use cases are, or the three out of four use cases, or whatever. What the intended uses of the WDF are. Can you explain what the intended use of setting: generateWordParts=1 catenateWords=1 splitOnCaseChange=1 Is that supposed to do something useful (at

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Alexandre Rafalovitch
On 30 December 2014 at 11:12, Jonathan Rochkind rochk...@jhu.edu wrote: I'm a bit confused about what splitOnCaseChange combined with catenateWords is meant to do at all. It _is_ generating both the split and single-word tokens at query time Have you tried only having WDF during indexing with

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
On 12/30/14 11:45 AM, Alexandre Rafalovitch wrote: On 30 December 2014 at 11:12, Jonathan Rochkind rochk...@jhu.edu wrote: I'm a bit confused about what splitOnCaseChange combined with catenateWords is meant to do at all. It _is_ generating both the split and single-word tokens at query time

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jack Krupansky
I do have a more thorough discussion of WDF in my Solr Deep Dive e-book: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html You're not wrong about anything here... you just need to accept that WDF is not magic and can't handle every

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
Okay, thanks. I'm not sure if it's my lack of understanding, but I feel like I'm having a very hard time getting straight answers out of you all, here. I want the query mixedCase to match both/either mixed Case and mixedCase in the index. What configuration of WDF at index/query time would

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Walter Underwood
You want preserveOriginal=“1”. You should only do this processing at index time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 30, 2014, at 9:33 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Okay, thanks. I'm not sure if it's my lack of understanding,

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
On 12/30/14 12:35 PM, Walter Underwood wrote: You want preserveOriginal=“1”. You should only do this processing at index time. If I only do this processing at index time, then mixedCase at query time will no longer match mixed Case in the index/source material. I think I'm having trouble

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Michael Sokolov
On 12/30/14 12:42 PM, Jonathan Rochkind wrote: On 12/30/14 12:35 PM, Walter Underwood wrote: You want preserveOriginal=“1”. You should only do this processing at index time. If I only do this processing at index time, then mixedCase at query time will no longer match mixed Case in the

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Walter Underwood
There are two approaches for the query “mixedCase” to match “mixed Case” in the original document. 1. Add an index time synonym. 2. Add a ShingleFilterFactory to the index analysis chain. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 30, 2014, at 9:50
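Walter's second option might look roughly like the index-time chain below. This is a sketch only, not a configuration posted in the thread: the field type name text_shingled is hypothetical, and the tokenizer choice and attribute values are assumptions picked for illustration. The idea is that, with an empty tokenSeparator, the adjacent tokens mixed and case would also be indexed as the single shingle mixedcase, which a lowercased query for mixedCase could then match.

```xml
<!-- Hypothetical field type: the name, tokenizer and attribute values are
     illustrative assumptions, not settings taken from the thread. -->
<fieldType name="text_shingled" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit the single words plus two-word shingles joined with no separator -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true" tokenSeparator=""/>
  </analyzer>
  <analyzer type="query">
    <!-- No shingling at query time: the query term is only tokenized and lowercased -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Shingling every adjacent word pair enlarges the index, so it is worth checking the result on the admin/analysis page before adopting anything like this.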

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jonathan Rochkind
Okay, some months later I've come back to this with an isolated reproduction case. Thanks very much for any advice or debugging help you can give. The WordDelimiter filter is making a mixed-case query NOT match the single-case source, when it ought to. I am in Solr 4.3 (sorry, that's what

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jack Krupansky
WDF is powerful, but it is not magic. In general, the indexed data is expected to be clean while the query might be sloppy. You need to separate the index and query analyzers and they need to respect that distinction - the index analyzer would index as you have indicated, indexing both the unitary

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Alexandre Rafalovitch
splitOnCaseChange=1 So, it does not get split during indexing because there is no case change. But does get split during search and now you are looking for partial tokens against a combined single-token in the index. And not matching. The WordDelimiterFilterFactory is more for product IDs that

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jonathan Rochkind
On 12/29/14 5:24 PM, Jack Krupansky wrote: WDF is powerful, but it is not magic. In general, the indexed data is expected to be clean while the query might be sloppy. You need to separate the index and query analyzers and they need to respect that distinction I do not understand what separate

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Erick Erickson
Jonathan: Well, it works if you set splitOnCaseChange=0 in just the query part of the analysis chain. I probably misled you a bit months ago; WDFF is intended for this case iff you expect the case change to generate _tokens_ that are individually meaningful. And unfortunately significant in one

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Alexandre Rafalovitch
On 29 December 2014 at 18:07, Jonathan Rochkind rochk...@jhu.edu wrote: I do not understand what separate query/index analysis you are suggesting to accomplish what I wanted. I am sure you do know that, but just in case. At the moment, you have only one analyzer chain, so it applies at both
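A field type with separate index and query chains, as discussed in the messages above, could be sketched as follows. The field type name text_wdf, the tokenizer, and the exact attribute set are assumptions for illustration; only the attribute names generateWordParts, catenateWords and splitOnCaseChange, and the idea of setting splitOnCaseChange=0 on the query side, come from the thread.

```xml
<!-- Hypothetical field type: the name, tokenizer and attribute values are
     illustrative assumptions, not Jonathan's actual schema. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Index side: split on case change and also index the catenated form -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Query side: per Erick's suggestion, do not split on case change -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

As Jonathan notes elsewhere in the thread, a setup like this trades one match for another: the query mixedCase stays a single token, so it would no longer match mixed Case in the source text.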

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-03 Thread Jonathan Rochkind
Thanks Erick and Diego. Yes, I noticed in my last message I'm not actually using defaults, not sure why I chose non-defaults originally. I still need to find time to make a smaller isolation/reproduction case, I'm getting confusing results that suggest some other part of my field def may be

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-03 Thread Erick Erickson
Jonathan: If at all possible, delete your collection/data directory (the whole directory, including data) between runs after you've changed your schema (at least any of your analysis that pertains to indexing). Mixing old and new schema definitions can add to the confusion! Good luck! Erick On

WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
Hello, I'm running into a case where a query is not returning the results I expect, and I'm hoping someone can offer some explanation that might help me fine tune things or understand what's up. I am running Solr 4.3. My filter chain includes a WordDelimiterFilter and, later, a filter that

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Michael Della Bitta
Hi Jonathan, Little confused by this line: "And, what I think it's trying to do, is match text indexed as d elalain as well as text indexed by delalain." In this case, I don't know how WordDelimiterFilter will help, as you're likely tokenizing on spaces somewhere, and that input text has a

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
Thanks for the response. I understand the problem a little bit better after investigating more. Posting my full field definitions is, I think, going to be confusing, as they are long and complicated. I can narrow it down to an isolation case if I need to. My indexed field in question is

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Michael Della Bitta
If that's your problem, I bet all you have to do is twiddle on one of the catenate options, either catenateWords or catenateAll.

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
Yes, thanks, I realize I can twiddle those parameters, but it will probably result in MacBook no longer matching mac book at all, but ONLY matching macbook. My understanding of the default settings of WordDelimiterFilterFactory is that they are intended for MacBook to match both mac book AND

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Erick Erickson
bq: In my actual index, query MacBook is matching ONLY mac book, and not macbook. I suspect your query parameters for WordDelimiterFilterFactory don't have catenateWords set. What do you see when you enter these in both the index and query portions of the admin/analysis page? Best, Erick On

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
On 9/2/14 1:51 PM, Erick Erickson wrote: bq: In my actual index, query MacBook is matching ONLY mac book, and not macbook. I suspect your query parameters for WordDelimiterFilterFactory don't have catenateWords set. What do you see when you enter these in both the index and query portions of

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Erick Erickson
What happens if you append debug=query to your query? IOW, what does the _parsed_ query look like? Also note that the defaults for WDFF are _not_ identical at index and query time: catenateWords and catenateNumbers are 1 in the index portion and 0 in the query section. Still, this shouldn't be a problem all other things
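The index/query asymmetry Erick describes would look something like the fragment below. It is modeled from memory on the text_en_splitting type in the Solr example schema, with the tokenizer and all other filters omitted, so treat the exact attribute set as an assumption and compare it against your own schema.xml.

```xml
<!-- Fragment only: tokenizer and other filters omitted. The attribute values
     are an assumption based on the Solr example schema, not on the schema
     under discussion in this thread. -->
<analyzer type="index">
  <!-- Index side: catenateWords and catenateNumbers are on -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="0"
          splitOnCaseChange="1"/>
</analyzer>
<analyzer type="query">
  <!-- Query side: same filter, but catenateWords and catenateNumbers are off -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          splitOnCaseChange="1"/>
</analyzer>
```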

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Diego Fernandez
Although not a solution, this may help in trying to find the problem. In http://solr.pl/en/2010/08/16/what-is-schema-xml/ it says: It is worth noting that there is an additional attribute for the text field type: autoGeneratePhraseQueries This attribute is responsible for telling filters