Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Erick Erickson
bq: To me this seems like a design flaw. The Solr fieldtypes seem like they allow a developer to create types that should handle wildcards intelligently. Well, that's pretty impossible. WordDelimiter(Graph)FilterFactory is a case in point. It's designed to break up on

Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Webster Homer
It doesn't seem to matter what you do in the query analyzer: if you have a wildcard, it won't use it, which is exactly the behavior I observed. The solution was to set preserveOriginal="1" and change the ETL process to not strip the dashes, letting the index analyzer do that. We have a lot of
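
A minimal sketch of the fix described in this message, assuming a whitespace tokenizer and a hypothetical field type name `cas_number` (neither is given in the thread). The idea is that the index analyzer keeps the original hyphenated token through preserveOriginal="1", so a wildcard term that skips most of the query analyzer still has an indexed term to match:

```xml
<!-- Hypothetical field type: the name and the tokenizer are assumptions -->
<fieldType name="cas_number" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Index the number parts, the joined number, and the untouched original -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0"
            generateNumberParts="1"
            catenateNumbers="1"
            preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0"
            generateNumberParts="1"
            catenateNumbers="0"
            preserveOriginal="1"/>
  </analyzer>
</fieldType>
```

With the dashes left in the source data, the index side decides how to split them, and a reindex is needed after changing the analyzer.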

Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Saurabh Sethi
Webster, did you try escaping the special character (assuming you did not do what Shawn did by replacing - with some other text and your indexed tokens have -)? On Thu, Jul 27, 2017 at 12:03 PM, Webster Homer wrote: > Shawn, > Thank you for that. I didn't know about that

Re: WordDelimiterFilterFactory with Wildcards

2017-07-27 Thread Webster Homer
Shawn, Thank you for that. I didn't know about that feature of the WDF. It doesn't help my situation but it's great to know about. Googling solr wildcard searches I found this link

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Erick Erickson
catenateWords="0" catenateNumbers="1" catenateAll="0" preserveOriginal="0" stemEnglishPossess

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Webster Homer
preserveOriginal="0" stemEnglishPossessive="0"/> On Wed, Jul 26, 2017 at 12:56 PM, Saurabh Sethi <saurabh.se...@sendgrid.com> wrote: > 1.

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Saurabh Sethi
" >catenateAll="0" >preserveOriginal="0" >stemEnglishPossessive="0"/> > > > > > On Wed, Jul 26, 2017 at 12:56 PM, Saurabh Sethi < > saurabh.se...@sendgrid.com

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Webster Homer
, Jul 26, 2017 at 10:48 AM, Webster Homer <webster.ho...@sial.com> wrote: > I have several fieldtypes that use the WordDelimiterFilterFactory. > We have a fieldtype for CAS numbers, which look like 1234-12-1 (numbers separated by hyphens); users often leave

Re: WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Saurabh Sethi
1. What tokenizer are you using? 2. Do you have the preserveOriginal="1" flag set in your filter? 3. Which version of Solr are you using? On Wed, Jul 26, 2017 at 10:48 AM, Webster Homer <webster.ho...@sial.com> wrote: > I have several fieldtypes that use the WordDelimiterFilter

WordDelimiterFilterFactory with Wildcards

2017-07-26 Thread Webster Homer
I have several fieldtypes that use the WordDelimiterFilterFactory. We have a fieldtype for CAS numbers, which look like 1234-12-1 (numbers separated by hyphens). Users often leave out the hyphens and either use spaces or just string the numbers together. The WDF seemed like a great solution

Payload doesn't apply to WordDelimiterFilterFactory-generated tokens

2015-10-26 Thread Jamie Johnson
I came across this post ( http://lucene.472066.n3.nabble.com/Payload-doesn-t-apply-to-WordDelimiterFilterFactory-generated-tokens-td3136748.html) and tried to find a JIRA for this task. Was one ever created? If not I'd be happy to create it if this is still something that makes sense

Re: WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Jack Krupansky
reindexing. WordDelimiterFilterFactory doesn't seem to be working as expected. Hoping to get some clarification or if something sticks out here. Below is the field type definition being used: <fieldType name="field_tokenized" class="solr.TextField" omitNorms="true"> <analyzer type="index">

WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Mike L.
reindexing. WordDelimiterFilterFactory doesn't seem to be working as expected. Hoping to get some clarification or if something sticks out here. Below is the field type definition being used: <fieldType name="field_tokenized" class="solr.TextField" omitNorms="true"> <analyzer type="index">

Re: WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Mike L.
...@gmail.com To: solr-user@lucene.apache.org; Mike L. javaone...@yahoo.com Sent: Sunday, April 5, 2015 8:23 AM Subject: Re: WordDelimiterFilterFactory - tokenizer question You have to tell the filter what types of tokens to generate - words, numbers. You told it to generate... nothing. You did
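
To illustrate the point in this reply, here is a hedged sketch of a filter definition that actually asks for output. The attribute names are standard WordDelimiterFilterFactory options; the specific combination is an assumption, not the configuration from the thread:

```xml
<!-- With every generate*/catenate* flag and preserveOriginal set to 0, the
     filter has nothing to emit for a token that contains delimiters.
     Turning on at least the word and number parts gives it work to do. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="0"/>
```

The analysis page in the Solr admin UI is the quickest way to confirm which tokens a given combination produces.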

Re: WordDelimiterFilterFactory and position increment.

2015-02-04 Thread Dmitry Kan
Hi, Could you enable it on the querying side and re-test your case? The rule of thumb I usually follow is to make the index and query side transformations as close as possible. HTH, Dmitry On Wed, Feb 4, 2015 at 6:14 AM, Modassar Ather modather1...@gmail.com wrote: Hi, No I am not using

Re: WordDelimiterFilterFactory and position increment.

2015-02-03 Thread Modassar Ather
Hi, No I am not using WordDelimiterFilter on query side. Regards, Modassar On Fri, Jan 30, 2015 at 5:12 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi, Do you use WordDelimiterFilter on query side as well? On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather modather1...@gmail.com wrote:

Re: WordDelimiterFilterFactory and position increment.

2015-01-30 Thread Dmitry Kan
Hi, Do you use WordDelimiterFilter on query side as well? On Fri, Jan 30, 2015 at 12:51 PM, Modassar Ather modather1...@gmail.com wrote: Hi, An insight in the behavior of WordDelimiterFilter will be very helpful. Please share your inputs. Thanks, Modassar On Thu, Jan 22, 2015 at 2:54

Re: WordDelimiterFilterFactory and position increment.

2015-01-30 Thread Modassar Ather
Hi, An insight in the behavior of WordDelimiterFilter will be very helpful. Please share your inputs. Thanks, Modassar On Thu, Jan 22, 2015 at 2:54 PM, Modassar Ather modather1...@gmail.com wrote: Hi, I am using WordDelimiterFilter while indexing. Parser used is edismax. Phrase search is

WordDelimiterFilterFactory and position increment.

2015-01-22 Thread Modassar Ather
Hi, I am using WordDelimiterFilter while indexing. Parser used is edismax. Phrase search is failing for terms like 3d image. On the analysis page it shows the following four tokens for *3d* and their positions. *token position* 3d 1 3 1 3d 1 d

WordDelimiterFilterFactory and PatternReplaceCharFilterFactory

2014-11-05 Thread Jae Joo
Hi, Once I apply PatternReplaceCharFilterFactory to the input string, the position of token is changed. Here is an example. charFilter class=solr.PatternReplaceCharFilterFactory pattern=(lt;/?ce:italic[^]*) replacement=/ filter class=solr.WordDelimiterFilterFactory

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-18 Thread benjelloun
Hello, for WordDelimiterFilterFactory, this is an example in schema.xml to follow: <field name="spell" type="textSpell" multiValued="true" indexed="true" required="false" stored="false"/> <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true"> <analyzer type="index">

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-17 Thread jiag
thought I need to use WordDelimiterFilterFactory for x-box case, and WordBreakSolrSpellChecker for x box case. Is this correct? (1) In my schema file, this is what I changed: <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1"

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-17 Thread Erick Erickson
WordDelimiterFilterFactory for x-box case, and WordBreakSolrSpellChecker for x box case. Is this correct? (1) In my schema file, this is what I changed: <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Ahmet Arslan
want the xbox product to be returned.  I'm new to Solr, and from reading online, I thought I need to use WordDelimiterFilterFactory for x-box case, and WordBreakSolrSpellChecker for x box case. Is this correct? (1) In my schema file, this is what I changed: filter class

RE: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Dyer, James
Jia, I agree that for the spellcheckers to work, you need <arr name="last-components"> instead of <arr name="components">. But the x-box = xbox example ought to be solved by analyzing using WordDelimiterFilterFactory and catenateWords=1 at query-time. Did you re-index after changing your analysis
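
A small sketch of the query-time suggestion in this reply, assuming a whitespace tokenizer and a lowercase filter (both are assumptions; the poster's full analyzer is not shown). With only catenateWords enabled, a query token such as x-box is expected to be joined into xbox, which can then match the indexed product name:

```xml
<!-- Query-side analyzer sketch; tokenizer and lowercase filter are assumed -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="0"
          generateNumberParts="0"
          catenateWords="1"
          catenateNumbers="0"
          catenateAll="0"
          splitOnCaseChange="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

This only helps the hyphenated case. The filter works within a single token, so the two-token query x box is a separate problem, which is where the thread brings in WordBreakSolrSpellChecker.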

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Diego Fernandez
Supportability - Diagnostics - Original Message - Jia, I agree that for the spellcheckers to work, you need arr name=last-components instead of arr name=components. But the x-box = xbox example ought to be solved by analyzing using WordDelimiterFilterFactory and catenateWords=1 at query

questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-15 Thread jiag
Hello everyone :) I have a product called xbox indexed, and when the user searches for either x-box or x box I want the xbox product to be returned. I'm new to Solr, and from reading online, I thought I need to use WordDelimiterFilterFactory for x-box case, and WordBreakSolrSpellChecker for x box

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Diego Fernandez
is the suggested way of accomplishing this? Would we just have to extend the JFlex file for the tokenizer and re-compile it? -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html Sent from the Solr - User

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Ahmet Arslan
-compile it? -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-20 Thread Diego Fernandez
- and /, but possibly others as well), what is the suggested way of accomplishing this? Would we just have to extend the JFlex file for the tokenizer and re-compile it? -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-17 Thread Shawn Heisey
On 5/16/2014 9:24 AM, aiguofer wrote: Jack Krupansky-2 wrote Typically the white space tokenizer is the best choice when the word delimiter filter will be used. -- Jack Krupansky If we wanted to keep the StandardTokenizer (because we make use of the token types) but wanted to use the

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread aiguofer
/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-05-16 Thread Ahmet Arslan
for the tokenizer and re-compile it? -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html

Re: Not allowing exact match with WordDelimiterFilterFactory

2014-04-25 Thread Jack Krupansky
Krupansky -Original Message- From: Kashish Sent: Friday, April 25, 2014 2:49 PM To: solr-user@lucene.apache.org Subject: Not allowing exact match with WordDelimiterFilterFactory Hi, I am having some problem with WordDelimiterFilterFactory. This is my fieldType fieldType name

Re: Not allowing exact match with WordDelimiterFilterFactory

2014-04-25 Thread Kashish
getting removed? I have no clue. - Thanks, Kashish -- View this message in context: http://lucene.472066.n3.nabble.com/Not-allowing-exact-match-with-WordDelimiterFilterFactory-tp4133193p4133235.html

WordDelimiterFilterFactory and StandardTokenizer

2014-04-16 Thread Bob Laferriere
I am seeing odd behavior from WordDelimiterFilterFactory (WDFF) when used in conjunction with StandardTokenizerFactory (STF). If I use the following configuration: fieldType name="text_en" class="solr.TextField" positionIncrementGap="100" analyze

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-04-16 Thread Shawn Heisey
On 4/16/2014 8:37 PM, Bob Laferriere wrote: I am seeing odd behavior from WordDelimiterFilterFactory (WDFF) when used in conjunction with StandardTokenizerFactory (STF). snip I see the following results for the document of "wi-fi": Index: "wi", "fi" Query: "wi", "fi", "wifi

Re: WordDelimiterFilterFactory and StandardTokenizer

2014-04-16 Thread Jack Krupansky
Typically the white space tokenizer is the best choice when the word delimiter filter will be used. -- Jack Krupansky -Original Message- From: Shawn Heisey Sent: Wednesday, April 16, 2014 11:03 PM To: solr-user@lucene.apache.org Subject: Re: WordDelimiterFilterFactory
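
A sketch of the pairing recommended here, with a hypothetical field type name `text_wdf`. The reasoning, as far as this thread shows, is that StandardTokenizer already splits wi-fi into wi and fi, so the word delimiter filter never sees the hyphen; a whitespace tokenizer passes the whole token through:

```xml
<!-- Hypothetical field type; the name and the flag values are illustrative -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only, so "wi-fi" reaches the filter intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off raised later in the thread is that the whitespace tokenizer does not assign the token types that StandardTokenizer provides.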

AW: WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-04-09 Thread Malte Hübner
-Ursprüngliche Nachricht- Von: Erick Erickson [mailto:erickerick...@gmail.com] Gesendet: Samstag, 29. März 2014 16:09 An: solr-user@lucene.apache.org Betreff: Re: WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts

Re: WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-04-09 Thread Erick Erickson
sense of what actually happens if you look at the docs for the filter rather than the factory, in this case the WordDelimiterFilter rather than WordDelimiterFilterFactory. This latter is not where the action is, but it's what's available for definitions in schema.xml. Best, Erick On Wed, Apr 9

Re: WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-03-29 Thread Erick Erickson
Why do you say at the indexing part: The given search term is: *X-002-99-495* WordDelimiterFilterFactory indexes the following word parts: * X (shouldn't be there) * 00299495 (shouldn't be there) ?? You've set catenateNumbers=1 in your fieldType for the indexing part, so WDFF is doing exactly
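
As a hedged illustration of this reply: if the goal is for a term like X-002-99-495 to pass through as a single token, one option is to turn off the catenate flags as well and keep the original. This is a sketch under that assumption, not the configuration the thread settled on:

```xml
<!-- catenateNumbers="1" is what joins 002, 99 and 495 into 00299495.
     Here every splitting and joining option is off, and
     preserveOriginal="1" keeps the untouched input token. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="0"
        generateNumberParts="0"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        splitOnCaseChange="0"
        splitOnNumerics="0"
        preserveOriginal="1"/>
```

If nothing should ever be split or joined, dropping the filter from the analyzer chain may be simpler than configuring it to do nothing.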

WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)

2014-03-27 Thread Malte Hübner
I am using Solr 4.7 and have got a serious problem with WordDelimiterFilterFactory. WordDelimiterFilterFactory behaves differently on hyphenated terms if they contain characters (a-Z) or characters AND numbers. Splitting up hyphenated terms is deactivated in my configuration

Re: Question on WordDelimiterFilterFactory use

2012-12-26 Thread Dmitry Kan
of WordDelimiterFilterFactory. So if I set it to split on intra-word delimiters, generateWordParts=1 and catenateWords=1, for the word i-pod, will the following queries return a result: i pod and ipod? Thanks

Re: Question on WordDelimiterFilterFactory use

2012-12-26 Thread Anirudha Jadhav
results there too. Best, Dmitry Kan On Wed, Dec 26, 2012 at 10:08 AM, Jose Yadao josesya...@gmail.com wrote: Hi and Happy Holidays to everyone. I have a question regarding the use of WordDelimiterFilterFactory. So if I set it to split on intra-word delimiters, generateWordParts=1

Re: SynonymFilterFactory breaking WordDelimiterFilterFactory output

2012-11-23 Thread Erick Erickson
:55 PM, Chris Book chrisb...@gmail.com wrote: Hello, I've recently upgraded from Solr 1.4.1 to 3.6.1 and am running into a problem with a specific query. When I search for 8mile or 8-mile without the quotes, and I use just the WordDelimiterFilterFactory as configured below, I get this query

Re: SynonymFilterFactory breaking WordDelimiterFilterFactory output

2012-11-23 Thread Yonik Seeley
: Hello, I've recently upgraded from Solr 1.4.1 to 3.6.1 and am running into a problem with a specific query. When I search for 8mile or 8-mile without the quotes, and I use just the WordDelimiterFilterFactory as configured below, I get this query which is as expected: album:(8mile 8) mile

Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory

2012-05-14 Thread Jack Krupansky
-Original Message- From: Chung Wu Sent: Monday, May 14, 2012 7:01 PM To: solr-user@lucene.apache.org Subject: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory Hi all! I'm using Solr 3.6, and I'm seeing unexpected query rewriting when either using

Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory

2012-05-14 Thread Chung Wu
synonyms at index time. See the text_en_splitting field type in the example schema. -- Jack Krupansky -Original Message- From: Chung Wu Sent: Monday, May 14, 2012 7:01 PM To: solr-user@lucene.apache.org Subject: Unexpected query rewrite from WordDelimiterFilterFactory

Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory

2012-05-14 Thread Jack Krupansky
Subject: Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory Thanks Jack! It's too bad I can't have catenate and generateParts both set to 1 at query time. If I set catenate to 0, then I miss the case where wifi is indexed but wi-fi is queried. If I set

Re: DisMax and WordDelimiterFilterFactory

2011-10-27 Thread Erick Erickson
   /fieldType The important feature here is the use of WordDelimiterFilterFactory, which allows a search for WiFi to match an indexed term of wi fi (for example). The problem, of course, is that if a user accidentally introduces a case change in their query, the query analyzer chain breaks

RE: DisMax and WordDelimiterFilterFactory (limitations of MultiPhraseQuery)

2011-10-27 Thread Demian Katz
this could matter. Has anyone given this any thought? - Demian -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, October 27, 2011 8:21 AM To: solr-user@lucene.apache.org Subject: Re: DisMax and WordDelimiterFilterFactory What happens if you change

DisMax and WordDelimiterFilterFactory

2011-10-25 Thread Demian Katz
class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> <filter class="solr.SnowballPorterFilterFactory" language="English"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> </fieldType> The important feature here is the use of WordDelimiterFilterFactory

Re: Payload doesn't apply to WordDelimiterFilterFactory-generated tokens

2011-07-18 Thread Chris Hostetter
: It seems that the payloads are applied only to the original word that I : index and the WordDelimiterFilter doesn't apply the payloads to the tokens : it generates. I believe you are correct. I think the general rule for most TokenFilters that you will find in Lucene/Solr is that they don't

Payload doesn't apply to WordDelimiterFilterFactory-generated tokens

2011-07-04 Thread Lox
Hi, I have a problem with the WordDelimiterFilterFactory and the DelimitedPayloadTokenFilterFactory. It seems that the payloads are applied only to the original word that I index and the WordDelimiterFilter doesn't apply the payloads to the tokens it generates. For example, imagine I index

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Robert Petersen
from analyzer stack for WordDelimiterFilterFactory Aha! I knew something must be awry, but when I looked at the analysis page output, well it sure looked like it should match. :) OK here is the query side WDF that finally works, I just turned everything off. (yay) First I tried just completely

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Otis Gospodnetic
ecosystem search :: http://search-lucene.com/ - Original Message From: Robert Petersen rober...@buy.com To: solr-user@lucene.apache.org; yo...@lucidimagination.com Sent: Tue, April 26, 2011 4:39:49 PM Subject: RE: term position question from analyzer stack for WordDelimiterFilterFactory

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Robert Petersen
@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory Hi Robert, I'm no WDFF expert, but all these zeros look suspicious: org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Erick Erickson
Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory Hi Robert, I'm no WDFF expert, but all these zeros look suspicious: org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-25 Thread Robert Petersen
[mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, April 22, 2011 5:55 PM To: Robert Petersen Cc: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Fri, Apr 22, 2011 at 8:24 PM, Robert Petersen rober...@buy.com wrote

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-25 Thread Yonik Seeley
On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen rober...@buy.com wrote: The search and index analyzer stack are the same. Ahhh, they should not be! Using both generate and catenate in WDF at query time is a no-no. Same reason you can't have multi-word synonyms at query time:
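
The advice in this reply can be sketched as an asymmetric field type: catenation happens at index time, while the query side only generates parts. The field type name `text_split` and the tokenizer are assumptions; the layout is similar in spirit to the text_en_splitting type mentioned elsewhere in this listing:

```xml
<!-- Hypothetical field type; the name and tokenizer are assumptions -->
<fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Index time: emit both the parts and the joined forms -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Query time: generate parts only, no catenation -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping overlapping tokens out of the query side avoids the position problems the reply compares to multi-word synonyms at query time.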

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-25 Thread Robert Petersen
: Monday, April 25, 2011 9:24 AM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory On Mon, Apr 25, 2011 at 12:15 PM, Robert Petersen rober...@buy.com wrote: The search and index analyzer stack are the same. Ahhh, they should

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-22 Thread Robert Petersen
this work... thanks for the help! -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Thursday, April 21, 2011 5:54 PM To: solr-user@lucene.apache.org Subject: Re: term position question from analyzer stack for WordDelimiterFilterFactory

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-22 Thread Yonik Seeley
On Fri, Apr 22, 2011 at 8:24 PM, Robert Petersen rober...@buy.com wrote: I can repeatedly demonstrate this in my dev environment, where I get entirely different results searching for AppleTV vs. appletv You originally said I cannot get a match between AppleTV on the indexing side and appletv on

term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-21 Thread Robert Petersen
So if I don't put preserveOriginal=1 in my WordDelimiterFilterFactory settings I cannot get a match between AppleTV on the indexing side and appletv on the search side. Without that setting the all lowercase version of AppleTV is in term position two due to the catenateWords=1

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-21 Thread Yonik Seeley
On Thu, Apr 21, 2011 at 8:06 PM, Robert Petersen rober...@buy.com wrote: So if I don't put preserveOriginal=1 in my WordDelimiterFilterFactory settings I cannot get a match between AppleTV on the indexing side and appletv on the search side. Hmmm, that shouldn't be the case. The text field

Is WordDelimiterFilterFactory applicable to non-english language?

2011-03-14 Thread cyang2010
Does it make sense to apply WordDelimiterFilterFactory to non-English languages, such as Spanish? What about Asian languages? The following are the typical use cases for WordDelimiterFilterFactory. Are 1, 2, 3, and 4 applicable to all Western languages (including Spanish)? For Asian languages

Re: Is WordDelimiterFilterFactory applicable to non-english language?

2011-03-14 Thread Ahmet Arslan
Does it make sense to apply WordDelimiterFilterFactory to non-English languages, such as Spanish? Yes, it makes sense. WDF is especially good for product names, like i-phone, iphone4, etc.

WordDelimiterFilterFactory

2011-02-04 Thread John kim
If I use WordDelimiterFilterFactory during indexing and at query time, will a search for cls500 find cls 500 and cls500x? If so, will it find and score exact matches higher? If not, how do you get exact matches to display first?

Re: WordDelimiterFilterFactory

2011-02-04 Thread Jay Hill
filters in a field type act on text. Make sure to check verbose output for both index and query. For this specific issue, yes, a query for cls500 will match both of those examples. To get the exact match to score higher: - create a text field (or a custom type that uses the WordDelimiterFilterFactory

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-19 Thread Peter Karich
Hi, the final solution is explained here in context: http://mail-archives.apache.org/mod_mbox/lucene-dev/201011.mbox/%3caanlktimatgvplph_mgfbsughdoedc8tc2brrwxhid...@mail.gmail.com%3e /If you are using Solr branch_3x or trunk, you can turn this off, by setting autoGeneratePhraseQueries to

WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, I am going crazy but which config is necessary to include the missing doc 2? I have: doc1 tw:aBc doc2 tw:abc Now a query aBc returns only doc 1 although when I try doc2 from admin/analysis.jsp then the term text 'abc' of the index gets highlighted as intended. I even indexed a simple

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Markus Jelsma
Hi, Please add preserveOriginal=1 to your WDF [1] definition and reindex (or just try with the analysis page). This will make sure the original input token is being preserved along the newly generated tokens. If you then pass it all through a lowercase filter, it should match both documents.
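
A sketch of what this reply suggests, assuming a whitespace tokenizer (the poster's tokenizer is not shown in the snippet). The original token is kept alongside the generated parts, and a lowercase filter placed after the word delimiter filter normalizes all of them:

```xml
<analyzer>
  <!-- Tokenizer is an assumption -->
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Keep the input token (e.g. aBc) in addition to any split parts -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1"
          generateNumberParts="1"
          catenateAll="0"
          preserveOriginal="1"/>
  <!-- Lowercasing after the split means aBc is also indexed as abc -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The lowercase filter has to come after the word delimiter filter here; if it ran first, there would be no case change left to split on.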

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, Please add preserveOriginal=1 to your WDF [1] definition and reindex (or just try with the analysis page). but it is already there!? <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateAll="0"

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Ken Stanley
On Thu, Nov 18, 2010 at 3:22 PM, Peter Karich peat...@yahoo.de wrote: Hi, Please add preserveOriginal=1  to your WDF [1] definition and reindex (or just try with the analysis page). but it is already there!? filter class=solr.WordDelimiterFilterFactory protected=protwords.txt            

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Peter, I recently had this issue, and I had to set splitOnCaseChange=0 to keep the word delimiter filter from doing what you describe. Can you try that and see if it helps? - Ken Hi Ken, yes this would solve my problem, but then I would lose a match for 'SuperMario' if I query 'mario',
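
For reference, the setting Ken describes would look roughly like this; the other attribute values are assumptions carried over for illustration:

```xml
<!-- splitOnCaseChange="0" stops aBc from being split into a and Bc.
     The cost, as Peter notes, is that SuperMario is no longer split,
     so a query for mario would not match it. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateAll="0"
        splitOnCaseChange="0"
        preserveOriginal="1"/>
```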

A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
Hi Guys, I encountered a problem when enabling WordDelimiterFilterFactory for both index and query (pasted relative part of schema.xml at the bottom of email). *1. Steps to reproduce:* 1.1 The indexed sample document contains only one sentence: This is a TechNote. 1.2 Query is: q

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread Erick Erickson
Really well done problem statement by the way On Tue, Sep 14, 2010 at 5:40 AM, yandong yao yydz...@gmail.com wrote: Hi Guys, I encountered a problem when enabling WordDelimiterFilterFactory for both index and query (pasted relative part of schema.xml at the bottom of email). *1. Steps

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
/SOLR-1852 , which was fixed in 1.4.1 On Tue, Sep 14, 2010 at 5:40 AM, yandong yao yydz...@gmail.com wrote: Hi Guys, I encountered a problem when enabling WordDelimiterFilterFactory for both index and query (pasted relative part of schema.xml at the bottom of email). *1. Steps

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
enabling WordDelimiterFilterFactory for both index and query (pasted relative part of schema.xml at the bottom of email). *1. Steps to reproduce:* 1.1 The indexed sample document contains only one sentence: This is a TechNote. 1.2 Query is: q=TechNote 1.3 Result: no matches

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-30 Thread Shawn Heisey
On 8/29/2010 2:17 PM, Erick Erickson wrote: charFilters are applied even before the tokenizer Try putting this after any instances of, say, WhiteSpaceTokenizerFactory in your analyzer definition, and I believe you'll see that this is not true. At least looking at this in the analysis page from

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-30 Thread Shawn Heisey
On 8/30/2010 9:01 AM, Shawn Heisey wrote: On 8/29/2010 2:17 PM, Erick Erickson wrote: charFilters are applied even before the tokenizer Try putting this after any instances of, say, WhiteSpaceTokenizerFactory in your analyzer definition, and I believe you'll see that this is not true. At

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Shawn Heisey
It's metadata for a collection of 45 million documents that is mostly photos, with some videos and text. The data is imported from a MySQL database and split among six large shards (each nearly 13GB) and a small shard with data added in the last week. That works out to between 300,000 and

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Shawn Heisey
On 8/28/2010 7:59 PM, Shawn Heisey wrote: The only drop in term quality that I noticed was that possessive words (apostrophe-s) no longer have the original preserved. I haven't yet decided whether that's a problem. I finally did notice another drop in term quality from the dual pass -

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Erick Erickson
Look at the tokenizer/filter chain that makes up your analyzers, and see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for other tokenizer/analyzer/filter options. You're on the right track looking at the various choices provided, and I suspect you'll find what you need... Be a

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Shawn Heisey
Thank you for taking the time to help. The way I've got the word delimiter index filter set up with only one pass, wolf-biederman will result in wolf, biederman, wolfbiederman, and wolf-biederman. With two passes, the last one is not present. One pass changes gremlin's to gremlin and
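
A hedged sketch of a single-pass index filter that would be expected to produce the four tokens listed in this message for wolf-biederman (wolf, biederman, wolfbiederman, and the original). The exact flag values in Shawn's schema are not shown in the snippet, so these are assumptions:

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        preserveOriginal="1"/>
<!-- generateWordParts : wolf, biederman
     catenateWords     : wolfbiederman
     preserveOriginal  : wolf-biederman -->
```

As the message notes, chaining two passes changes this picture: the original form is no longer present after the second pass.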

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Erick Erickson
There's nothing built into SOLR that I know of that'll deal with auto-detecting multiple languages and doing the right thing. I know there's been discussion of that, searching the users' list might help... You may have to write your own analyzer that tries to do this, but I have no clue how you'd

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-28 Thread Shawn Heisey
It's metadata for a collection of 45 million documents that is mostly photos, with some videos and text. The data is imported from a MySQL database and split among six large shards (each nearly 13GB) and a small shard with data added in the last week, which usually works out to between

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-27 Thread Markus Jelsma
data through WordDelimiterFilterFactory more than once? It occurs to me that I might get better results if I can do some of the filters separately and use preserveOriginal on some of them but not others. Currently I am using the following definition on both indexing and querying. Would it make

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-27 Thread Erick Erickson
. On Thursday 26 August 2010 17:45:45 Shawn Heisey wrote: Can I pass my data through WordDelimiterFilterFactory more than once? It occurs to me that I might get better results if I can do some of the filters separately and use preserveOriginal on some of them but not others. Currently I am

Multiple passes with WordDelimiterFilterFactory

2010-08-26 Thread Shawn Heisey
Can I pass my data through WordDelimiterFilterFactory more than once? It occurs to me that I might get better results if I can do some of the filters separately and use preserveOriginal on some of them but not others. Currently I am using the following definition on both indexing

dismax and WordDelimiterFilterFactory with PreserveOriginal = 1

2010-03-11 Thread Ya-Wen Hsu
Hi all, I'm facing the same issue as previous post here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg19511.html. Since no one answers this post, I thought I'll ask again. In my case, I use below setting for index filter class=solr.WordDelimiterFilterFactory generateWordParts=1

Re: dismax and WordDelimiterFilterFactory with PreserveOriginal = 1

2010-03-11 Thread Yonik Seeley
On Thu, Mar 11, 2010 at 1:07 PM, Ya-Wen Hsu y...@eline.com wrote: Hi all, I'm facing the same issue as previous post here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg19511.html. Since no one answers this post, I thought I'll ask again. In my case, I use below setting for

Re: dismax and WordDelimiterFilterFactory with PreserveOriginal = 1

2010-03-11 Thread Erick Erickson
Kind of a shot in the dark here, but your parameters for index and query on WordDelimiterFilterFactory are different, especially suspicious is catenateWords. You could test this by looking in your index with the SOLR admin page and/or Luke to see what your actual terms are. And don't forget

Re: Trouble Configuring WordDelimiterFilterFactory

2009-11-30 Thread Erick Erickson
. Regards Rahul On Sun, Nov 29, 2009 at 1:07 AM, Steven A Rowe sar...@syr.edu wrote: Hi Rahul, On 11/26/2009 at 12:53 AM, Rahul R wrote: Is there a way by which I can prevent the WordDelimiterFilterFactory from totally acting on numerical data ? prevent ... from totally acting

Re: Trouble Configuring WordDelimiterFilterFactory

2009-11-29 Thread Rahul R
, Steven A Rowe sar...@syr.edu wrote: Hi Rahul, On 11/26/2009 at 12:53 AM, Rahul R wrote: Is there a way by which I can prevent the WordDelimiterFilterFactory from totally acting on numerical data ? prevent ... from totally acting on is pretty vague, and nowhere AFAICT do you say precisely what

RE: Trouble Configuring WordDelimiterFilterFactory

2009-11-28 Thread Steven A Rowe
Hi Rahul, On 11/26/2009 at 12:53 AM, Rahul R wrote: Is there a way by which I can prevent the WordDelimiterFilterFactory from totally acting on numerical data ? prevent ... from totally acting on is pretty vague, and nowhere AFAICT do you say precisely what it is you want. It would help

Re: Trouble Configuring WordDelimiterFilterFactory

2009-11-25 Thread Rahul R
a combination of numbers, alphabets, special characters etc. I have a requirement wherein the WordDelimiterFilterFactory does not work on numbers, especially those with decimal points. Accuracy of results with relevance to numerical data is quite important, So if the text field of a document has

Trouble Configuring WordDelimiterFilterFactory

2009-11-24 Thread Rahul R
Hello, In our application we have a catch-all field (the 'text' field) which is configured as the default search field. Now this field will have a combination of numbers, alphabets, special characters etc. I have a requirement wherein the WordDelimiterFilterFactory does not work on numbers

Re: Problems with WordDelimiterFilterFactory

2009-10-10 Thread Shalin Shekhar Mangar
On Fri, Oct 9, 2009 at 3:33 AM, Patrick Jungermann patrick.jungerm...@googlemail.com wrote: Hi Bern, the problem is the character sequence --. A query is not allowed to have minus characters that directly follow one another. Remove one minus character and the query will be parsed without

Re: Problems with WordDelimiterFilterFactory

2009-10-09 Thread Chantal Ackermann
Hi Bernadette, Bernadette Houghton wrote: Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen. Changing this part of our schema.xml -
