Re: Indexing word with plus sign

2017-05-24 Thread Fundera Developer
Thank you very much Erick! You're right! The "Char" part in PatternReplaceCharFilterFactory misled me and I thought it was just for character replacements. Once I had gone through the documentation of CharFilters (my fault...) I realized that I could use the very same regex I was using with the

Re: Indexing word with plus sign

2017-05-23 Thread Walter Underwood
That was on Solr 1.3, so I’m pretty sure it was the whitespace tokenizer. The synonym substitution for “+/-” was done in client code and indexing code, outside of Solr. We also sanitized queries to remove all query syntax characters. wunder Walter Underwood wun...@wunderwood.org

Re: Indexing word with plus sign

2017-05-23 Thread Fundera Developer
Thanks Walter!! For the sake of curiosity, do you remember which Tokenizer you were using in that case? Thanks! On 23/05/17 at 20:02, Walter Underwood wrote: Years ago at Netflix, I had to deal with a DVD from a band named “+/-”. I gave up and translated that to “plusminus” at index

Re: Indexing word with plus sign

2017-05-23 Thread Walter Underwood
Years ago at Netflix, I had to deal with a DVD from a band named “+/-”. I gave up and translated that to “plusminus” at index and query time. http://plusmin.us/ Luckily, “.hack//Sign” and other related dot-hack anime matched if I just deleted all the punctuation. And
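The normalization described above can be sketched outside Solr as a small function applied to both indexed text and queries. This is only an illustration in Python: the `SPECIAL_NAMES` table and the `normalize` helper are hypothetical names, not the code that was actually used.

```python
import re

# Hypothetical lookup table: names made only of punctuation get a spelled-out form.
SPECIAL_NAMES = {"+/-": "plusminus"}


def normalize(text: str) -> str:
    """Apply the same normalization at index time and at query time."""
    # First rewrite the known special names so they survive the cleanup below.
    for raw, spelled in SPECIAL_NAMES.items():
        text = text.replace(raw, spelled)
    # Then delete every remaining character that is neither a word
    # character nor whitespace.
    return re.sub(r"[^\w\s]", "", text)


print(normalize("+/-"))
print(normalize(".hack//Sign"))
```

Because the same function runs on documents and on queries, both sides end up with the same searchable string.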

Re: Indexing word with plus sign

2017-05-23 Thread Erick Erickson
You need to distinguish between PatternReplaceCharFilterFactory and PatternReplaceFilterFactory. The first one is applied to the entire input _before_ tokenization. The second is applied _after_ tokenization to individual tokens; by that time it's too late. It's an easy thing to miss. And at
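The ordering difference can be shown with a toy model in Python. Everything here is an assumption made for illustration: `simple_tokenize` is a crude stand-in that splits on punctuation, not Solr's StandardTokenizer, and the two pipeline functions only mimic where a pattern replacement runs relative to tokenization.

```python
import re

# Pattern for a '+' that sits between two word characters, as in 'i+d'.
INNER_PLUS = re.compile(r"(?<=\w)\+(?=\w)")


def simple_tokenize(text):
    # Crude stand-in for a tokenizer that drops punctuation (an assumption).
    return re.findall(r"\w+", text)


def char_filter_then_tokenize(text):
    # Char-filter style: rewrite the whole input first, then tokenize.
    return simple_tokenize(INNER_PLUS.sub("plus", text))


def tokenize_then_token_filter(text):
    # Token-filter style: tokenize first, then rewrite each token.
    return [INNER_PLUS.sub("plus", tok) for tok in simple_tokenize(text)]


print(char_filter_then_tokenize("proyectos de i+d"))
print(tokenize_then_token_filter("proyectos de i+d"))
```

In the second pipeline the '+' is already gone when the replacement runs, so the pattern never matches and 'i' and 'd' stay separate tokens.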

Re: Indexing word with plus sign

2017-05-23 Thread Fundera Developer
I have also tried this option, by using a PatternReplaceFilterFactory, like this: but it gets processed AFTER the Tokenizer, so when it executes there is no longer an "i+d" token, but two independent tokens, "i" and "d". Is there a way I could make the filter execute before the Tokenizer? I

Re: Indexing word with plus sign

2017-05-22 Thread Rick Leir
Fundera, You need a regex which matches a '+' with non-blank chars before and after. It should not replace a '+' preceded by white space, since that '+' is important in Solr. This is not a perfect solution, but might improve matters for you. Cheers -- Rick On May 22, 2017 1:58:21 PM EDT, Fundera
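One possible shape for such a regex, sketched in Python rather than in Solr configuration. The helper name `protect_inner_plus` and the replacement word "plus" are assumptions for the example; lookbehind and lookahead keep the neighbouring characters out of the match.

```python
import re

# A '+' with a non-blank character immediately before and immediately after it.
INNER_PLUS = re.compile(r"(?<=\S)\+(?=\S)")


def protect_inner_plus(text: str, replacement: str = "plus") -> str:
    """Rewrite only a '+' embedded in a word; leave other '+' signs alone."""
    return INNER_PLUS.sub(replacement, text)


print(protect_inner_plus("i+d"))
print(protect_inner_plus("solr +lucene"))
```

A '+' at the start of the input or right after white space is left untouched, so a '+' typed in front of a query term is not rewritten.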

Re: Indexing word with plus sign

2017-05-22 Thread Fundera Developer
Thank you Zahid and Erick, I was going to try the CharFilter suggestion, but then I had doubts. I see the indexing process, and how the appearance of 'i+d' would be handled, but what happens at query time? If I use the same filter, I could remove '+' chars that are added by the user to identify

Re: Indexing word with plus sign

2017-05-22 Thread Erick Erickson
You can also use any of the other tokenizers, WhitespaceTokenizer for instance. There are a couple that use regular expressions, etc. See: https://cwiki.apache.org/confluence/display/solr/Tokenizers Each one has its own considerations. WhitespaceTokenizer won't, for instance, separate out

Re: Indexing word with plus sign

2017-05-22 Thread Muhammad Zahid Iqbal
Hi, Before applying the tokenizer, you can replace your special symbols with some phrase to preserve them, and after tokenizing you can replace them back. For example: Thanks, Zahid iqbal On Mon, May 22, 2017 at 12:57 AM, Fundera Developer < funderadevelo...@outlook.com> wrote: > Hi all, > > I am a
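The replace-then-restore idea can be sketched as a round trip in Python. This is not the configuration from the message, which did not survive in the archive; the placeholder string `xxplusxx`, the function name, and the punctuation-splitting tokenizer are all assumptions made for illustration.

```python
import re

# Hypothetical placeholder: made of word characters only, and assumed
# never to occur in real text.
PLACEHOLDER = "xxplusxx"

INNER_PLUS = re.compile(r"(?<=\w)\+(?=\w)")


def tokenize_preserving_plus(text):
    # 1. Before tokenizing, hide each embedded '+' behind the placeholder.
    protected = INNER_PLUS.sub(PLACEHOLDER, text)
    # 2. Tokenize with a crude punctuation-splitting stand-in (an assumption).
    tokens = re.findall(r"\w+", protected)
    # 3. After tokenizing, put the original '+' back into each token.
    return [tok.replace(PLACEHOLDER, "+") for tok in tokens]


print(tokenize_preserving_plus("inversion en i+d"))
```

The placeholder has to survive tokenization intact, which is why it contains only word characters.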

Indexing word with plus sign

2017-05-21 Thread Fundera Developer
Hi all, I am a bit stuck on a problem that I feel must be easy to solve. In Spanish it is usual to find the term 'i+d'. We are working with Solr 5.5, and StandardTokenizer splits it into 'i' and 'd' and sometimes, as we have in the index documents both in Spanish and Catalan, and in Catalan it is