See: 
https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

It discusses the general problem of which filters can cope with
wildcards and which cannot. Generally, any filter that could
potentially produce more than one output token per input token is
skipped when wildcards are encountered.
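
To make the effect concrete, here is a rough Python sketch of the
behaviour. It is a toy model, not Solr or Lucene code: the function
names, the two sample addresses, and both analyzers are assumptions made
up for illustration. It only shows why a wildcard prefix that is
lowercased but not tokenized cannot match terms that were split on
punctuation at index time.

```python
import re


def split_on_punctuation(text):
    """Toy index-time analyzer: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def keep_whole(text):
    """Toy analyzer that keeps the whole address as one lowercased term."""
    return [text.lower()]


def build_index(docs, analyzer):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def prefix_query(index, raw_prefix):
    """Model of a 'prefix*' query: the prefix is lowercased but never split."""
    prefix = raw_prefix.lower()
    hits = set()
    for term, ids in index.items():
        if term.startswith(prefix):
            hits |= ids
    return hits


# Hypothetical sample data, assumed from the terms described in this thread.
docs = {1: "test@one.com", 2: "test@two.com"}

split_index = build_index(docs, split_on_punctuation)
whole_index = build_index(docs, keep_whole)

print(sorted(split_index))                     # terms after splitting
print(prefix_query(split_index, "test"))       # test*     on split terms
print(prefix_query(split_index, "test@one"))   # test@one* on split terms
print(prefix_query(whole_index, "test@one"))   # test@one* on whole-address terms
```

With the splitting analyzer no indexed term contains "@", so the
test@one* prefix has nothing to match. With an analyzer that keeps the
address as a single term, the same prefix matches the first document.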

Best,
Erick

On Thu, Sep 14, 2017 at 6:26 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
> You may want to use the UAX29URLEmailTokenizerFactory tokenizer in your
> analysis chain.
>
> Thanks,
> Susheel
>
>
> On Thu, Sep 14, 2017 at 8:46 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 9/14/2017 5:06 AM, Mannott, Birgit wrote:
>> > I have a problem when searching on email addresses.
>> > @ seems to be handled as a special character but I don't find anything
>> > about it in the documentation.
>> >
>> > This is my test data
>> > t...@one.com
>> > t...@two.com
>>
>> Chances are that you have analysis defined on this field, and that the
>> analysis includes a tokenizer or tokenizer/filter combination that
>> splits on punctuation.  This means that for both entries, you have
>> three terms.  For the first one, those terms are test, one, and com.
>> For the second one, they are test, two, and com.  The rest of what I'm
>> writing assumes that this is the case.
>>
>> > searching for test* returns both, ok.
>>
>> This matches the term "test" in both entries.
>>
>> > searching for t...@one.com returns the correct one, ok.
>>
>> Query analysis probably splits the same way index analysis does, so the
>> actual search is for all three terms.
>>
>> > searching for test returns both, which I didn't expect, but it's ok.
>>
>> In this case, it matches the simple term "test" that's in the index on
>> both documents.
>>
>> > searching for test@one* returns none, and that's the problem.
>>
>> When you include wildcards in a query, most query analysis is skipped,
>> so it's looking for the literal text "test@one" followed by any
>> characters.  Because the index analysis removed the @ character and
>> split the things around it into separate terms, this will not match any
>> of the terms in the index.
>>
>> Wildcards, while they do work in many cases, are often not the correct
>> way to do queries.
>>
>> Thanks,
>> Shawn
>>
>>
