Hi all,

I'm new to Solr and recently inherited a Solr application (version 5.4) from a 
previous developer with very little documentation.  At any rate, my problem is 
this:

I have some email addresses that are stored as mixed case.

[email protected] = Success (querying for this email address with the full
address in any case, upper or lower, returns the correct result)

[email protected] = Fail (querying for this email address with the full
address in any case, upper or lower, returns zero results)

And here's the fieldType definition that's used for email addresses:

<fieldType name="text_phonetic" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1" splitOnNumerics="0"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Caverphone" inject="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" splitOnNumerics="0"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Caverphone" inject="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I've spent a couple of days researching this issue, and my best guess at a fix 
would be to re-index this data using the LowerCaseFilterFactory so that all 
email addresses are stored in lower case, but that would be a significant 
change as I have over 10 million docs indexed.  In addition, it's strange that 
we get search results on some mixed case email addresses, but not all, so I'm 
hoping that maybe all we need is to tweak the query analyzer?  Thanks in 
advance for your help with this question.  Please let me know if you need any 
additional details.
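
To make the re-index idea concrete, here is a rough sketch of the kind of field 
type I have in mind.  It is untested, and the name "email_lowercase" is only a 
placeholder I made up.  It assumes each address should be kept as one token 
and simply lowercased, using KeywordTokenizerFactory and 
LowerCaseFilterFactory:

```xml
<!-- Sketch only: "email_lowercase" is a hypothetical name, not part of my current schema. -->
<fieldType name="email_lowercase" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer with no type attribute is applied at both index and query time. -->
  <analyzer>
    <!-- Keep the entire address as a single token (no splitting on "@", "." or case changes). -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase that token so matching does not depend on case. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

As I understand it, the email field would have to be pointed at this type and 
all of the documents re-indexed, which is the part I'd like to avoid if a 
query-side change is enough.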

-Miguel


