Thanks Mitch, using the analysis page has been a real eye-opener and has given
me much better insight into how Solr applies the filters (and, more
importantly, in which order). Ironically, I've ended up with a charFilter
mapping file, as this seemed the only route to replacing characters before
the tokenizer kicked in. Unfortunately, Solr just refused to allow sorting on
anything tokenized with characters other than whitespace.
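
As a rough illustration of the charFilter route (a sketch under assumptions,
not the exact configuration used here): the type name "email_sort", the field
pattern "sortemail_*" and the file name "mapping-email.txt" are all
hypothetical. It assumes that Solr only sorts on an indexed, single-valued
field whose analyzer produces a single term, which is why it uses
KeywordTokenizerFactory and leaves out multiValued.

```xml
<!-- Hypothetical sortable email type; all names here are illustrative only. -->
<fieldType name="email_sort" class="solr.TextField">
  <analyzer>
    <!-- charFilters rewrite the raw text before the tokenizer sees it -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-email.txt"/>
    <!-- KeywordTokenizer emits the whole value as one token, so the field
         yields a single term per document -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Single-valued on purpose: no multiValued attribute -->
<dynamicField name="sortemail_*" type="email_sort" indexed="true"
              stored="false"/>

<!-- Assumed contents of conf/mapping-email.txt, one rule per line:
       "@" => " AT "
       "." => " DOT "
-->
```

The design choice is to keep the sort field separate from the stored,
searchable one, so the original address is still what comes back in responses.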

Cheers, Ian.

-----Original Message-----
From: MitchK [mailto:mitc...@web.de] 
Sent: 07 March 2010 22:44
To: solr-user@lucene.apache.org
Subject: Re: Handling and sorting email addresses


Ian,

Did you have a look at Solr's admin analysis.jsp?
If everything on the analysis page looks fine, you have misunderstood
Solr's schema.xml file.

You've set two attributes in your schema.xml:
stored = true
indexed = true

What you get as a response is the stored field value.
The stored field value is the original field value, without any
modifications.
However, Solr uses the indexed (analyzed) field value to query your data,
so the filters never change what comes back in the response.

Kind regards
- Mitch
 

Ian Battersby wrote:
> 
> Forgive what might seem like a newbie question, but I am struggling
> desperately with this.
> 
> We have a dynamic field that holds email addresses and we'd like to be able
> to sort by it. When we try, we get an error because Solr treats the email
> address as a tokenized field. We've tried a custom field type using
> PatternReplaceFilterFactory to specify that @ and . should be replaced with
> " AT " and " DOT ", but we just can't seem to get it to work: all the fields
> still contain the unparsed email.
> 
> We used an example found on the mailing-list for the field type:
> 
>     <fieldType name="email" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.PatternReplaceFilterFactory" pattern="\."
>                 replacement=" DOT " replace="all"/>
>         <filter class="solr.PatternReplaceFilterFactory" pattern="@"
>                 replacement=" AT " replace="all"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1"
>                 catenateWords="0" catenateNumbers="0"
>                 catenateAll="0" splitOnCaseChange="0"/>
>       </analyzer>
>     </fieldType>
> 
> .. our dynamic field looks like ..
> 
>   <dynamicField name="dynamicemail_*" type="email" indexed="true"
>                 stored="true" multiValued="true"/>
> 
> When writing a document to Solr, it still seems to write the original email
> address (e.g. this.u...@somewhere.com) as opposed to its parsed version (e.g.
> this DOT user AT somewhere DOT com). Can anyone help?
> 
> We are running version 1.4 but have even tried the nightly build in an
> attempt to solve this problem.
> 
> Thanks.
> 
> 
> 

-- 
View this message in context:
http://old.nabble.com/Handling-and-sorting-email-addresses-tp27813111p27815239.html
Sent from the Solr - User mailing list archive at Nabble.com.

