Re: tokenizer of solr

Jack Krupansky Thu, 11 Apr 2013 19:33:32 -0700

In that case, use the types="wdfftypes.txt" attribute of WDF and map "@" and "_" to ALPHA as shown in:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory.
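
As a rough sketch of what that could look like: the field type name "text_tweet" is made up for illustration, and the attribute values are assumptions to test against your own schema, not a verified configuration. The types file would contain something like:

```
# wdfftypes.txt: treat "@" and "_" as letters rather than delimiters
@ => ALPHA
_ => ALPHA
```

and the field type would reference it:

```xml
<!-- hypothetical field type; the name and attribute values are illustrative -->
<fieldType name="text_tweet" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on whitespace only, so "@jpc_108" reaches WDF intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            types="wdfftypes.txt"
            generateWordParts="1"
            generateNumberParts="1"
            splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

WDF may still split at the letter-to-digit boundary ("jpc_" / "108") unless splitOnNumerics="0" is set, which is why it is included here. With "@" mapped to ALPHA, the leading "@" also stays part of the token, so it is worth checking on the Analysis page of the Solr admin UI that a query for jpc_108 still matches what gets indexed.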


-- Jack Krupansky

-----Original Message-----
From: Mingfeng Yang
Sent: Thursday, April 11, 2013 8:50 PM
To: [email protected]
Subject: Re: tokenizer of solr

Looks like it's due to the word delimiter filter.  Does anyone know whether
the "protected" file supports regular expressions or not?

Ming

On Thu, Apr 11, 2013 at 4:58 PM, Jack Krupansky <[email protected]> wrote:

Try the whitespace tokenizer.

-- Jack Krupansky

-----Original Message-----
From: Mingfeng Yang
Sent: Thursday, April 11, 2013 7:48 PM
To: [email protected]
Subject: tokenizer of solr

Dear Solr users and developers,

I am trying to index some documents, some of which are Twitter messages, and
we have a problem when indexing retweets.

Say a Twitter user named "jpc_108" posts a tweet, and then someone retweets
his message, so @jpc_108 becomes part of the tweet text body.

It seems that before indexing, the tokenizer factory of Solr turns "@jpc_108"
into "jpc" and "108", and when we search for jpc_108, it's not there anymore.



Is there any way we can keep "jpc_108" when it appears as "@jpc_108"?

Thanks,
Ming
