RE: leading and trailing wildcard query

Bernadette Houghton Thu, 05 Nov 2009 14:26:24 -0800

I've just set up something similar (many thanks to Avesh!):

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.EdgeNGramFilterFactory" minGramSize="5" maxGramSize="25"/>
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

<fieldType name="doubleedgytext" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.NGramFilterFactory" minGramSize="5" maxGramSize="25" />
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>
.
.
   <field name="beginswith" type="edgytext" indexed="true" stored="false" multiValued="true"/>
   <field name="contains" type="doubleedgytext" indexed="true" stored="false" multiValued="true"/>
.
.
   
   <copyField source="content" dest="beginswith"/>
   <copyField source="*_t" dest="beginswith"/>
   <copyField source="*_mt" dest="beginswith"/>

   <!-- Copy for CONTAINS search -->
   <copyField source="content" dest="contains"/>
   <copyField source="*_t" dest="contains"/>
   <copyField source="*_mt" dest="contains"/>
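
For anyone reading along, here is a rough Python sketch of what those two index-time analyzers put into the index, and why a plain (non-wildcard) query against "beginswith" or "contains" then behaves like a prefix or substring search. It is a simplified model, not Solr's actual code, and the function names are made up for illustration. One consequence worth noting: with minGramSize="5" and maxGramSize="25", search strings shorter than 5 or longer than 25 characters will not match anything in these fields.

```python
# Simplified model of the two index-time analyzer chains above
# (KeywordTokenizer -> LowerCaseFilter -> EdgeNGram / NGram filter).
# Illustration only; function names are hypothetical, not Solr APIs.

MIN_GRAM = 5   # minGramSize from the schema
MAX_GRAM = 25  # maxGramSize from the schema


def edge_ngrams(value, min_size=MIN_GRAM, max_size=MAX_GRAM):
    """Prefixes of the lowercased value, as indexed in 'beginswith'."""
    token = value.lower()  # the keyword tokenizer keeps the value as one token
    return {token[:n] for n in range(min_size, min(max_size, len(token)) + 1)}


def ngrams(value, min_size=MIN_GRAM, max_size=MAX_GRAM):
    """All substrings of the lowercased value, as indexed in 'contains'."""
    token = value.lower()
    return {
        token[i:i + n]
        for n in range(min_size, min(max_size, len(token)) + 1)
        for i in range(len(token) - n + 1)
    }


def matches(query, grams):
    """The query analyzer only lowercases, so a hit is an exact gram match."""
    return query.lower() in grams


doc = "xxxABCDExxx"
print(matches("xxxab", edge_ngrams(doc)))  # query is a prefix of the value
print(matches("abcde", ngrams(doc)))       # query is a substring of the value
print(matches("abcde", edge_ngrams(doc)))  # substring, but not a prefix
print(matches("abc", ngrams(doc)))         # shorter than minGramSize
```

So a search such as contains:abcde needs no wildcard at all. Lowering minGramSize would allow shorter search strings, at the cost of a larger index.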

bern

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, 6 November 2009 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: leading and trailing wildcard query

The guilt trick is not the best thing to try on public mailing lists. :)

The first thing that popped to my mind is to use 2 fields, where the second one 
contains the desrever string of the first one.
The second idea is to use n-grams (if it's OK to tokenize), more specifically 
edge n-grams.
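
A minimal Python sketch of the first idea, assuming a second field (called "myfield_rev" here, a hypothetical name) that stores the reversed value. It only covers a leading wildcard such as *abc; a pattern with wildcards on both ends, like *abc*, still needs something like the n-gram approach.

```python
# Sketch of the reversed-field idea: store each value twice, once as-is and
# once reversed, so a leading-wildcard term becomes an ordinary prefix query.
# "myfield" comes from the question; "myfield_rev" is a hypothetical name.

def index_doc(value):
    """Build the two stored forms of a value."""
    return {"myfield": value, "myfield_rev": value[::-1]}


def rewrite_leading_wildcard(term):
    """Turn '*abc' into a prefix query against the reversed field."""
    if not term.startswith("*") or term.endswith("*"):
        raise ValueError("expected a leading-wildcard-only term such as '*abc'")
    return "myfield_rev:" + term[1:][::-1] + "*"


def prefix_match(query, doc):
    """Toy evaluation of a 'field:prefix*' query against one document."""
    field, pattern = query.split(":", 1)
    return doc[field].startswith(pattern[:-1])


doc = index_doc("xxxabc")
query = rewrite_leading_wildcard("*abc")
print(query)
print(prefix_match(query, doc))
```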

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message ----
> From: A. Steven Anderson <a.steven.ander...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, November 5, 2009 3:04:32 PM
> Subject: Re: leading and trailing wildcard query
> 
> No thoughts on this? Really!?
> 
> I would hate to admit to my Oracle DBE that Solr can't be customized to do a
> common query that a relational database can do. :-(
> 
> 
> On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson <
> a.steven.ander...@gmail.com> wrote:
> 
> > I've scoured the archives and JIRA, but the answer to my question is just
> > not clear to me.
> >
> > With all the new Solr 1.4 features, is there any way to do a leading and
> > trailing wildcard query on an *untokenized* field?
> >
> > e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx
> >
> > Yes, I know how expensive such a query would be, but we have the user
> > requirement, nonetheless.
> >
> > If not, any suggestions on how to implement a custom solution using Solr?
> > Using an external data structure?
> >
> >
> -- 
> A. Steven Anderson
