I've only glanced over this, but I believe one of the recent shingle contributions in Lucene contrib/ does have an option to add those begin/end marker characters. If that would cover your exact-matching needs, that's the thing to look at. You'll have to write (and contribute?) a bit of glue to use it in Solr.
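For reference, the Original Query --> Filtered Query mapping in the quoted message below can be sketched in a few lines of Python. This only illustrates the expansion itself; it is not the Lucene shingle filter's API, and the function name `exact_match_shingles` and the `^`/`$` marker defaults are made up for the example.

```python
def exact_match_shingles(query, start="^", end="$"):
    """Expand a query into every contiguous word sequence (shingle),
    each wrapped in begin/end markers to denote an exact match.

    Hypothetical helper for illustration only, not part of Lucene or Solr.
    """
    # Accept the query with or without surrounding double quotes.
    tokens = query.strip('"').split()
    shingles = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            text = " ".join(tokens[i:j])
            # Multi-word shingles are quoted, as in the quoted message.
            if j - i > 1:
                text = '"' + text + '"'
            shingles.append(start + text + end)
    return shingles


if __name__ == "__main__":
    for q in ['abcd', '"abcd efgh"', '"abcd efgh ijkl"']:
        print(q, "-->", " ".join(exact_match_shingles(q)))
```

The nested loop orders the shingles by start word and then by length, which matches the order used in the quoted examples.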
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Mck <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, September 8, 2008 4:43:50 AM
> Subject: Re: Replacing FAST functionality at sesam.no
>
> > I'm not very familiar with shingles but it seems to be that you should
> > have ShingleFilter at index time and make the query as a phrase query?
>
> Then the entry "abcd efgh ijkl" would be indexed as
> (abcd "abcd efgh" "abcd efgh ijkl" efgh "efgh ijkl" ijkl)
>
> and a subsequent query "abcd" would return this entry.
> If this is so then this is not exact matching and not what we are
> looking for.
>
> The filter behaviour we are looking for is like:
> (i've included ^$ to denote the exact matching)
>
> Original Query --> Filtered Query
> abcd --> ^abcd$
> "abcd efgh" --> (^abcd$ ^"abcd efgh"$ ^efgh$)
> "abcd efgh ijkl" --> (^abcd$ ^"abcd efgh"$ ^"abcd efgh ijkl"$ ^efgh$ ^"efgh ijkl"$ ^ijkl$)
>
>
> ~mck
>
> --
> "All stable processes we shall predict. All unstable processes we shall
> control." John von Neumann
> | semb.wever.org | sesat.no | sesam.no |