Why do you want to do this, and what is it meant to accomplish? There may be a better way to get there; I can't think of anything (which doesn't mean it doesn't exist) that would actually require indexing tokens this way. What sorts of queries do you intend to serve with this setup?

I don't believe any analyzer included with Solr out of the box will do exactly what you've specified. You could definitely write your own analyzer in Java to do it. But I still suspect you don't actually need to construct your index like that to accomplish whatever you are trying to accomplish.
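For reference, here is a rough sketch of just the combination logic such a custom analyzer would need. This is plain Java, not a Lucene/Solr TokenFilter; the class and method names (TokenCombinations, combinations) are made up for illustration, and wiring this into an actual analyzer chain is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Solr or Lucene: it only shows how the
// requested token combinations could be enumerated.
final class TokenCombinations {

    // Returns every non-empty, order-preserving combination of the
    // whitespace-separated tokens in the input, joined by single spaces.
    static List<String> combinations(String input) {
        List<String> result = new ArrayList<>();
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length > 20) {
            // Assumption: refuse long inputs, since n tokens yield 2^n - 1 entries.
            throw new IllegalArgumentException("too many tokens: " + tokens.length);
        }
        // Each bit of mask selects one token; bit i set means tokens[i] is included.
        for (int mask = 1; mask < (1 << tokens.length); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < tokens.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(tokens[i]);
                }
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

Calling TokenCombinations.combinations("abc xyz foo") yields the seven combinations from your list, though not in the same order. Note that the number of entries doubles with every additional word, which is one more reason I'd avoid building the index this way.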

The only reason I can think of to care which words are next to which other words is phrase and proximity searching. With what you've specified, though, phrase and proximity searches wouldn't be useful anyway: EVERY word would be next to every other word, so any phrase or proximity search on words that are present at all would match. You might as well not do phrase or proximity searches at all, in which case it should not matter what order the words are in or how close together they are in the index. Why not just use an ordinary WhitespaceTokenizer and do ordinary dismax or lucene queries, without phrase or proximity?

On 1/20/2011 4:03 PM, Martin Jansen wrote:
Hey there,

I'm looking for an <analyzer> configuration for Solr 1.4 that
accomplishes the following:

Given the input "abc xyz foo" I would like to add at least the following
token combinations to the index:

        abc
        abc xyz
        abc xyz foo
        abc foo
        xyz
        xyz foo
        foo

A WhitespaceTokenizer combined with a ShingleFilter will take me there
to some extent, but won't e.g. add "abc foo" to the index.  Is there a
way to do this?

- Martin
