Hi Christopher,
One option comes to mind: shingles?

I have not done anything with them yet, but they are on my radar for about a month from now. Speaking unencumbered by experience or substantial understanding, my guess is that shingles would work well for you if you can select the shingles you want with something like a terms prefix.

As far as I understand it, shingling[1] takes a number of adjacent terms/words and combines them into a single token. You could set the maximum shingle size to 2, and then find some way to use the TermsComponent on the shingled field with a prefix (for example, your target word followed by a space), potentially:
http://wiki.apache.org/solr/TermsComponent
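
To make the idea concrete, here is a rough sketch in plain Python (not Solr itself) that simulates what I think a 2-word shingle field plus a terms prefix lookup would give on Christopher's five example documents. The tokenization (lowercase, drop punctuation) is my assumption about the analyzer chain, and the field name "text_shingle", the "/terms" handler path, and the host in the URL are hypothetical. The terms, terms.fl, terms.prefix, and terms.limit parameters are the ones described on the TermsComponent wiki page.

```python
import re
from collections import Counter
from urllib.parse import urlencode

DOCS = [
    "The quick brown fox jumped over the fence.",
    "The sly fox skipped over the fence.",
    "The fat fox skipped his afternoon class.",
    "A brown duck and red fox, crashed the party.",
    "Charles Brown! Fox! Crashed my damn car.",
]

def tokenize(text):
    # Assumed analysis: lowercase and split on anything non-alphanumeric.
    return re.findall(r"[a-z0-9]+", text.lower())

def bigram_shingles(tokens):
    # Join each pair of adjacent tokens into one "shingle" token.
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def shingle_doc_freq(docs):
    # Count the number of documents each shingle appears in.
    df = Counter()
    for doc in docs:
        df.update(set(bigram_shingles(tokenize(doc))))
    return df

def next_words(df, target):
    # Equivalent of a prefix lookup: keep shingles starting with "target ".
    prefix = target + " "
    return Counter({s[len(prefix):]: n for s, n in df.items()
                    if s.startswith(prefix)})

def terms_url(base, field, target, limit=10):
    # Build a TermsComponent request; base and field are hypothetical.
    params = {"terms": "true", "terms.fl": field,
              "terms.prefix": target + " ", "terms.limit": limit}
    return base + "?" + urlencode(params)

if __name__ == "__main__":
    for word, count in next_words(shingle_doc_freq(DOCS), "fox").most_common():
        print(word, "-", count)
    print(terms_url("http://localhost:8983/solr/terms", "text_shingle", "fox"))
```

Note that these counts are document frequencies, so a document containing "fox skipped" twice would still count once. For the "word before fox" case a prefix does not help directly; one untested idea would be a second field whose shingles are built in reverse word order.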

I'm interested in what you find out, so if you find a solution somewhere other than this mailing list, please post back here.
Thanks,

Sean


[1] see something like: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28shingle%29, but the Solr 1.4 Enterprise Search Server book is well worth the money, and I believe there is an ebook version for $10-20.

On 10/26/2010 08:26 AM, Christopher Ball wrote:
Am about to implement a custom query that is sort of a mash-up of Facets,
Highlighting, and SpanQuery - but thought I'd see if anyone has done
anything similar.



In simple words, I need to facet on the next word given a target word.



For example, if my index only had the following 5 documents (comprised of a
sentence each):



Doc 1 - The quick brown fox jumped over the fence.
Doc 2 - The sly fox skipped over the fence.
Doc 3 - The fat fox skipped his afternoon class.
Doc 4 - A brown duck and red fox, crashed the party.
Doc 5 - Charles Brown! Fox! Crashed my damn car.



The query should give the frequency of the distinct terms after the word
"fox":



skipped - 2
crashed - 2
jumped - 1



Long-term, I'd like to do the opposite - frequency of the distinct terms
before the word "fox":



brown - 2
sly - 1
fat - 1
red - 1



My guess is that either the FastVectorHighlighter or SpanQuery would be a
reasonable starting point. I was hoping to take advantage of Vectors as I am
storing termVectors, termPositions, and termOffsets for the field in
question.



Grateful for any thoughts . . . reference implementations . . . words of
encouragement . . . free beer - whatever you can offer.



Gracias,



Christopher




