Re: Phrase query with terms at same location

2009-11-19 Thread Erick Erickson
Hmmm, you're beyond what I've tried to do, so all I can do is speculate. But I don't believe that two terms on top of each other are considered when calculating slop. But I really don't know for sure, so I'd create a couple of unit tests to verify. You're right, the combinatorial explosion with pu

Re: Phrase query with terms at same location

2009-11-19 Thread Christopher Tignor
Thanks again for this. I would like to be able to do several things with this data if possible. As per Mark's post, I'd like to be able to query for phrases like "He _v"~1 (where _v is my verb part-of-speech token) to recover strings like "He later apologized". This already in fact seems to be worki
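
To make the "He _v"~1 example concrete, here is a toy model of positional matching. It is only a sketch and is not Lucene's implementation: it handles a two-term, in-order phrase and treats slop as the number of extra positions allowed between the two terms. The tags `_p` and `_adv` and the helper names `index_positions` and `sloppy_match` are made up for illustration; only `_v` comes from the message above.

```python
from collections import defaultdict


def index_positions(tokens):
    """Map each term to its positions, given (term, position_increment) pairs.

    An increment of 0 stacks a token on top of the previous one.
    """
    positions = defaultdict(list)
    pos = -1
    for term, increment in tokens:
        pos += increment
        positions[term].append(pos)
    return positions


def sloppy_match(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions between them."""
    for p1 in positions.get(first, []):
        for p2 in positions.get(second, []):
            gap = p2 - p1 - 1  # positions strictly between the two terms
            if 0 <= gap <= slop:
                return True
    return False


# Part-of-speech tags stacked on their words (increment 0).
stacked = index_positions([
    ("He", 1), ("_p", 0),
    ("later", 1), ("_adv", 0),
    ("apologized", 1), ("_v", 0),
])

# The same tokens, but every tag takes a position of its own (increment 1).
unstacked = index_positions([
    ("He", 1), ("_p", 1),
    ("later", 1), ("_adv", 1),
    ("apologized", 1), ("_v", 1),
])

print(sloppy_match(stacked, "He", "_v", 1))    # one word in between, slop 1
print(sloppy_match(stacked, "He", "_v", 0))    # exact adjacency required
print(sloppy_match(unstacked, "He", "_v", 1))  # tags now consume positions
```

In this model the stacked tags do not add to the distance between "He" and "_v", whereas giving each tag its own position inflates the gap. Whether real Lucene behaves the same way is best confirmed with a unit test against an actual index.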

Re: Phrase query with terms at same location

2009-11-19 Thread Erick Erickson
Ahhh, I should have followed the link. I was interpreting your first note as emitting two tokens NOT at the same offset. My mistake, ignore my nonsense about unexpected consequences. Your original assumption is correct, zero offsets are pretty transparent. What do you really want to do here? Mark'

Re: Phrase query with terms at same location

2009-11-19 Thread Christopher Tignor
Thanks, Erick - Indeed every word will have a part-of-speech token, but is this how the slop actually works? My understanding was that if I have two tokens in the same location then each will not affect searches involving the other in terms of the slop, as slop indicates the number of words *between* s

Re: Phrase query with terms at same location

2009-11-19 Thread Erick Erickson
If I'm reading this right, your tokenizer creates two tokens. One "report" and one "_n"... I suspect if so that this will create some "interesting" behaviors. For instance, if you put two tokens in place, are you going to double the slop when you don't care about part of speech? Is every word going