Hmmm, you're beyond what I've tried to do, so all I can do is speculate. I
don't believe that two terms on top of each other are considered when
calculating slop, but I really don't know for sure, so I'd create a couple of
unit tests to verify.
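Something like the sketch below is what I have in mind. It's only a sketch, assuming
a fairly recent Lucene (8.x or later, for ByteBuffersDirectory); the class names, the
"body" field, and the suffix-based tagging rule are all made up for illustration, with
the toy filter standing in for a real part-of-speech tagger. It stacks a "_v" token on
the verb at position increment 0, then runs "He _v" as a phrase at slop 0 and slop 1:
if the stacked token really is invisible to slop, the slop-1 query should hit and the
slop-0 one should not.

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class StackedSlopCheck {

  // Toy filter: after any token ending in "ed", emit "_v" at the same position.
  // A real implementation would call an actual POS tagger instead.
  static final class ToyVerbTagFilter extends TokenFilter {
    private final CharTermAttribute term = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posInc = addAttribute(PositionIncrementAttribute.class);
    private boolean emitTag = false;

    ToyVerbTagFilter(TokenStream in) { super(in); }

    @Override
    public boolean incrementToken() throws IOException {
      if (emitTag) {
        term.setEmpty().append("_v");
        posInc.setPositionIncrement(0); // stacked on top of the previous word
        emitTag = false;
        return true;
      }
      if (!input.incrementToken()) return false;
      emitTag = term.toString().endsWith("ed");
      return true;
    }

    @Override
    public void reset() throws IOException {
      super.reset();
      emitTag = false;
    }
  }

  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer src = new WhitespaceTokenizer();
        return new TokenStreamComponents(src, new ToyVerbTagFilter(src));
      }
    };

    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
      Document doc = new Document();
      doc.add(new TextField("body", "He later apologized", Field.Store.NO));
      w.addDocument(doc);
    }

    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Indexed positions: He=0, later=1, apologized=2, _v=2 (stacked).
      // "He _v" needs to move "_v" by one position, so slop 1 should match
      // and slop 0 should not -- the stacked token adds no extra distance.
      System.out.println("slop 0 hits: " + searcher.count(new PhraseQuery(0, "body", "He", "_v")));
      System.out.println("slop 1 hits: " + searcher.count(new PhraseQuery(1, "body", "He", "_v")));
    }
  }
}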
You're right, the combinatorial explosion with pu
Thanks again for this.
I would like to be able to do several things with this data if possible.
As per Mark's post, I'd like to be able to query for phrases like "He _v"~1
(where _v is my verb part of speech token) to recover strings like: "He later
apologized".
This already in fact seems to be working.
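For reference, here's a minimal sketch of what that query turns into on the Lucene
side, assuming the query-time analyzer keeps "_v" intact (WhitespaceAnalyzer and the
"body" field name are just placeholders for my real setup); the classic QueryParser
parses "He _v"~1 into a PhraseQuery with slop 1:

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class PosPhraseQueryDemo {
  public static void main(String[] args) throws Exception {
    // The analyzer must not strip or split "_v"; whitespace tokenization keeps it whole.
    QueryParser parser = new QueryParser("body", new WhitespaceAnalyzer());
    Query q = parser.parse("\"He _v\"~1");
    // Expected to print something like: PhraseQuery -> body:"He _v"~1
    System.out.println(q.getClass().getSimpleName() + " -> " + q);
  }
}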
Ahhh, I should have followed the link. I was interpreting your first note as
emitting two tokens NOT at the same offset. My mistake; ignore my nonsense
about unexpected consequences. Your original assumption is correct: zero
offsets are pretty transparent.
What do you really want to do here? Mark'
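For what it's worth, Lucene's own synonym filters do exactly this kind of stacking,
which is a quick way to convince yourself it's transparent. Below is a small sketch
(the class name and the "body" field are made up) that maps "apologized" to "_v" with
SynonymGraphFilter and prints terms with their positions; the "_v" token comes out at
the same position as the word it tags:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.CharsRef;

public class StackedSynonymDemo {
  public static void main(String[] args) throws Exception {
    // Map "apologized" -> "_v", keeping the original: the same shape as stacking a POS tag.
    SynonymMap.Builder builder = new SynonymMap.Builder(true);
    builder.add(new CharsRef("apologized"), new CharsRef("_v"), true);
    SynonymMap map = builder.build();

    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer src = new WhitespaceTokenizer();
        return new TokenStreamComponents(src, new SynonymGraphFilter(src, map, true));
      }
    };

    int pos = -1;
    try (TokenStream ts = analyzer.tokenStream("body", "He later apologized")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      PositionIncrementAttribute inc = ts.addAttribute(PositionIncrementAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        pos += inc.getPositionIncrement();
        // "_v" is printed with the same position as "apologized" (increment 0).
        System.out.println("pos=" + pos + " term=" + term);
      }
      ts.end();
    }
  }
}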
Thanks, Erick -
Indeed every word will have a part of speech token, but is this how the slop
actually works? My understanding was that if I have two tokens in the same
location, then each will not affect searches involving the other in terms of
slop, since slop indicates the number of words *between* search terms.
If I'm reading this right, your tokenizer creates two tokens, one
"report" and one "_n"... If so, I suspect this will create some
"interesting" behaviors. For instance, if you put two tokens in the same
place, are you going to double the slop when you don't care about part of
speech? Is every word going to have a part of speech token?
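One way to check whether that is really what the chain emits is to dump terms and
position increments straight from the analyzer (Solr's analysis screen or Luke will
show the same thing). A stand-alone sketch, with WhitespaceAnalyzer and the "body"
field as placeholders for the real POS-tagging chain; with a stacking filter you
would expect "report" and "_n" to come out sharing one position:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public class TokenDump {
  public static void main(String[] args) throws Exception {
    // Substitute the real POS-tagging analyzer here; WhitespaceAnalyzer is only a stand-in.
    Analyzer analyzer = new WhitespaceAnalyzer();
    int position = -1;
    try (TokenStream ts = analyzer.tokenStream("body", "the report said")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        position += posInc.getPositionIncrement();
        // A stacking filter would print e.g.:  pos=1 term=report  /  pos=1 term=_n
        System.out.println("pos=" + position + " term=" + term);
      }
      ts.end();
    }
  }
}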