On Nov 18, 2007, at 11:09 PM, Yonik Seeley wrote:

I'm also wondering how others have accomplished this. Grant Ingersoll
noted that one of the original use cases was XPath queries so I'm
particularly interested in finding out if anyone has implemented that,
and how.

Me too. Any clarifications on that, Grant?

From what I understand from Michael Busch, you can store the path at each token, but this doesn't seem efficient to me. I would think you may want to come up with a more efficient encoding. I am cc'ing Michael on this thread to see if he can shed any light on the subject (he may not be able to because of employer reasons). If he can't, then we can brainstorm a bit more on how to do it most efficiently.
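
To make the size concern concrete, here is a rough sketch of one possible encoding, not something Michael or anyone else has confirmed: instead of storing the full path string at each token, keep a dictionary of distinct paths and store only a small variable-length integer ID in the payload. The names PathDictionary, encode_varint and payload_for are made up for illustration and are not Lucene APIs; how the bytes would actually be attached to a token is left out.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


class PathDictionary:
    """Hypothetical helper: maps each distinct path string to a small integer ID."""

    def __init__(self):
        self._ids = {}
        self._paths = []

    def id_for(self, path: str) -> int:
        # Assign IDs in order of first appearance.
        if path not in self._ids:
            self._ids[path] = len(self._paths)
            self._paths.append(path)
        return self._ids[path]

    def path_for(self, path_id: int) -> str:
        return self._paths[path_id]


def payload_for(path: str, dictionary: PathDictionary) -> bytes:
    """Bytes to store at a token in place of the full path string."""
    return encode_varint(dictionary.id_for(path))


if __name__ == "__main__":
    d = PathDictionary()
    for p in ["/book/chapter/title", "/book/chapter/para", "/book/chapter/title"]:
        print(p, len(p.encode("utf-8")), "bytes as text,",
              len(payload_for(p, d)), "byte(s) as ID")
```

The trade-off is that the dictionary itself has to be stored somewhere and consulted at query time, so this only pays off when the same paths repeat across many tokens.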

An interesting thing to think about here is how we can come up with more general support for XML documents and other structured docs. For instance, a common syntax used in NLP for tokens is something like:

    The|DET quick|JJ red|JJ fox|NN jumped|VB over|??? the|DET lazy|JJ brown|JJ dogs|NN

or other variations that also apply phrase identification, semantic relationships, etc. These things, to me, all logically fit as payloads, so it may be wise to think about coming up with one or two generic mechanisms for these kinds of things. One could be the default XML/XPath marked-up document, but another might be this pipe notation that is common in NLP.
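
As a rough illustration of the pipe notation idea, the sketch below splits such a string into (token, tag) pairs and turns each tag into bytes that could serve as a payload. This is plain Python, not Lucene code; parse_tagged and tag_payload are names I made up, and the choice of UTF-8 for the tag bytes is an assumption.

```python
def parse_tagged(text: str, sep: str = "|"):
    """Split 'token|TAG token|TAG ...' into a list of (token, tag) pairs.

    Items without a separator get a tag of None. The split is on the
    last separator so a token may itself contain the separator.
    """
    pairs = []
    for item in text.split():
        token, found, tag = item.rpartition(sep)
        if not found:
            token, tag = item, None
        pairs.append((token, tag))
    return pairs


def tag_payload(tag):
    """Hypothetical payload encoding: the tag as UTF-8 bytes, empty if untagged."""
    return b"" if tag is None else tag.encode("utf-8")


if __name__ == "__main__":
    for token, tag in parse_tagged("The|DET quick|JJ red|JJ fox|NN"):
        print(token, tag_payload(tag))
```

A real implementation would presumably do this inside the analysis chain so the token text is indexed as usual and the tag rides along as the payload.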

See http://wiki.apache.org/lucene-java/Payload_Planning and the related threads

-Grant
