Re: Lucene for a linguistic corpus

2013-01-09 Thread Wu, Stephen T., Ph.D.
>> For example, in the phrase "A man saw an elephant", "saw" has annotations as
>> follows (we also say that its position in the index is 1234):
>>
>> {lemma: see, pos: verb, tense: past}, {lemma: saw, pos: noun, number:
>> singular}
>>
>> I think it would be more effective to insert parse index in
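The "saw" example asks how to index several competing annotation sets for one word. In Lucene this is usually done by stacking tokens. A TokenFilter emits the extra tokens with a position increment of 0, set through PositionIncrementAttribute, so each one lands at the same position as the token before it. SynonymFilter uses the same mechanism. The sketch below models the idea in plain Java with no Lucene dependency, so it runs on its own. The `field:value` term encoding and the `Token`/`index` names are hypothetical illustrations, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StackedAnnotations {

    /** A token as a TokenStream would emit it: a term plus a position increment. */
    record Token(String term, int positionIncrement) {}

    /** Builds a toy positional inverted index: term -> positions where it occurs. */
    static Map<String, List<Integer>> index(List<Token> tokens, int startPosition) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        int position = startPosition - 1; // a first increment of 1 lands on startPosition
        for (Token t : tokens) {
            position += t.positionIncrement(); // increment 0 => same position as previous token
            postings.computeIfAbsent(t.term(), k -> new ArrayList<>()).add(position);
        }
        return postings;
    }

    public static void main(String[] args) {
        // Both readings of "saw", flattened into stacked "field:value" terms (hypothetical encoding).
        List<Token> saw = List.of(
                new Token("word:saw", 1),
                new Token("lemma:see", 0), new Token("pos:verb", 0), new Token("tense:past", 0),
                new Token("lemma:saw", 0), new Token("pos:noun", 0), new Token("number:singular", 0));
        Map<String, List<Integer>> postings = index(saw, 1234);
        postings.forEach((term, positions) -> System.out.println(term + " -> " + positions));
    }
}
```

Running it prints every stacked term with the single position [1234]. The output also exposes a limitation of flattening. A query that requires `lemma:see` and `pos:noun` at the same position would still match 1234, even though neither reading of "saw" has both attributes. Keeping the readings apart therefore needs extra per-token information, such as a reading identifier. That is an assumption here and not something the thread settles.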

More about storing NLP-type stuff in the index

2013-01-03 Thread Wu, Stephen T., Ph.D.
was. Maybe the linking can be done via Payloads (offsets in the original text)? If I want to store multiple things at the same startOffset, do I just use something like SynonymFilter?
stephen
On 12/21/12 6:45 AM, "Michael McCandless" wrote:
> On Thu, Dec 20, 2012 at 3:54 PM, Wu, Ste

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-20 Thread Wu, Stephen T., Ph.D.
> If you stuff the end of the span into the payload you'd have to create
> a custom variant of PhraseQuery to properly match based on the end
> span.
How different is this from the functionality already available through SpanQuery?
stephen
--
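The quoted suggestion stores the end of a multi-position span in the payload. It then needs a query that matches on that end rather than on the next position. The sketch below models only that matching step, in plain Java. The 4-byte big-endian payload layout, the inclusive end convention, and the `Posting`/`matchAdjacentSpans` names are all assumptions for illustration. They are not Lucene's PhraseQuery or payload API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class SpanEndPayloads {

    /** Hypothetical payload layout: the span's (inclusive) end position as a 4-byte big-endian int. */
    static byte[] encodeEnd(int endPosition) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(endPosition).array();
    }

    static int decodeEnd(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    /** One posting: the span starts at `position`; its end is carried in the payload. */
    record Posting(int position, byte[] payload) {}

    /**
     * Phrase-style match using span ends: a `second` span must start right after a
     * `first` span ends, instead of at first.position + 1 as a plain phrase match assumes.
     * Returns the start positions of `first` spans that have such a follower.
     */
    static List<Integer> matchAdjacentSpans(List<Posting> first, List<Posting> second) {
        List<Integer> matches = new ArrayList<>();
        for (Posting a : first) {
            int end = decodeEnd(a.payload());
            for (Posting b : second) {
                if (b.position() == end + 1) {
                    matches.add(a.position());
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // A hypothetical "NP" annotation covering positions 10..12, then candidate "VP" spans.
        List<Posting> np = List.of(new Posting(10, encodeEnd(12)));
        List<Posting> vp = List.of(new Posting(13, encodeEnd(15)), new Posting(11, encodeEnd(11)));
        System.out.println(matchAdjacentSpans(np, vp));
    }
}
```

`main` prints `[10]`. The span starting at 10 ends at 12, so only the follower at 13 counts, whereas a plain start+1 check would have picked the token at 11. Lucene's SpanQuery classes, by contrast, compute span extents from term positions at query time. A payload-carried end matters mainly when a single indexed token stands in for an annotation that covers several positions.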

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Wu, Stephen T., Ph.D.
t I want to do, and the things that can now be done in
> GATE very easily, would be possible using Mike's suggested method.
>
>
> -Glen
>
> On Thu, Dec 13, 2012 at 6:27 AM, Michael McCandless
> wrote:
>> On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D.
>>

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-12 Thread Wu, Stephen T., Ph.D.
>> Is there any (preliminary) code checked in somewhere that I can look at,
>> that would help me understand the practical issues that would need to be
>> addressed?
>
> Maybe we can make this more concrete: what new attribute are you
> needing to record in the postings and access at search time?

Semi-structured queries

2012-12-07 Thread Wu, Stephen T., Ph.D.
I’ve been trying to do semi-structured queries & query parsing. In other words, you could have XML snippets mixed in with plain terms, e.g. a query like: christmas tree where you’re looking for a document with the terms “christmas” “tree” but also some structured data about where (pract

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-11-30 Thread Wu, Stephen T., Ph.D.
ome
> APIs do expose this, it's not very well explored yet (eg, you'd have
> to make a custom indexing chain to get the attributes "through"
> IndexWriter down to your codec). It would be great to make progress
> making this easier, so ideas are very welcome :)

What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-11-27 Thread Wu, Stephen T., Ph.D.
t if you want to go with Payloads that do more than boosting a
> term, chances are that you'll need to rewrite a big part of the query
> stack.
>
>
> On 27/11/2012 16:59, Wu, Stephen T., Ph.D. wrote:
>> I think we're looking at doing something related. I

Re: what is the offsets and payload in DocsAndPositionsEnum for ??

2012-11-27 Thread Wu, Stephen T., Ph.D.
I think we're looking at doing something related. I haven't explored the Enums, nor do I know how to make a postings codec... But what is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs? We're trying to incorporate attributes onto terms/spans in indexes. We'd also