On Thu, Dec 20, 2012 at 3:54 PM, Wu, Stephen T., Ph.D.
wu.step...@mayo.edu wrote:
If you stuff the end of the span into the payload you'd have to create
a custom variant of PhraseQuery to properly match based on the end
span.
How different is this from the functionality already available through
SpanQuery?
stephen
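The payload idea quoted above can be made concrete with a toy, in-memory sketch. It is plain Java, not Lucene code. `SpanEndPayload`, `Posting` and `covers` are hypothetical names. The payload is assumed to be a 4-byte big-endian int holding the span's exclusive end position. The loop over postings only stands in for the end-aware check a custom PhraseQuery/SpanQuery variant would have to make while iterating real positions.

```java
import java.nio.ByteBuffer;

/**
 * Toy illustration (not Lucene code): an annotation is "indexed" at its start
 * position, and its exclusive end position is stored as a 4-byte payload.
 */
final class SpanEndPayload {
    // Encode a span's exclusive end position as a 4-byte big-endian payload.
    static byte[] encodeEnd(int end) {
        return ByteBuffer.allocate(4).putInt(end).array();
    }

    // Decode the end position back out of the payload at "search time".
    static int decodeEnd(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    /** One occurrence of an annotation term: its start position plus its payload. */
    static final class Posting {
        final int start;
        final byte[] payload;

        Posting(int start, byte[] payload) {
            this.start = start;
            this.payload = payload;
        }
    }

    /**
     * True if some occurrence covers the token range [from, to): it starts at or
     * before `from` and, according to its payload, ends at or after `to`.
     */
    static boolean covers(int from, int to, Posting... postings) {
        for (Posting p : postings) {
            if (p.start <= from && decodeEnd(p.payload) >= to) {
                return true;
            }
        }
        return false;
    }
}
```

For example, a noun-phrase chunk over tokens 1..3 would be stored as `new Posting(1, encodeEnd(4))`. It covers the range [2, 4) but not [2, 5), a distinction the start position alone cannot make.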
On Thu, Dec 13, 2012 at 8:32 AM, Carsten Schnober
schno...@ids-mannheim.de wrote:
On 13.12.2012 12:27, Michael McCandless wrote:
For example:
- part of speech of a token.
- syntactic parse subtree (over a span).
- semantically normalized phrase (to canonical text or ontological code).
On Thu, Dec 13, 2012 at 10:09 AM, Glen Newton glen.new...@gmail.com wrote:
Unfortunately, Lucene doesn't properly index
spans (it records the start position but not the end position), so
that limits what kind of matching you can do at search time.
If this could be fixed (i.e. indexing the
On 18.12.2012 12:36, Michael McCandless wrote:
On Thu, Dec 13, 2012 at 8:32 AM, Carsten Schnober
schno...@ids-mannheim.de wrote:
This is a relatively easy example, but how would one deal with e.g.
annotations that include multiple tokens (as in spans), such as chunks,
or relations between
On Wed, Dec 12, 2012 at 9:08 PM, lukai lukai1...@gmail.com wrote:
Do we have any plans to decouple the indexing process?
Lucene was designed for search, but judging by the questions people ask in this
thread, it sometimes goes beyond search functionality. For example, we might want to
customize our scoring
On 13.12.2012 12:27, Michael McCandless wrote:
For example:
- part of speech of a token.
- syntactic parse subtree (over a span).
- semantically normalized phrase (to canonical text or ontological code).
- semantic group (of a span).
- coreference link.
So for example
Unfortunately, Lucene doesn't properly index
spans (it records the start position but not the end position), so
that limits what kind of matching you can do at search time.
If this could be fixed (i.e. indexing the _end_ of a span) I think all
the things that I want to do, and the things that can
That would be really nice. Full standoff annotations open a lot of doors.
If we had them, though, I'm not sure exactly which of Mike's methods you'd
use. I thought payloads were completely token-based and could not be
attached to spans regardless. And the SynonymFilter is really to mimic the
Parts-of-speech is available now, in the indexer.
LUCENE-2899 adds OpenNLP to the Lucene/Solr codebase. It does
parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an
Apache project for natural-language processing.
Some parts that could be in Lucene are in Solr.
It is not clear this is exactly what is needed/being discussed.
From the issue:
We are also planning a Tokenizer/TokenFilter that can put parts of
speech as either payloads (PartOfSpeechAttribute?) on a token or at
the same position.
This adds it to a token, not a span. 'same position' does not
I should not have added that note. The OpenNLP patch gives a concrete
example of adding an annotation to text.
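The two options in the LUCENE-2899 plan quoted above (part of speech as a payload on the token, or as a token at the same position) can be contrasted with a small sketch. This is hypothetical plain Java, not the patch's code: `PosAnnotation` and `Tok` are made-up names, payloads are shown as strings for readability (real payloads are raw bytes), and the `_POS_` prefix for stacked tag tokens is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: two ways to attach a part-of-speech tag to a token. */
final class PosAnnotation {
    /** A token as (term, position increment, payload), loosely mirroring what a TokenStream emits. */
    static final class Tok {
        final String term;
        final int posInc;
        final String payload; // null when no payload is attached; real payloads are bytes

        Tok(String term, int posInc, String payload) {
            this.term = term;
            this.posInc = posInc;
            this.payload = payload;
        }
    }

    // Option 1: one token per word, with the tag carried as that token's payload.
    static List<Tok> asPayloads(String[] words, String[] tags) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Tok(words[i], 1, tags[i]));
        }
        return out;
    }

    // Option 2: emit the tag as an extra token with position increment 0,
    // so it lands at the same position as the word it describes.
    static List<Tok> asStackedTokens(String[] words, String[] tags) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Tok(words[i], 1, null));
            out.add(new Tok("_POS_" + tags[i], 0, null)); // hypothetical prefix keeps tags apart from words
        }
        return out;
    }

    /** Absolute positions, accumulated from position increments (first token at position 0). */
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).posInc;
            pos[i] = p;
        }
        return pos;
    }
}
```

Both variants still attach the tag to a single position, which is the limitation raised in the thread: neither says anything about where a multi-token span ends.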
On 12/13/2012 01:54 PM, Glen Newton wrote:
It is not clear this is exactly what is needed/being discussed.
From the issue:
We are also planning a Tokenizer/TokenFilter that can put
Cool! Sounds great! :-)
Any pointers to a (Lucene) example that attaches a payload to a
start..end span that is more than one token?
thanks,
-Glen
On Thu, Dec 13, 2012 at 5:03 PM, Lance Norskog goks...@gmail.com wrote:
I should not have added that note. The OpenNLP patch gives a concrete
Hi Glen,
I don't believe you can attach a single payload to multiple tokens. What I did
for a similar requirement was to combine the tokens into a single
underscore-delimited token and attach the payload to it. For example:
The Big Bad Wolf huffed and puffed and blew the house of the Three
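The joining workaround described above can be sketched in a few lines of plain Java. It is a hypothetical illustration, not code from the thread: `JoinedSpanToken`, `Tok` and `joinSpan` are made-up names, and the payload is shown as a string (for example a chunk label) rather than raw bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: collapse a multi-token span into one "_"-delimited token carrying one payload. */
final class JoinedSpanToken {
    static final class Tok {
        final String term;
        final String payload; // null for tokens outside the span

        Tok(String term, String payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Replace tokens [start, end) with a single joined token; other tokens pass through unchanged. */
    static List<Tok> joinSpan(String[] tokens, int start, int end, String payload) {
        if (start < 0 || end > tokens.length || start >= end) {
            throw new IllegalArgumentException("invalid span [" + start + ", " + end + ")");
        }
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < start; i++) {
            out.add(new Tok(tokens[i], null));
        }
        out.add(new Tok(String.join("_", Arrays.copyOfRange(tokens, start, end)), payload));
        for (int i = end; i < tokens.length; i++) {
            out.add(new Tok(tokens[i], null));
        }
        return out;
    }
}
```

With the tokens `the big bad wolf huffed`, joining [1, 4) with payload `NP` yields `the`, `big_bad_wolf` (payload `NP`), `huffed`. The trade-off is that a query for `wolf` alone no longer matches there unless the original tokens are indexed as well.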
Is there any (preliminary) code checked in somewhere that I can look at,
that would help me understand the practical issues that would need to be
addressed?
Maybe we can make this more concrete: what new attribute do you
need to record in the postings and access at search time?
For
+10
These are the kinds of things you can do in GATE[1] using annotations[2].
A VERY useful feature.
-Glen
[1] http://gate.ac.uk
[2] http://gate.ac.uk/wiki/jape-repository/annotations.html
On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D.
wu.step...@mayo.edu wrote:
Is there any
Do we have any plans to decouple the indexing process?
Lucene was designed for search, but judging by the questions people ask in this
thread, it sometimes goes beyond search functionality. For example, we might want to
customize our scoring function based on payloads. Sometimes I don't need to
store TF/IDF information.
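The wish above, scoring from payloads with no TF/IDF statistics, can be illustrated with a toy scorer. This is a hypothetical sketch, not Lucene's Similarity or Scorer API: `PayloadOnlyScorer` and `Occurrence` are made-up names, and each term occurrence is assumed to carry a 4-byte float weight as its payload.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: score documents purely from per-occurrence payload weights, no TF/IDF. */
final class PayloadOnlyScorer {
    // Assumed payload layout: a single 4-byte big-endian float weight.
    static byte[] encodeWeight(float w) {
        return ByteBuffer.allocate(4).putFloat(w).array();
    }

    static float decodeWeight(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    /** One occurrence of the query term: the document it occurs in and its payload. */
    static final class Occurrence {
        final int doc;
        final byte[] payload;

        Occurrence(int doc, byte[] payload) {
            this.doc = doc;
            this.payload = payload;
        }
    }

    /**
     * Per-document score = the largest payload weight among that document's occurrences.
     * Taking the max (not the sum) means repeating a term does not raise the score,
     * so no term-frequency or document-frequency statistics are involved.
     */
    static Map<Integer, Float> score(Occurrence... occs) {
        Map<Integer, Float> scores = new HashMap<>();
        for (Occurrence o : occs) {
            scores.merge(o.doc, decodeWeight(o.payload), Math::max);
        }
        return scores;
    }
}
```

Swapping `Math::max` for `Float::sum` would instead let repeated occurrences accumulate, which reintroduces a frequency effect through the payloads.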
On 11/28/2012 01:11 AM, Michael McCandless wrote:
Flexible indexing is the ability to make your own codec, which
controls the reading and writing of all index parts (postings, stored
fields, term vectors, deleted docs, etc.).
So for example if you want to store some postings as a bit set
I will probably have to implement my own data structure and
parser/tokenizer/stemmer
Why? I mean, I think the point of the Lucene architecture is that the codec
level is completely independent of the analysis level.
The end result of analysis is a value to be stored from the application
Is there any (preliminary) code checked in somewhere that I can look at,
that would help me understand the practical issues that would need to be
addressed?
If I understand you correctly, it's a little different from what's happening
in your blog posts:
On Fri, Nov 30, 2012 at 12:25 PM, Wu, Stephen T., Ph.D.
wu.step...@mayo.edu wrote:
Is there any (preliminary) code checked in somewhere that I can look at,
that would help me understand the practical issues that would need to be
addressed?
If I understand you correctly, it's a little
Following up on a previous question...
What is flexible indexing in Lucene 4.0? We assumed it was the ability to
easily make new postings formats/codecs -- but a response below says that
would be tricky?
stephen
On 11/27/12 11:48 AM, David Causse dcau...@spotter.com wrote:
Hi,
We use
Flexible indexing is the ability to make your own codec, which
controls the reading and writing of all index parts (postings, stored
fields, term vectors, deleted docs, etc.).
So for example if you want to store some postings as a bit set instead
of the block format that's the default coming up
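Mike's example of storing some postings as a bit set instead of the default block format can be pictured with the same doc-ID list held two ways. This is a hypothetical plain-Java sketch, not a Lucene codec: `PostingsLayouts` and its methods are made-up names. A real custom format would be written against Lucene's codec classes, whose details are not shown here. The delta-coding half is only the basic idea that block-based postings formats build on before packing the gaps into fixed-size blocks.

```java
import java.util.BitSet;

/** Hypothetical sketch: one term's postings (sorted doc IDs) stored densely or as deltas. */
final class PostingsLayouts {
    // Dense layout: one bit per document; compact when the term occurs in most documents.
    static BitSet toBitSet(int[] sortedDocs) {
        BitSet bits = new BitSet();
        for (int d : sortedDocs) {
            bits.set(d);
        }
        return bits;
    }

    // Iterate the set bits back into sorted doc IDs.
    static int[] fromBitSet(BitSet bits) {
        return bits.stream().toArray();
    }

    // Sparse layout: store the gap from the previous doc ID (delta coding);
    // small gaps are what block-based formats then compress.
    static int[] toDeltas(int[] sortedDocs) {
        int[] deltas = new int[sortedDocs.length];
        int prev = 0;
        for (int i = 0; i < sortedDocs.length; i++) {
            deltas[i] = sortedDocs[i] - prev;
            prev = sortedDocs[i];
        }
        return deltas;
    }

    // Undo delta coding with a running sum.
    static int[] fromDeltas(int[] deltas) {
        int[] docs = new int[deltas.length];
        int acc = 0;
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i];
            docs[i] = acc;
        }
        return docs;
    }
}
```

Both layouts decode to the same doc IDs; the point of flexible indexing is that this choice lives in the codec, independent of how analysis produced the terms.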